
Bouncing to audio

‘Bouncing’ to audio is the process of rendering realtime generated audio to audio files. Typically, ‘realtime generated audio’ means software synthesisers, samplers, hardware sound generators, or even audio files being processed by plugins or hardware effects processors. After bouncing, these audio sources are turned into audio files on your hard drive. The audio files are a snapshot of how those sources sound – the same way a tape recording is a snapshot of a performance.

There are a number of different terms for this. Often you’ll see it referred to as ‘rendering’ or ‘exporting’, or even ‘loopback recording’. The term ‘bouncing’ harks back to multitrack tape recording systems, when the process involved re-recording audio from some tape tracks onto one or more other tracks. The audio was ‘bounced’ from track to track on a tape system.
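
To make that concrete, here’s a very rough Python sketch of what a bounce boils down to: some generated audio (a plain sine standing in for a synth part) is rendered once and written to a WAV file, which then sits on disk as a fixed snapshot of the sound. The filename and settings are purely illustrative.

  # Minimal sketch of a 'bounce': generate some audio, write it to a file.
  import math, struct, wave

  SAMPLERATE = 44100
  SECONDS = 2.0
  FREQ = 440.0   # an A note, pretending to be a synth line

  samples = [
      int(32767 * 0.5 * math.sin(2 * math.pi * FREQ * n / SAMPLERATE))
      for n in range(int(SAMPLERATE * SECONDS))
  ]

  with wave.open("bounced_synth.wav", "wb") as wav:
      wav.setnchannels(1)           # mono
      wav.setsampwidth(2)           # 16 bit
      wav.setframerate(SAMPLERATE)
      wav.writeframes(struct.pack("<%dh" % len(samples), *samples))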

Doing this can be a good idea for a number of reasons.

  • It can help conserve resources. In a DAW environment, it can allow you to conserve CPU (by rendering a track that uses CPU-hungry plugins, then deactivating those plugins). In a hardware environment, it can allow you to use a specific piece of equipment – either an instrument or an effects processor – on many tracks at once.
  • It can make a project more portable. By rendering tracks, you can bring the project files to another studio – even if that studio doesn’t have the same plugins or hardware that you do. It can even allow a project to be shared between different DAW platforms, or studios based on hardware, software or mixtures of both.
  • It can help you make decisions. Rendering tracks locks you in to a particular sound and performance. While realtime generated audio allows you to continually adjust the track (and for MIDI – the performance), rendering those tracks to audio files creates a snapshot that can’t easily be changed. This can be made part of a project workflow to mark the end of one stage and the beginning of the next.

Obviously, there are a couple of downsides. One is space. In a DAW environment, rendered audio files take up additional hard drive space. This is usually not an issue, because hard drives are cheap and high-capacity. It’s more of an issue with hardware recording systems, because some have very strict limits on how many simultaneous tracks are available.

The other downside is that it prevents further editing of the track – both the effects processing settings and (for MIDI) the performance. This is usually mitigated by keeping a deactivated copy of the original realtime generated track.

Personally, I use track rendering at two points in my workflow:

  1. When the artist brings their demo to my studio. My artists work on a variety of platforms, so I ask them to render each track to bring them into my studio for further work.
  2. When using hardware instruments, hardware effects processors, or CPU-heavy plugins. Obviously, this is to allow these tools to be used many times in a project. It also allows projects to be recalled at later sessions (I use some hardware devices that are very complex and have no presets). I also use a CPU-heavy amp simulator, which I routinely render to audio as it’s being recorded – because I prefer not to have restrictions on how many guitar parts I use (and it’s no different to recording an audio file of a physical amp).

The decision of if and when to render tracks to audio depends on your project workflow, your studio resources and your preferred style of working. Obviously, there are no generic rules – just what works for you.

-Kim.

Dynamic range and headroom

Noise floor

The noise floor of a system is the level at which the background noise occurs. In analogue systems, this will be the hiss and/or hum. In digital systems, this will be the point at which audio has less than one bit to represent it (audio at this level sounds like a crunchy mess).

Saturation point

The saturation point of a system is the level at which audio becomes noticeably clipped or distorted. In analogue systems, this is the point at which the system is overloaded and starts to behave non-linearly (often it’s when the signal is distorted by 1%). In digital systems, it’s ‘Full Scale’ – any louder and the audio is mercilessly clipped.

Nominal level

This is the measurement reference point. It’s the level that we call 0dB (or sometimes, unity). In digital systems, this is always at the same level as the saturation point. In analogue systems, the nominal level is some distance below the saturation point: 18dB or 24dB for example.

The levels of the noise floor and saturation point are measured relative to the nominal level. For example, an analogue system might have its saturation point 18dB above the nominal level and its noise floor 72dB below the nominal level. We say the saturation point is at ‘+18dB’ and the noise floor is at ‘-72dB’. A 16 bit digital system has its noise floor 96dB below the saturation point, and because it’s a digital system, the saturation point is also the nominal level: 0dBfs. Because the noise floor is 96dB below the nominal level, we say the noise floor is at ‘-96dB’. The level difference between the noise floor and the nominal level is also called the signal-to-noise ratio.

Dynamic range

The dynamic range of a system is the difference between the noise floor and the saturation point. In the above analogue example, the noise floor is at -72dB and the saturation point is at +18dB. Thus the dynamic range is 90dB. In the above digital example, the noise floor is at -96dB and the saturation point is at 0dB. Thus the dynamic range is 96dB.
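
As a quick sanity check, the arithmetic from the two examples above can be written out in a few lines of Python (the figures are just the ones used in this post):

  # Levels in dB relative to the nominal level (0dB).
  systems = {
      "analogue example": {"saturation": +18, "noise_floor": -72},
      "16 bit digital":   {"saturation": 0,   "noise_floor": -96},
  }

  for name, levels in systems.items():
      dynamic_range = levels["saturation"] - levels["noise_floor"]
      signal_to_noise = 0 - levels["noise_floor"]   # nominal minus noise floor
      print(name, dynamic_range, "dB dynamic range,",
            signal_to_noise, "dB signal-to-noise")
  # analogue example: 90dB dynamic range, 72dB signal-to-noise
  # 16 bit digital:   96dB dynamic range, 96dB signal-to-noise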

The dynamic range of a piece of audio is the difference between the quietest level and the loudest level. If the dynamic range of the audio is greater than the dynamic range of the system, it should be compressed. This will reduce the dynamic range of the audio so that it can be adequately processed by the system.

Headroom

The headroom of a system is the level difference between the nominal level and the saturation point. In the above analogue system, the saturation point is at +18dB, thus it has a headroom of 18dB. In the above digital system, the saturation point is at 0dB, thus it has no headroom – in the traditional sense.

The headroom required by a piece of audio is the difference between the steady-state average level and the maximum peak level. In an analogue system the gain is often set so that the average level is at 0dB. In these cases, the headroom of the system should be greater than the headroom required by the audio – otherwise audible clipping will occur.

The situation is different with digital systems. Because digital systems have no headroom above 0dB, it is common practice to set audio gain as if the nominal level is actually much lower. Unfortunately, there is no standard practice or agreement for what the in-practice nominal level should be. Bob Katz’ K-System attempts to, among other things, set three standard nominal levels: -20dB, -14dB and -12dB. Each has a different trade-off between available headroom and overall volume.
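
The same arithmetic covers headroom. Here’s a small Python sketch using the figures above – the analogue example plus the three K-System reference levels (illustration only, not a recommendation):

  # Headroom = saturation point minus whatever level we treat as nominal.
  def headroom(saturation_db, nominal_db):
      return saturation_db - nominal_db

  print(headroom(+18, 0))   # analogue example: 18dB of headroom
  print(headroom(0, 0))     # digital, nominal at 0dBfs: no headroom at all

  # Working to a lower in-practice nominal level buys the headroom back:
  for k_level in (-20, -14, -12):   # K-System style reference levels
      print(k_level, "->", headroom(0, k_level), "dB of headroom")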

-Kim.

Masking

Masking is a little-understood concept that is important to composers and mix engineers. Essentially, masking is what happens when one sound makes it difficult to hear another sound. An obvious example of this is two instruments playing the same note, with one instrument sounding much louder than the other.

This can happen with notes or chords, where the voicing of one instrument covers up another, softer instrument. It can also happen with frequencies, where an element of one sound covers up an element of another sound. As with the example above, this happens when two instruments are playing the same note or frequency range and one is much louder than the other.

It can also happen when the notes or frequencies are not exactly the same, but nearby. The effect is particularly strong when both instruments are playing the same or similar parts, and the sounds blend very well. A common example is distorted guitars and distorted bass. On its own, the distorted bass might have a heavy growl caused by a lot of energy in the lower mids and a crunchy fuzz on top. Once the guitars are brought in, however, the bass is reduced to a low-frequency rumble beneath the guitars. Even though the main energy of the guitars might be in the upper mids, they mask the upper harmonics in the distorted bass.

Another example is vocal harmonies. A song might have a section where the main melody is sung in parallel harmony – perhaps a third or fourth apart. If both voices are similar (sung by the same singer, in the same style, with similar processing), our ear will hear the upper harmony as being much more prominent than the lower harmony. The effect is sometimes quite striking – the lower harmony simply blends into the upper harmony.

These are both cases of the higher sound masking the lower sound.

Sometimes masking is useful, as it allows a sound to be thickened or deepened by adding other sounds to it. Other times it is undesirable as it makes it difficult for the listener to distinguish between the different sounds.

In the bass/guitar example, greater separation could be achieved by filtering or EQ so that each instrument contributes a unique sonic component to the mix. Alternatively, each instrument could be given a different depth. For example, the bass could be up front and the guitar further back in the mix.
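
As a rough illustration of the filtering approach, here’s a Python sketch using scipy. The crossover frequency and filter order are arbitrary example values, and the ‘bass’ and ‘guitars’ are stand-in signals – the point is simply that each instrument is left with a region of the spectrum to itself.

  # Illustrative only: carve complementary space for bass and guitars.
  import numpy as np
  from scipy.signal import butter, lfilter

  SR = 44100
  CROSSOVER_HZ = 150   # arbitrary example value

  # Stand-in signals; in practice these would be the recorded tracks.
  t = np.arange(SR) / SR
  bass = np.sin(2 * np.pi * 55 * t) + 0.3 * np.sin(2 * np.pi * 880 * t)
  guitars = np.sin(2 * np.pi * 110 * t) + np.sin(2 * np.pi * 1760 * t)

  b_lo, a_lo = butter(2, CROSSOVER_HZ, btype="lowpass", fs=SR)
  b_hi, a_hi = butter(2, CROSSOVER_HZ, btype="highpass", fs=SR)

  bass_carved = lfilter(b_lo, a_lo, bass)         # bass keeps the low end
  guitars_carved = lfilter(b_hi, a_hi, guitars)   # guitars keep the mids and top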

In the vocal example, greater separation could be achieved by instructing the singer to perform each part differently – such as whispering one part, or perhaps singing one part forcefully. Better yet, have a different singer perform one of the parts.

-Kim.

Normalising

Normalisation is a process that changes the volume of a piece of audio. It does this by first analysing the audio, looking for the highest peak. Then an amount of gain is applied to the entire section of audio, so that the highest peak is at 0dBfs. Because of the need to analyse the audio before applying gain, normalisation is an offline process – meaning it can’t be applied in realtime (as a plugin, for example). Also, because static gain is applied, the dynamics of the audio do not change. It’s exactly the same as adjusting the fader on an audio channel, except that there is a pre-calculation to determine how much to adjust it.
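
In code, the whole process is only a few lines. Here’s a rough Python/numpy sketch, assuming the audio has already been loaded as floating point samples where ±1.0 corresponds to 0dBfs:

  # Peak normalisation: scan for the highest peak, then apply one static gain.
  import numpy as np

  def normalise(audio, target_peak=1.0):   # 1.0 == 0dBfs in float audio
      peak = np.max(np.abs(audio))         # the analysis pass
      gain = target_peak / peak            # one fixed gain value...
      return audio * gain                  # ...applied to every sample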

There are two problems with normalising:

  1. You don’t know or control how much gain is being applied. That’s because the amount of gain is determined by analysing the audio.
  2. The amount of gain being applied has nothing to do with how loud the audio is (as we perceive it). That’s because the amount of gain is calculated from the peak level of the audio – not the RMS or average level (see here for more details about peak vs RMS, and the quick comparison sketched below).
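
Here’s that comparison as a quick numpy sketch – two contrived signals with identical peak levels but very different RMS (and perceived) levels:

  # Same peak level, very different average (RMS) level.
  import numpy as np

  def peak_db(x):
      return 20 * np.log10(np.max(np.abs(x)))

  def rms_db(x):
      return 20 * np.log10(np.sqrt(np.mean(x ** 2)))

  t = np.arange(44100) / 44100.0
  sine = np.sin(2 * np.pi * 440 * t)   # dense signal, sounds loud
  clicks = np.zeros(44100)
  clicks[::4410] = 1.0                 # sparse clicks, sounds very quiet

  print(peak_db(sine), rms_db(sine))      # ~0dBfs peak, ~-3dB RMS
  print(peak_db(clicks), rms_db(clicks))  # ~0dBfs peak, ~-36dB RMS

Normalising would treat both of these the same way, even though one is obviously much louder than the other.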

Normalising audio ONLY makes sense if:

  1. Your audio started higher than 16 bits; AND
  2. You’re about to quantise to 16 bits (or similar) directly after normalisation; AND
  3. You don’t care what the average (RMS) level of the audio is after quantisation.

In other words, this is a process that makes sense where there are a series of offline gain stages, and somewhere in the MIDDLE the audio is being quantised to 16 bits (but subsequent processing is at a higher bit depth). The uncontrolled amount of gain is not a problem if later gain stages will also be applied.

In these situations, normalising is a useful way to maximise the dynamic range of a low-resolution digital system. This is because the audio is made as loud as possible before quantising so that the higher noise floor (caused by the low resolution) is as low as possible relative to the audio. An even lower relative noise floor is possible by using dynamic processing (such as compression or limiting), but normalisation is the best solution that doesn’t affect the original dynamics of the audio.

On the other hand, if your task doesn’t meet all three criteria, then there are more appropriate processes than normalisation.

-Kim.

Gain Staging

A “gain stage” is any point in the signal path where gain is applied – where volume can be changed. Gain can be positive (makes the sound louder), negative (makes the sound quieter), or unity (doesn’t change the volume – but it’s still a gain stage!). “Gain staging” is the awareness that there are all these gain stages, and it’s important to carefully adjust each one so that each processing stage is operating optimally. This means balancing headroom and noise floor to keep the audio as clean as possible. 
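
For reference, every one of those gain stages boils down to a simple multiplication. A couple of lines of Python show the relationship between gain in dB and the multiplier actually applied (the function name is just for illustration):

  # A gain stage is just a multiplier; dB is how we describe it.
  def apply_gain(sample, gain_db):
      return sample * 10 ** (gain_db / 20.0)

  apply_gain(0.5, +6)   # positive gain: louder (~0.998)
  apply_gain(0.5, -6)   # negative gain: quieter (~0.251)
  apply_gain(0.5, 0)    # unity gain: unchanged (0.5) - but still a gain stage!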

The noise floor of an audio system is the level at which the background noise (hiss, etc) sits. This is not the hiss in the recording, but the background noise inherent to the system itself. Generally, it’s best to stay as far away from the noise floor as practical. In analogue systems, the noise floor is hiss or hum caused by the electrical components. In digital systems, the noise floor is crunchy quantisation noise caused by a lack of digital resolution. In modern digital systems, the noise floor is not a big concern. Most professional analogue-to-digital converters (ADCs) have a noise floor below -100dBfs.

The headroom of an audio system is the amount of room (in decibels) between the ‘nominal’ level and the saturation level. The nominal level is the level at which the audio spends most of its time. There is some flexibility in deciding what the nominal level should be. A low nominal level will give you lots of headroom, but puts your audio closer to the noise floor. A high nominal level will give you less headroom, but keeps your audio further above the noise floor. The less headroom you have, the more saturation/clipping you’ll get, and the more compression and limiting you’ll need to keep the sound clean.

Noise floor is less of an issue in professional digital systems (especially all-software systems such as DAWs), but headroom is still critically important – even more so in today’s loudness war. If you don’t give yourself enough headroom early in the signal path, you’ll find yourself hampered by your need to reduce dynamics for technical reasons instead of focussing on sound.

-Kim.

Sample Rate

As with bit depths, there are several different samplerates used for digital audio. While bit depth determines the accuracy of low level details, samplerate determines the accuracy of high frequency details.

Samplerate is actually the rate at which the digital audio is being processed, and where there are multiple files or audio streams being processed simultaneously (such as in a DAW or digital mixer) all the audio streams need to be at the same samplerate. For this reason, it’s best to choose a samplerate before recording begins, and avoid changing it mid-project.

44.1 kHz

This is by far the most common samplerate, as it is the samplerate used by audio CDs. The highest frequency that can be represented is half the samplerate – so at 44.1kHz, the highest possible frequency is 22.05kHz. Seeing as most people cannot hear above 20kHz, this would seem like a good samplerate choice. The problem with this, however, is that the accuracy of those high frequencies is quite poor. The closer you get to the highest possible frequency, the worse the accuracy gets. As a result, the inaccuracies can sometimes be heard well below the highest frequency. For this reason, many plugins oversample their critical processing components – meaning the audio is internally converted to a higher samplerate so the highest frequencies can be processed with better accuracy.
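
A quick numpy sketch shows how hard that limit is (the frequencies here are arbitrary examples). A tone above half the samplerate doesn’t just disappear – its samples are indistinguishable from those of a completely different, lower frequency:

  # At 44.1kHz, a 30kHz tone 'folds back' and looks exactly like 14.1kHz.
  import numpy as np

  SR = 44100
  n = np.arange(256)

  tone_30k = np.sin(2 * np.pi * 30000 * n / SR)         # above half the samplerate
  folded = -np.sin(2 * np.pi * (SR - 30000) * n / SR)   # 14.1kHz, phase inverted

  print(np.allclose(tone_30k, folded))   # True: the sample values are identical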

48 kHz

This samplerate effectively has the same limitations as 44.1 kHz, except that it’s more commonly used for film and other visual media. This is because it synchronises better with visual frame rates (44.1 kHz doesn’t divide evenly into 24 frames per second, whereas 48 kHz gives exactly 2,000 samples per frame).

96 kHz, 192kHz

Most modern professional digital audio systems can operate at higher samplerates than 44.1 kHz or 48 kHz. This allows audio to be captured and processed with much higher accuracy, especially at the highest frequencies. This usually results in a more open, natural sound. The trade-off is that much more processing power is required. Working at 96 kHz will halve your capability compared to working at 48 kHz. That includes disk space (recording time), disk throughput (number of simultaneous tracks) and CPU/DSP power (number of compressors/EQs/effects). Working at 192 kHz cuts your capabilities in half again. Whether this trade-off is worthwhile for you depends on the kind of work you’re doing and your style of working. If you want to record the clearest, purest sound with a minimum of processing, high samplerates might be appropriate. On the other hand, if you want to do a lot of processing (especially if you regularly push your equipment to the limit) then you might prefer the higher capabilities of working at a regular samplerate.
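
The disk space side of that trade-off is easy to put numbers on. A rough Python sketch, assuming mono tracks recorded at 24 bit:

  # Approximate disk space for one minute of one mono track at 24 bit.
  BYTES_PER_SAMPLE = 3   # 24 bit
  SECONDS = 60

  for samplerate in (44100, 48000, 96000, 192000):
      megabytes = samplerate * BYTES_PER_SAMPLE * SECONDS / 1_000_000
      print(samplerate, "Hz:", round(megabytes, 1), "MB per track-minute")
  # 44100 Hz: 7.9MB, 48000 Hz: 8.6MB, 96000 Hz: 17.3MB, 192000 Hz: 34.6MB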

Recording at these high samplerates also has an advantage for sound design and special effects. Because there is so much more high frequency detail captured, slowing down a high samplerate recording results in a much clearer sound than slowing down a sound recorded at 44.1 kHz or 48 kHz.

If you work at 96 kHz or 192 kHz, you might need to convert back down to 44.1 kHz or 48 kHz when preparing audio for distribution.

-Kim.

Bit Depth

Occasionally there’s a bit of confusion about bit depth, and about which bit depth is best to use in different situations. In the digital world, there are three bit depths that we might have to deal with – 16 bit, 24 bit and 32 bit.

Bit depth determines the accuracy of low-level details in the audio. This includes the subtle details in the sound and the decay of notes or reverb tails. 

16 bit

Digital audio at 16 bit is most commonly found in CDs. 16 bit audio allows the audio to have a dynamic range of roughly 96dB – that’s the difference between the loudest possible sound and the quietest possible sound. This is fine for a final delivery medium – the vast majority of music has a dynamic range well within this limit.

On the other hand, 16 bit is not so good as a recording format. Often when recording audio, the nominal level has to be quite low so that accidental peaks don’t distort (the distance between the nominal level and 0dBfs in a digital system is the amount of headroom). Because of this, low level signals (such as the decay of notes, or subtle details in the sound) may be recorded a long way below 0dBfs. When recording at 16 bit, any audio more than 48dB below full scale is actually captured with less than 8 bits. This can give those low level signals a crunchy or distorted sound, which may be exacerbated in the mix by further processing such as compression and EQ.
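
The numbers above come straight from the maths of bit depth – each bit is worth roughly 6dB. A small Python sketch:

  # Dynamic range of an N bit system, and what's left for quiet signals.
  import math

  def dynamic_range_db(bits):
      return 20 * math.log10(2 ** bits)

  print(dynamic_range_db(16))        # ~96.3dB for 16 bit audio

  db_per_bit = dynamic_range_db(1)   # ~6.02dB per bit
  print(16 - 48 / db_per_bit)        # a signal 48dB down has ~8 bits left;
                                     # anything quieter gets even fewer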

24 bit

Professional analog-to-digital converters can capture low level details at higher resolution. This means that the low level signals can be captured accurately without having to record with less headroom. Recording at 24 bit allows the finest details to be saved. 24 bit recording provides a theoretical dynamic range of 144dB (compared to 96dB at 16 bit), but no analog-to-digital converter records with this much range (figures of around 110dB are typical). However, capturing at 24 bit is appropriate because computers are more efficient at handling data in 8 bit “chunks”.

The problem with 24 bit audio is that it can be limiting when mixing. Mixing often involves summing a large number of tracks, each with several stages of processing. In this scenario, small errors can accumulate.

32 bit

Many software mixers convert audio to 32 bit internally during processing. This goes some way towards reducing the effect of low level errors accumulating, and also has the added bonus of being able to have audio that exceeds 0dBfs without clipping – as long as it happens internally to the software (that is, before it leaves the mix bus).
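
A tiny numpy sketch of why that helps, assuming the internal 32 bit format is floating point (the usual case in software mixers, though it isn’t spelled out above):

  # Assuming 32 bit floating point: levels over 0dBfs survive internally,
  # as long as they're brought back down before the final output.
  import numpy as np

  hot_bus = np.float32(1.5)                     # 1.0 == 0dBfs, so this is over
  print(hot_bus * np.float32(0.5))              # pull the fader down: 0.75, intact

  # Writing the same over-level value straight to 16 bit clips it instead:
  print(int(np.clip(1.5, -1.0, 1.0) * 32767))   # 32767 - stuck at full scale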

It is also sometimes worthwhile rendering audio at 32 bit. This would be a good idea if you intend to further process the audio. An example of this is if you render a mix to a stereo file, intending to import the stereo file into a mastering project. This means no resolution is lost between mixing and mastering. 32 bit audio is not suitable for recording or distribution.

 

It might be a good idea to start with an approach of recording at 24 bit, mixing at 32 bit and mastering to 16 bit for distribution.

-Kim.