A quick-and-dirty audio sample mixing technique to avoid clipping

http://atastypixel.com/blog/how-to-mix-audio-samples-properly-on-ios/

Mixing two digital audio signals

drummer 1 + drummer 2 + … + drummer 500

oh-god-please-make-them-stop


So, digital mixing actually requires a little thought in order to avoid overflowing these bounds and clipping. I recently came across this when writing some mixing routines for my upcoming app Loopy 2, and found a very useful discussion on mixing digital audio by software developer and author Viktor Toth.

Note that a simple average of the samples (as in, (sample 1 + sample 2) / 2) won’t accomplish this: for example, if sample 1 is silent while sample 2 is happily jamming away, sample 2 will be halved in amplitude.
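To make both failure modes concrete, here is a minimal C sketch. The helper names `mix_sum` and `mix_average` are made up for illustration, and `int16_t` stands in for `SInt16`. It shows a plain sum leaving the 16-bit range and a plain average halving a lone signal.

```c
#include <stdint.h>

// Hypothetical helper: plain sum, computed in 32 bits so the overflow is visible.
static int32_t mix_sum(int16_t a, int16_t b) {
    return (int32_t)a + (int32_t)b;
}

// Hypothetical helper: plain average of two samples.
static int16_t mix_average(int16_t a, int16_t b) {
    return (int16_t)(((int32_t)a + (int32_t)b) / 2);
}
```

With two loud samples of 30000, `mix_sum` gives 60000, well past `INT16_MAX` (32767). With silence against a sample of 20000, `mix_average` gives 10000, so the lone signal comes out at half its amplitude.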

Instead, we want to meet three goals – assuming signed audio samples, the standard format for Remote IO/audio units on the iPhone/iPad, which can range from negative, through to zero (silence), up to positive values.

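If both samples are positive, we mix them so that the output value is somewhere between the maximum value of the two samples, and the maximum possible value

If both samples are negative, we mix them so that the output value is somewhere between the minimum value of the two samples, and the minimum possible value

If one sample is positive, and one is negative, we want them to cancel out somewhat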

If we’re talking about signed samples, MIN…0…MAX, this does the trick:

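    z = a + b - (a * b) / MIN    if a < 0 and b < 0
    z = a + b - (a * b) / MAX    if a > 0 and b > 0
    z = a + b                    otherwise

Here a and b are the two input samples and z is the mixed output.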

This lets the volume level for both samples remain the same, while fitting within the available range.


Here’s how it’s done on iOS:

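SInt16 *bufferA, *bufferB;   // input buffers, each holding bufferLength samples
NSInteger bufferLength;      // number of samples per buffer
SInt16 *outputBuffer;        // destination buffer, same length as the inputs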
 
for ( NSInteger i=0; i<bufferLength; i++ ) {
  if ( bufferA[i] < 0 && bufferB[i] < 0 ) {
    // If both samples are negative, mixed signal must have an amplitude between 
    // the lesser of A and B, and the minimum permissible negative amplitude
    outputBuffer[i] = (bufferA[i] + bufferB[i]) - ((bufferA[i] * bufferB[i])/INT16_MIN);
  } else if ( bufferA[i] > 0 && bufferB[i] > 0 ) {
    // If both samples are positive, mixed signal must have an amplitude between the greater of
    // A and B, and the maximum permissible positive amplitude
    outputBuffer[i] = (bufferA[i] + bufferB[i]) - ((bufferA[i] * bufferB[i])/INT16_MAX);
  } else {
    // If samples are on opposite sides of the 0-crossing, mixed signal should reflect 
    // that samples cancel each other out somewhat
    outputBuffer[i] = bufferA[i] + bufferB[i];
  }
}
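For experimenting outside an iOS project, the same loop can be sketched in portable C. This version assumes `int16_t` in place of `SInt16` and `size_t` in place of `NSInteger`; the names `mix_sample` and `mix_buffers` are illustrative and not from the original code.

```c
#include <stddef.h>
#include <stdint.h>

// Mix two signed 16-bit samples using the same three-way rule as the loop above.
// All arithmetic is done in 32 bits; the result always fits in 16 bits.
static int16_t mix_sample(int16_t a, int16_t b) {
    int32_t x = a, y = b;
    if (x < 0 && y < 0) {
        return (int16_t)(x + y - (x * y) / INT16_MIN);
    }
    if (x > 0 && y > 0) {
        return (int16_t)(x + y - (x * y) / INT16_MAX);
    }
    // Opposite signs (or a zero sample): a plain sum cannot overflow.
    return (int16_t)(x + y);
}

// Mix two buffers of `length` samples into `out`.
static void mix_buffers(const int16_t *a, const int16_t *b, int16_t *out, size_t length) {
    for (size_t i = 0; i < length; i++) {
        out[i] = mix_sample(a[i], b[i]);
    }
}
```

Silence leaves the other signal untouched (`mix_sample(0, 20000)` is 20000), two half-scale samples of 16384 give 24576, and two full-scale samples stay pinned at 32767 rather than wrapping around.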

Update: A reader recently demonstrated that this technique can introduce some unpleasant distortion with certain kinds of input — as the algorithm is nonlinear, some distortion is inevitable (see the sharp points on the waveform where the condition switches over). For the kind of audio I’m mixing, the results seem to be perfectly adequate, but this may not be generally true.
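One way to see the nonlinearity is to check whether scaling the inputs scales the output, which any linear mixer must satisfy. The sketch below uses a hypothetical `mix_nonlinear` helper that mirrors the three-way rule above (with `int16_t` in place of `SInt16`) and a hypothetical `mix_plain_sum` for comparison.

```c
#include <stdint.h>

// Hypothetical helper mirroring the three-way mixing rule described above.
static int32_t mix_nonlinear(int16_t a, int16_t b) {
    int32_t x = a, y = b;
    if (x < 0 && y < 0) return x + y - (x * y) / INT16_MIN;
    if (x > 0 && y > 0) return x + y - (x * y) / INT16_MAX;
    return x + y;
}

// Plain summing, for comparison: doubling both inputs always doubles the output.
static int32_t mix_plain_sum(int16_t a, int16_t b) {
    return (int32_t)a + (int32_t)b;
}
```

Doubling both inputs from 8000 to 16000 takes the plain sum from 16000 to 32000, but takes the nonlinear mix from 14047 to only 24188. The kink at the zero crossing shows up too: against a sample of 16000, adding +1000 raises the output by 512, while adding -1000 lowers it by 1000.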

Update 2: Here’s an inline function I put together for neatness:

// Declared static inline: in C99, a plain inline function can fail to link
// when the compiler chooses not to inline a call.
static inline SInt16 TPMixSamples(SInt16 a, SInt16 b) {
    return
            // If both samples are negative, mixed signal must have an amplitude between the lesser of A and B, and the minimum permissible negative amplitude
            a < 0 && b < 0 ?
                ((int)a + (int)b) - (((int)a * (int)b)/INT16_MIN) :

            // If both samples are positive, mixed signal must have an amplitude between the greater of A and B, and the maximum permissible positive amplitude
            ( a > 0 && b > 0 ?
                ((int)a + (int)b) - (((int)a * (int)b)/INT16_MAX)

            // If samples are on opposite sides of the 0-crossing, mixed signal should reflect that samples cancel each other out somewhat
            :
                (int)a + (int)b );
}

However, one commenter argues that this approach is wrong:

Sigh


This is so terribly wrong. Please don’t mislead newbies into thinking that this is the correct way to mix two channels. The correct way is to simply sum/average them together, as you dismissed early in the article.

Summing/averaging is exactly what every professional analog or digital mixing console does, because it’s exactly what happens in the air and in our ears and in our brains. Yes, it can change the crest factor of the signal, but that’s ok because digital audio is designed to have lots of headroom for the peaks above the normal signal level that you listen at. You’re not generating audio at 0 dBFS are you? Surely you know better than that. :D
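The summing approach described in this comment could be sketched roughly as follows. The details are assumptions rather than anything stated in the comment: sources are summed in a 32-bit accumulator, a fixed gain (the hypothetical `headroom_gain` parameter) leaves room for peaks, and a hard clamp acts only as a last-resort safety net. The names `saturate16` and `mix_sum_with_headroom` are likewise illustrative.

```c
#include <stddef.h>
#include <stdint.h>

// Clamp a 32-bit value into the signed 16-bit range (safety net only).
static int16_t saturate16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

// Linearly mix `count` source buffers of `length` samples each into `out`.
// `headroom_gain` is an assumed fixed gain (e.g. 0.5f) applied to the sum.
static void mix_sum_with_headroom(const int16_t *const *sources, size_t count,
                                  size_t length, float headroom_gain, int16_t *out) {
    for (size_t i = 0; i < length; i++) {
        int32_t sum = 0;
        for (size_t s = 0; s < count; s++) {
            sum += sources[s][i];
        }
        out[i] = saturate16((int32_t)((float)sum * headroom_gain));
    }
}
```

With two sources and `headroom_gain` set to 0.5f this is the plain average. With a gain of 1.0f it is the plain sum, and the clamp only comes into play when the sources were recorded too hot to leave any headroom.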

If you want to participate in the Loudness War and harshly reduce the dynamic range of your mix til everything is at 11 all the time, use a locally-linear limiter, not this nonlinear distortion stuff.
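A limiter in the sense described here scales the signal by a gain that varies over time, instead of bending each sample individually. The following is only a rough sketch under assumptions not found in the original: float samples, an instant attack, an exponential release, and hypothetical names (`Limiter`, `limiter_process`, `threshold`, `release`). A production limiter would typically add look-ahead and a smoothed attack.

```c
#include <stddef.h>

// Hypothetical limiter state: initialise `gain` to 1.0f (no reduction).
typedef struct {
    float threshold;  // maximum allowed absolute output, e.g. 0.9f
    float release;    // per-sample recovery factor in (0, 1], e.g. 0.001f
    float gain;       // current gain, never above 1.0f
} Limiter;

// Limit `length` samples in place so that |sample| stays at or below the threshold.
static void limiter_process(Limiter *lim, float *samples, size_t length) {
    for (size_t i = 0; i < length; i++) {
        float level = samples[i] < 0.0f ? -samples[i] : samples[i];
        // Largest gain that keeps this sample at or below the threshold.
        float ceiling = level > lim->threshold ? lim->threshold / level : 1.0f;
        // Let the gain recover toward 1.0, but never above the ceiling.
        float recovered = lim->gain + (1.0f - lim->gain) * lim->release;
        lim->gain = recovered < ceiling ? recovered : ceiling;
        samples[i] *= lim->gain;
    }
}
```

After a peak pulls the gain down, the samples that follow are scaled by a gain that recovers only gradually, so over short stretches the output is close to a scaled copy of the input. A small `release` value makes that recovery slower.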
