Audio and MIDI latency problems - What causes them? There's always an element of latency, even down to the time it takes your brain to deliver an instruction to your little finger on the best curlin' motion to extract that crusty bogey.

So what have we got? MIDI/audio latency... 32 bit sampling... buffers... analogue to digital audio converter. What's it all about and why are these components important to the computer-musician?

LATENCY - What is it?

Latency in MIDI is best recognised as the time it takes for an instruction from your keyboard hammering to be processed and sent out to your ears.

Latency times vary between systems and sound cards, and are usually in large part down to the MIDI drivers... so a trip to your sound card manufacturer's website to check for the latest drivers is always a good start if you're suffering from this problem.

Latency varies quite wildly, from an unnoticeable 0.8ms (or less) to, I suppose, anything higher. The Yamaha SYXG100 is a quarter beat out on my system!

The thing is, as long as an instruction has to be processed in any way, there's always gonna be a lag. It's just a question of at what point it becomes noticeable enough to alter our playing behaviour... and we're quite forgiving!
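To put a rough number on that "quarter beat" lag, here's a quick back-of-an-envelope sketch in Python - the tempo is just an assumed figure for illustration:

```python
# Rough arithmetic for a "quarter of a beat" of lag (tempo assumed for illustration).
BPM = 120
beat_ms = 60_000 / BPM                 # one beat in milliseconds at this tempo
print(f"A quarter of a beat at {BPM} BPM is {beat_ms / 4:.0f} ms of lag")   # 125 ms
```

Compare that with the sub-millisecond figures a well-behaved card can manage and you can see why a quarter beat feels like wading through treacle.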

I once saw Mike Lindup (keyboard player, Level 42) in the middle of a song, actually playing the riff to the next chorus, whilst still in the verse! What he was actually doing was programming a riff that would loop once triggered at the chorus so that he could play the counter melody over the top live.

That's incredible! Most folk would just have it down on floppy before the tour even started, and let their roadie worry about it. But it does show how we can compensate, so long as what we're compensating is relatively predictable.

Buffer me bits!

But with digital electronics it's different, as the only way of coping with unpredictability is having a buffer to rely on whilst whatever part of the chain is struggling (hopefully) sorts itself out.

A buffer is basically akin to the time it takes you to breathe in air, process it, and then expel it again.

The more able your computer processor and internal components are, the less buffer is needed. More often than not, an audio editor will have an option to vary the buffer times - it's simply a matter of trial and error to achieve the minimal buffer size without drop-outs in the sound output, which usually mean the buffer is too small.
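If you're wondering what those buffer settings actually translate to, here's a minimal sketch - the sample rate and buffer sizes are just typical figures, not measurements from any particular card:

```python
# What a handful of typical buffer sizes mean in milliseconds of delay.
SAMPLE_RATE = 44_100   # samples per second (CD quality)

for buffer_size in (64, 128, 256, 512, 1024, 2048):
    latency_ms = buffer_size / SAMPLE_RATE * 1000
    print(f"{buffer_size:5d} samples -> {latency_ms:5.1f} ms of buffering delay")
```

Halving the buffer halves the delay, but it also halves the safety margin your system has to get the next chunk of audio ready in time.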

So, buffer is good, IF there's neither too much nor too little! And that's the bugger of it if your system isn't able to provide safe passage and throughput of the signals due to processor speed or load.

You see, there could be several internal components, each with its own "private buffer" requirements to run at optimum. The first point in the chain (apart from your brain!) will typically be the physical impact with the instrument itself.

If it's a keyboard, then it has to try and make sure it ties the instant you strike the note to the nearest BIT in the CYCLE - so let's look at these variables now.
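Before we do, here's a very loose sketch of what "tying the strike to the nearest bit in the cycle" boils down to - the strike time and sample rate are just assumed values:

```python
# Quantising the instant you hit a key onto the digital timeline (figures assumed).
SAMPLE_RATE = 44_100

strike_time_s = 0.123456                        # the instant you hit the key, in seconds
nearest_sample = round(strike_time_s * SAMPLE_RATE)
quantised_time_s = nearest_sample / SAMPLE_RATE

error_us = (quantised_time_s - strike_time_s) * 1_000_000
print(f"Strike tied to sample {nearest_sample}, {error_us:+.1f} microseconds out")
```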

Bits and Cycles

[Diagram: a loose picture of how waveform cycles are measured by bits]

The digital bit, by nature, is actually not as faithful to record with as the old-fashioned analogue systems, much to many people's surprise.

It's because a BIT is either "on" (1), or "off" (0). There's no in-between. Whereas analogue is continuous because it's a wave cycle.

This is where you start hearing terms like 32 BIT and 32x oversampling, commonly seen on domestic stereo systems.

To explain this very crudely: we need to convert the good old-fashioned analogue signal (think wave cycle, as seen on oscilloscopes at school) into a digital equivalent through an analogue-to-digital converter, which sort'a "chops" each wave cycle 32 times.

Picture a nice smooth curvy wave - as seen on an oscilloscope - and then, onto that picture, plot 32 tiny rectangular "tower blocks" against one curve (sorry, best see the diagram above if this doesn't make sense), which rise until each "roof" apex hits the inner wall of the curve.

The point at which the apex touches the curve is basically recorded as an "on" (or a 1 for the techno-nerds).

Next door to your "tower" is your neighbouring tower block which is doing the same thing... and so it goes on 32 times.

Problem is, tower blocks have "flat roofs", so not ALL of that curve will be recorded if the roof is (ridiculously) an inch wide, for instance - or put another way, if the cycle is chopped into fewer slices. I.e., there'll be a gap between each "on" point where no sound is computed - this is why digital sound is not as faithful as analogue: technically, there's always tiny bits missing between each "on" point.
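If you fancy seeing the tower-block idea in code, here's a toy version - deliberately crude, since a real converter samples a continuous voltage rather than a maths function:

```python
import math

# A toy pass over one wave cycle: take 32 evenly spaced readings ("tower blocks")
# of a smooth sine curve and log where each one meets the curve.
SLICES = 32                                   # how many times we "chop" the cycle

coordinates = []
for n in range(SLICES):
    position = n / SLICES                     # how far through the cycle this slice sits
    level = math.sin(2 * math.pi * position)  # height of the curve at that point
    coordinates.append((n, round(level, 3)))

print(coordinates[:8])                        # the first few (slice, level) "co-ordinates"
```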

[Diagram: illustrating the "co-ordinates" used to "draw" the waveform]

So, taking this stupid example further, the digital converter basically ends up with something resembling a graph of "co-ordinates" (taken from the points at which the "roof apexes" meet the wave), which it sends on to be processed internally. When that's done, the other side of the chain - the digital-to-analogue converter (DAC), which pumps the processed signal out to the listener - has to try and re-draw the analogue curve by joining up the co-ordinates, so it can push it out to our speakers.
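And here's the join-the-dots half of the journey, again as a toy sketch - real DACs use proper reconstruction filters rather than straight lines, but the idea is the same:

```python
# Redraw a curve from the recorded co-ordinates by drawing straight lines between them.
def join_the_dots(samples, points_between=4):
    curve = []
    for a, b in zip(samples, samples[1:]):
        for step in range(points_between):
            t = step / points_between
            curve.append(a + (b - a) * t)     # step in a straight line from a towards b
    curve.append(samples[-1])                 # finish on the last co-ordinate
    return curve

print(join_the_dots([0.0, 1.0, 0.0, -1.0]))
```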

OK, as I said, that's a very much simplified version, so I don't want any emails from electronic boffins with attached circuit diagrams explaining what really goes on... OK?

I think it's worth noting, in case you're still scratching your head about the "how analogue is more faithful than digital" comment, that this process is carried out roughly 44,000 times a second (44,100, to be exact) for typical CD-quality music, and around 22,000 times a second for typical radio quality - hence the need for a decent processor (CPU).
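To see why the CPU cares, here's what that sample rate adds up to for plain CD-quality stereo (16-bit samples, two channels):

```python
# Raw data rate of uncompressed CD-quality audio.
SAMPLE_RATE = 44_100   # samples per second
BIT_DEPTH = 16         # bits per sample
CHANNELS = 2           # stereo

bytes_per_second = SAMPLE_RATE * (BIT_DEPTH // 8) * CHANNELS
print(f"{bytes_per_second:,} bytes every second")               # 176,400
print(f"{bytes_per_second * 60 / 1_000_000:.1f} MB a minute")   # roughly 10.6 MB
```

And that's before a single effect has been applied to any of it.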

Digital Effects

What's more, once an analogue wave has been digitally "grabbed", it doesn't suffer any further degradation - unless we want it to! And because we have the "co-ordinates" logged, so to speak, we can tweak them with microscopic accuracy, homing in on precise frequencies at precise points in time to scrub out (or add) glitches, and write algorithms that toss those "co-ordinates" about to create interesting effects, for instance.
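As a taste of what "tossing the co-ordinates about" can look like, here's a minimal one-tap echo - it simply adds a quieter copy of each sample from half a second earlier (the delay and level figures are just picked for illustration):

```python
# A one-tap echo over a plain list of sample values.
SAMPLE_RATE = 44_100

def echo(samples, delay_s=0.5, level=0.4):
    delay = int(delay_s * SAMPLE_RATE)        # the delay expressed in samples
    out = list(samples)
    for i in range(delay, len(out)):
        out[i] += samples[i - delay] * level  # mix in the delayed, quieter copy
    return out
```

A reverb is, very roughly, lots of these little taps piled up and filtered - which is why it costs so much more processing than a single echo.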

MIDI XG allows many more effects to be applied to the on-board synth sounds than "normal" GM synths do, so therein lies another processor load adding to our latency problems.

After routing our instruction to play the correct sound, the processor then has to manipulate it according to the effect chosen. The more complex the effect, the more processor load incurred; and the better the algorithm (the particular routine used to mess about with the sound - a reverb algo, for example), the more load again. Conversely, it could be that the algo was sloppily written, i.e. not efficient, which means it could be wasting processing power.
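Here's a small illustration of that efficiency point: two routines that produce exactly the same smoothing "effect", one written sloppily and one keeping a running total. The audio here is just a second of silence, purely so the timing difference is visible:

```python
import time

WINDOW = 64   # how many samples each output value averages over

def smooth_sloppy(samples):
    out = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - WINDOW + 1):i + 1]
        out.append(sum(chunk) / len(chunk))           # re-sums the whole window every time
    return out

def smooth_tidy(samples):
    out, total = [], 0.0
    for i, s in enumerate(samples):
        total += s
        if i >= WINDOW:
            total -= samples[i - WINDOW]              # keep a running total instead
        out.append(total / min(i + 1, WINDOW))
    return out

audio = [0.0] * 44_100                                # one second of (silent) CD-rate audio
for fn in (smooth_sloppy, smooth_tidy):
    start = time.perf_counter()
    fn(audio)
    print(fn.__name__, f"took {time.perf_counter() - start:.3f} s")
```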

Polyphonic Range

Anyway, moving on, you'll often see or hear the expression "polyphonic range", or similar. This is basically to do with the limits of what the soundcard or synthesiser processor can cope with and is, if you like, part of that initial buffering process. If you strike a 33-note chord, for instance - bizarre, I know - on a synth with a polyphonic range of 32, then it's gonna have to lose a note, or attempt to stuff it onto the next cycle.
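Here's a crude sketch of that limit - when the 33rd note arrives and every voice is busy, something has to give. This version simply refuses the note; real synths usually "steal" the oldest or quietest voice instead:

```python
POLYPHONY = 32
active_voices = []                        # notes currently sounding

def note_on(note):
    if len(active_voices) >= POLYPHONY:
        print(f"No free voice for note {note} - dropped (or held over to the next cycle)")
        return
    active_voices.append(note)

for note in range(36, 36 + 33):           # that ridiculous 33-note "chord"
    note_on(note)
```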


Latency Solutions

  • RAM, and loads of it!
  • DRIVERS - Keep up-to-date with your soundcard/synth manufacturer's latest driver releases.
  • CLOSE PROGRAMS - Minimise CPU load and free up RAM by closing down all un-needed programs running in the background.

More tips: Optimise PC - Audio


So now you understand how extra overhead is being built into the system, 'cos at the other end of the chain, something else has to try and decide whether each note can go out or not.

In MIDI, you can actually assign "priority notes". These will be guaranteed to go out if such a situation occurs, but you do that in the knowledge that you're willing to lose something else, like a cymbal for instance. It does at least give you some control over what's important - and what's not - without too much "damage" to your song.
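A loose sketch of the idea, with made-up parts and priority numbers (in practice you assign the priorities in your sequencer or synth, not in code like this):

```python
POLYPHONY = 4                                           # a tiny limit, to keep it readable

playing = [("bass", 10), ("melody", 9), ("pad", 3), ("cymbal", 1)]   # (part, priority)

def note_on(part, priority):
    if len(playing) >= POLYPHONY:
        victim = min(playing, key=lambda p: p[1])       # lowest priority is first to go
        if victim[1] >= priority:
            print(f"No room for {part} - it loses out")
            return
        playing.remove(victim)
        print(f"Dropped the {victim[0]} to make room for the {part}")
    playing.append((part, priority))

note_on("vocal lead", 8)                                # the cymbal gets sacrificed
```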

...Oh, and it's very easy to use up all your polyphony if you have all 16 channels of a sequencer playing at the same time - keyboards and drums alone can eat up most of it in a busy section. Some lush synth pads can require two or more "elements", as they're known, which is equivalent to having two or more of your allocated polyphony notes taken away for the privilege.

In short, it's a frickin' nightmare for the poor sods and that's why you need buffers - too much buffer, and you leave yourself less memory for other things; too little and you risk dropped notes, hanging notes, bad latency and freeze-up.

Of course, we're slowly moving into a 64-bit world with ever-higher bit depths and sample rates, so things should slowly get more "fluid."