How do software equalizers work?

Signal Processing Asked by Kevin Sullivan on December 4, 2021

Recently, I have been studying IIR and FIR filters, and trying to create a parametric equalizer using a microcontroller. More specifically, I use the ADC on it to sample the audio at about 44 kHz, and in between grabbing samples, I use the time to process the signal and pass it out on the DAC.

What is a bit confusing to me, though, is that this process seems to be pretty intense and often takes most of the time between samples to do all the processing (with 8 two-pole IIR bandpass filters).

Programs like Spotify, or devices like most Galaxy phones, have similar equalizers, but they run on non-dedicated hardware, and I am really confused about how they are able to pull this off. Does anybody know if they are basically doing the same thing, just faster, or is there some secret sauce to this that I am missing?

I have never tested either of the previously mentioned equalizers on those platforms, so maybe they are just really crappy, but at least by listening to them, they don’t sound horrible.

2 Answers

This depends a lot on how you implement it.

  1. A single biquad takes about 10 arithmetic operations. (To be precise, a transposed Direct Form II takes 4-5 multiplies and 3-4 adds, depending on how the gain management is done; a C sketch follows this list.)
  2. Arithmetic operations translate into clock cycles of your processor. That depends a lot on the efficiency of your instruction set and how good you are at writing code. On a good DSP you can do a single biquad in 4 clock cycles (ca. 2 ops per cycle); on a general-purpose ARM with a decent ALU this may be 15-20 cycles. On a crappy microcontroller with no native hardware multiplier this can take 50-100 cycles.
  3. Frame-based vs. sample-based processing. Reading data from an ADC, moving it into the processor, and then moving it to the DAC creates a lot of overhead. That's why most implementations will process a frame of data and not one sample at a time. That "amortizes" the overhead across the entire frame. A good processor will also let you do this with HW support, i.e. it will use DMA channels to move the data between the ADC/DAC and internal memory without the CPU having to do anything.
  4. The chips inside modern smartphones (even the cheap ones) are quite powerful: they are typically quad-core ARMs with HW accelerators (Neon, SIMD) for media processing. They clock in the multi-GHz range and have good HW support for peripherals and graphics accelerators. My trusty old Samsung S8 has an 8-core 2 GHz Snapdragon: https://www.zdnet.com/article/qualcomm-snapdragon-835-what-does-the-kryo-280-adreno-540-spectra-180-x16-and-hexagon-682-mean-for/#:~:text=The%20Snapdragon%20835%20is%20comprised,result%20in%20optimized%20power%20savings.
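
For reference, a minimal transposed Direct Form II biquad in C could look like the sketch below (the struct layout, names, and the a0 = 1 normalization are assumptions for illustration, not something taken from the answer itself). Counting the operations inside biquad_run is where the handful-of-multiplies-and-adds figure in point 1 comes from.

    typedef struct {
        float b0, b1, b2;   /* feed-forward coefficients                 */
        float a1, a2;       /* feedback coefficients, a0 normalized to 1 */
        float z1, z2;       /* state (delay) registers                   */
    } biquad;

    /* Transposed Direct Form II: 5 multiplies and 4 adds per sample. */
    static inline float biquad_run(biquad *f, float x)
    {
        float y = f->b0 * x + f->z1;
        f->z1 = f->b1 * x - f->a1 * y + f->z2;
        f->z2 = f->b2 * x - f->a2 * y;
        return y;
    }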

Let's run a simple example: a well-coded biquad on a decent ARM core should take about 15 cycles. For 8 biquads running at 44.1 kHz on a stereo signal (two channels), that's 15 cycles × 8 biquads × 2 channels × 44,100 samples/s ≈ 10.6 million cycles/second. That's about 1% of a single core at 1 GHz, or 0.25% of the entire chip. On my Snapdragon 835 it would be well under 0.1%.

Answered by Hilmar on December 4, 2021

I don’t think you’re alone, but essentially this is simply a problem of optimization. Let’s say you have a processor with an 88 MHz clock. That’s 2k clocks per sample at 44 kHz. If we take the term ‘most’ to mean 50% of the clocks, that leaves 1k clocks per sample for filtering. Running 8 filters leaves 125 clocks per filter. That’s a decent amount of time, but data needs to get moved around and math needs to get done. So now we need to figure out how to minimize the number of clocks per filter.

For one, you can process multiple samples at a time. This will reduce the amount of time spent moving data from memory to the registers. Things like loading coefficients can be drastically reduced this way, and there are likely gains to be made with the input and output streams as well.
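
As a rough sketch of what that can look like (the function name and array-based interface are made up for illustration), the coefficients and filter state are pulled into locals once per frame instead of once per sample:

    /* Run one biquad (transposed Direct Form II) over a whole frame.
       coef = {b0, b1, b2, a1, a2}, state = {z1, z2}; a0 is assumed to be 1. */
    void biquad_block(const float coef[5], float state[2],
                      const float *in, float *out, int n)
    {
        float b0 = coef[0], b1 = coef[1], b2 = coef[2];
        float a1 = coef[3], a2 = coef[4];
        float z1 = state[0], z2 = state[1];

        for (int i = 0; i < n; i++) {
            float x = in[i];
            float y = b0 * x + z1;    /* coefficients stay in registers */
            z1 = b1 * x - a1 * y + z2;
            z2 = b2 * x - a2 * y;
            out[i] = y;
        }

        state[0] = z1;                /* state written back once per frame */
        state[1] = z2;
    }

Combined with DMA or interrupt-driven buffering filling the input buffer and draining the output buffer, the CPU then only has to wake up once per frame rather than once per sample.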

For two, you can leverage the capabilities of the processor. If it doesn’t have a floating-point unit, use fixed point math. If it has multiply-accumulate instructions, make sure they’re used. Just basically make sure your assembly output doesn’t have a bunch of crazy stuff going on.
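
For example, a fixed-point biquad in Q15 maps naturally onto a 16x16 multiply-accumulate; the Q-format, the rounding, and the assumption that all coefficients have been scaled to fit Q15 are choices made here purely for illustration:

    #include <stdint.h>

    /* Q15 biquad, Direct Form I. Each product is a 16x16 -> 32-bit multiply
       summed into a 32-bit accumulator, which is exactly the pattern a MAC
       instruction implements. Assumes signal levels and coefficients are
       scaled so the accumulator does not overflow; a production version
       would keep guard bits or saturate. Note that the feedback coefficients
       of a real biquad can exceed 1.0, so in practice they often need Q14
       or an a1/2 trick rather than plain Q15. */
    typedef struct {
        int16_t b0, b1, b2, a1, a2;   /* Q15 coefficients            */
        int16_t x1, x2, y1, y2;       /* previous inputs and outputs */
    } biquad_q15;

    static inline int16_t biquad_q15_run(biquad_q15 *f, int16_t x)
    {
        int32_t acc = (int32_t)f->b0 * x
                    + (int32_t)f->b1 * f->x1
                    + (int32_t)f->b2 * f->x2
                    - (int32_t)f->a1 * f->y1
                    - (int32_t)f->a2 * f->y2;

        int16_t y = (int16_t)((acc + (1 << 14)) >> 15);  /* round back to Q15 */

        f->x2 = f->x1;  f->x1 = x;
        f->y2 = f->y1;  f->y1 = y;
        return y;
    }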

Lastly, you can use alternative filter topologies. Beyond the direct and transposed direct forms I and II, look at structures like lattice, parallel, and state-variable filters. A lot of people more clever than us have spent a lot of time on this stuff, and there’s some interesting literature on the subject. However, I’d generally recommend starting with the first two ideas; different topologies can have limitations that are tricky to understand.
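
As one example of an alternative structure (the names and tuning formula here are illustrative, not a recommendation from the answer), a Chamberlin state-variable filter gives lowpass, bandpass, and highpass outputs from three multiplies per sample, but it is only well behaved when the center frequency is a small fraction of the sample rate:

    typedef struct {
        float f;            /* tuning coefficient, f = 2*sin(pi*fc/fs) */
        float q;            /* damping, q = 1/Q                        */
        float low, band;    /* the two integrator states               */
    } svf;

    /* One update of a Chamberlin state-variable filter; returns the
       bandpass output, with lowpass and highpass available as well. */
    static inline float svf_run(svf *s, float in)
    {
        s->low  += s->f * s->band;
        float high = in - s->low - s->q * s->band;
        s->band += s->f * high;
        return s->band;
    }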

Answered by Dan Szabo on December 4, 2021
