Does it make sense to have complex numbers representing real-world audio signals?

Question

In this course (Coursera: Audio Signal Processing for Music Applications) the professor uses an example of obtaining the DFT of a complex sinusoid:

\begin{align}
x_2[n]&=e^{\jmath\left(2\pi f_0 n+\varphi\right)}\quad \textrm{for}\quad n=0, \ldots, N-1\\
X_2[k]&=\sum_{n=0}^{N-1}x_2[n]e^{-\jmath2\pi kn/N}\\
&=\sum_{n=0}^{N-1} e^{\jmath\left(2\pi f_0 n+\varphi\right)}e^{-\jmath2\pi kn/N}\\
&=e^{\jmath\varphi}\sum_{n=0}^{N-1}e^{-\jmath 2\pi\left(k/N-f_0\right)n}\\
&=e^{\jmath \varphi}\frac{1-e^{-\jmath 2\pi\left(k/N-f_0\right)N}}{1-e^{-\jmath 2\pi\left(k/N-f_0\right)}}\,,
\end{align}

where $f_0$ is the frequency and $\varphi$ is the initial phase.
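As a sanity check on this derivation, here is a minimal NumPy sketch that compares the geometric-series closed form, with ratio $e^{-\jmath 2\pi(k/N-f_0)}$, against a numerically computed DFT. The values of `N`, `f0` and `phi` are arbitrary assumptions for illustration, and `f0` is chosen off the bin grid so the denominator never vanishes.

```python
import numpy as np

N = 64        # number of samples (assumed)
f0 = 0.13     # frequency in cycles per sample (assumed, not a multiple of 1/N)
phi = 0.7     # initial phase in radians (assumed)

n = np.arange(N)
x2 = np.exp(1j * (2 * np.pi * f0 * n + phi))   # complex sinusoid

# DFT computed numerically (np.fft.fft uses the kernel exp(-j 2 pi k n / N))
X_fft = np.fft.fft(x2)

# Closed form: geometric series with ratio r = exp(-j 2 pi (k/N - f0))
k = np.arange(N)
r = np.exp(-1j * 2 * np.pi * (k / N - f0))
X_closed = np.exp(1j * phi) * (1 - r**N) / (1 - r)

max_err = np.max(np.abs(X_fft - X_closed))
print(max_err)   # should be at rounding-error level
```

Note that the complex sinusoid here is a mathematical test input, not a recorded signal.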

So it seems like real-world (discrete) audio signals might have complex values when being represented digitally, but does this make sense?
If yes, then how can we interpret this?
Isn’t the value of the input supposed to represent a change in air pressure?
So if for an input signal $x$, say $x[0]$ was $2 + 3\jmath$, what would this tell us about the pressure on the diaphragm at that point of time?

user51597 · Answer

I agree with everyone's math. However, with the $x[0]=2+3j$ example, the actual physical sound wave would only be the real part: you'd drop the imaginary part, and the actual sound would have an amplitude of $2$ at that sample. However, that does NOT mean you can drop the imaginary part in the math.

Answered by user51597 on January 4, 2022

Konstantin Schubert · Answer

Your intuition is kind of correct.

The short answer is that while we use complex numbers to represent the Fourier series, there is a constraint on the coefficients that makes the imaginary parts cancel out.

In fact, a real function can be represented to any desired precision as a Fourier series with real-valued coefficients. This is a sum over sine and cosine functions, with real-valued coefficients.
(Without loss of generality I assume the period of the function to be $2\pi$.)

$f(x) = \sum_{n=0}^{N} a_n \cos(nx) + b_n \sin(nx)$

This Fourier series can then be rewritten as a series over $e^{inx}$ with complex-valued coefficients by using Euler's formula https://en.wikipedia.org/wiki/Euler%27s_formula

$f(x) = \sum_{n=0}^{N} a_n \cos(nx) + b_n \sin(nx) = \sum_{n=-N}^{N} c_n e^{inx}$

The relation between the coefficients comes out as $a_n = c_n + c_{-n}$ and $b_n = i (c_n - c_{-n})$ (for $n \ge 1$; $a_0 = c_0$), which implies that, if the function being transformed is real, $c_{-n} = c^*_n$.

Anyway, so now we have rewritten our completely normal real-valued series with completely real-valued coefficients as a complex series whose coefficients follow the constraint $c_{-n} = c^*_n$.
(Of course this constraint only holds if the function being Fourier-transformed doesn't have complex values.)
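The constraint $c_{-n} = c^*_n$ can be checked numerically. The sketch below uses the DFT as a finite stand-in for the Fourier series (an assumption on my part; the random test signal and its length are arbitrary), verifies the conjugate symmetry, and rebuilds the signal from purely real cosine and sine coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, arbitrary test signal
N = 16
x = rng.standard_normal(N)       # real-valued samples

c = np.fft.fft(x) / N            # complex coefficients c_k
k = np.arange(N)
c_neg = c[(-k) % N]              # c_{-k}, indices taken modulo N

# Constraint for real signals: c_{-k} equals the conjugate of c_k
symmetric = np.allclose(c_neg, np.conj(c))

# Real cosine/sine coefficients recovered from the complex ones
a = c + c_neg                    # a_k = c_k + c_{-k}
b = 1j * (c - c_neg)             # b_k = i (c_k - c_{-k})
max_imag = max(np.abs(a.imag).max(), np.abs(b.imag).max())

# Rebuild x from the real series; the factor 1/2 is there because this
# sum runs over all N indices, so every +k/-k pair is counted twice
theta = 2 * np.pi * np.outer(np.arange(N), k) / N
x_rebuilt = 0.5 * (np.cos(theta) @ a.real + np.sin(theta) @ b.real)

print(symmetric, max_imag, np.allclose(x_rebuilt, x))
```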

It turns out that the complex representation is easier to compute with, and the people working with it either understand that in the real-valued case this is just a rewriting of a sin/cos based real series, or they have convinced themselves that it's not worth thinking about...

user26241 · Answer

You are talking about the Fourier transform of a signal.  A transform is a different domain than what you started with.  You could equally well ask whether it makes sense to represent resistance with color-coded rings.

Sense does not play into it.  The question is whether the representation captures the relevant information, and whether the change of domain has advantages offsetting the effort of the change.

Yes, you need complex numbers. But frankly: try using a Hartley transform instead (which transforms real functions into real functions) and you'll find that the algorithmic complications increase while its descriptive powers decrease, and the descriptive powers are what caused us to start the transform in the first place. In terms of real-valued calculations, a Hartley transform tends to shave off a few operations over a Fourier transform. But indexing and description get more icky compared to just working in the complex domain. So it never became overly fashionable.
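For anyone who wants to try it, here is a small sketch of a discrete Hartley transform. The helper `dht` is a hypothetical name of mine, and computing it through the FFT is just a convenient shortcut for the demonstration, not how a dedicated fast Hartley algorithm would work. It relies on the identity $H[k] = \mathrm{Re}\,X[k] - \mathrm{Im}\,X[k]$ for real input.

```python
import numpy as np

def dht(x):
    """Discrete Hartley transform of a real sequence, computed via the FFT."""
    X = np.fft.fft(x)
    return X.real - X.imag

rng = np.random.default_rng(1)   # arbitrary real test signal
x = rng.standard_normal(8)
N = len(x)

H = dht(x)                       # real-valued output

# Direct definition with the kernel cas(t) = cos(t) + sin(t)
n = np.arange(N)
arg = 2 * np.pi * np.outer(n, n) / N
H_direct = (np.cos(arg) + np.sin(arg)) @ x

# The transform is its own inverse up to a factor 1/N
x_back = dht(H) / N

print(np.allclose(H, H_direct), np.allclose(x_back, x))
```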

Laurent Duval · Answer

Sometimes, adding a dimension to signals or functions makes them, and the results associated with them, more general and more practical to handle. Think about polynomials: if you stick to real roots, their number can vary a lot. If you use complex roots, any polynomial of degree $d$ has exactly $d$ roots (counted with multiplicity).

In a more mundane manner, I think of using complex numbers (under some conditions) as considering the (real) object together with its shadow (the imaginary part). Although the shadow can be recovered from the sun's position, taking the object and its shadow together lets you forget about the sun. Take a cosine signal for instance. It sounds natural to consider its amplitude to be the maximum value $1$. But you can easily see that the absolute value of the cosine varies between $0$ and $1$, although morally it "should" be $1$. Add the (analytic) shadow of the cosine, which is the sine (they are called a Hilbert pair). You get $e^{ix}$, whose amplitude (modulus) is always $1$, which is much simpler. So from a real audio input, you can build a complex signal that makes sense, using physical properties. In seismics, for instance, the polarization of waves is important, and going complex can be useful. But for practical reasons, one could also build a complex audio signal from the left/right pair.
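The cosine-plus-shadow picture can be reproduced in a few lines. This is a sketch under assumptions: the window holds an integer number of periods (so the result is exact), and the analytic signal is built by hand in the frequency domain, which is, as far as I know, the same construction that `scipy.signal.hilbert` uses.

```python
import numpy as np

N = 64
n = np.arange(N)
cycles = 4                                  # integer number of periods (assumed)
x = np.cos(2 * np.pi * cycles * n / N)      # the real "object"

# Analytic signal: keep DC and Nyquist, double the positive frequencies,
# zero the negative frequencies
X = np.fft.fft(x)
h = np.zeros(N)
h[0] = 1
h[1:N // 2] = 2
h[N // 2] = 1
z = np.fft.ifft(X * h)                      # object plus its "shadow"

envelope = np.abs(z)                        # modulus of the complex signal

print(np.abs(x).min(), np.abs(x).max())     # |cos| swings between 0 and 1
print(envelope.min(), envelope.max())       # modulus stays at 1
```

The imaginary part of `z` comes out as the sine with the same frequency, which is the Hilbert pair mentioned above.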

So:

Does this make sense? Triple yes: the complex representation can reveal the true nature of signals.
If yes, then how can we interpret this? As a guide and a help.
Isn’t the value of the input supposed to represent change in air pressure? The input remains real, or could be only the real part, or something more involved. It could be the pressure plus its imaginary shadow. The output of the Fourier transform can be complex, no big deal.
So if for an input signal $x$, say $x[0]$ was $2+3\jmath$, what would this tell us about the pressure on the diaphragm at that point of time? You may be entering a land of confusion here: you could be talking about $X[0]$, from the Fourier transform. It happens that for real signals, $X[0]$ is always real; it represents the average of the signal. But some other $X[n]$ could be complex. Their phase could tell you interesting details about your signal, though generally not at a specific time index, rather over the whole interval.
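A quick numerical illustration of the last point, as a sketch with an arbitrary random real signal: with NumPy's unnormalized DFT convention, $X[0]$ is the sum of the samples, i.e. $N$ times the average, and it is real, while other bins generally have a non-zero imaginary part.

```python
import numpy as np

rng = np.random.default_rng(2)   # arbitrary real test signal
x = rng.standard_normal(32)
N = len(x)

X = np.fft.fft(x)

dc_is_real = abs(X[0].imag) < 1e-12
dc_matches_mean = np.isclose(X[0].real, N * x.mean())
has_complex_bins = bool(np.any(np.abs(X[1:].imag) > 1e-6))

print(dc_is_real, dc_matches_mean, has_complex_bins)
```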

hotpaw2 · Answer

Another way to look at it is that a complex value is just two real values that, during some physical behavior, follow (closely enough) a certain relationship: the two values change together in a way that can be modeled by formulas using complex multiplication and addition, treating each pair of real input values as a single complex value.

Such two real values might be "instantaneous" air pressure and velocity, for which it may be possible to model their paired behavior in some situations as one complex value, thus requiring fewer lines of equations in the textbook or on the chalkboard.

Marcus Müller · Answer

So it seems like real-world (discrete) audio signal might have complex values when being represented digitally,

No, you misunderstood that. The discrete audio time signal doesn't have non-real values. The Fourier transform can have such.

but does this make sense?

It doesn't need to make sense. It's just math. I sometimes remind myself of that – it helps to occasionally get some distance from the very human desire to put sense into things.

However, assume some signal has Fourier transform $S(f)$, and that transform is purely real. Now you shift that signal in time by $\Delta t$. From knowing the Fourier transform properties, or by just applying the integral (continuous FT) or the sum (discrete FT), you instantly see that this means multiplication by $e^{j2\pi \Delta t f}$. In the general case, this produces non-zero imaginary parts where there were none before.

In other words: even if you have a signal whose FT is purely real, simply by shifting it in time a bit, you can always get a non-zero imaginary part in the spectrum. That applies to all signals, be it audio, or I/Q baseband of a radio receiver.
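Here is a minimal sketch of that effect in the discrete case. The test signal and the shift of two samples are assumptions; the shift is circular, and since `np.roll` delays the signal, the phase factor carries a minus sign in the exponent (a shift in the other direction gives the plus sign).

```python
import numpy as np

N = 32
n = np.arange(N)

# Even (circularly symmetric) real signal, so its DFT is purely real
x = np.cos(2 * np.pi * 3 * n / N) + 0.5 * np.cos(2 * np.pi * 5 * n / N)
X = np.fft.fft(x)

shift = 2                                   # delay in samples (assumed)
x_shifted = np.roll(x, shift)               # x_shifted[n] = x[n - shift]
X_shifted = np.fft.fft(x_shifted)

# Shift theorem: a delay multiplies bin k by exp(-j 2 pi k shift / N)
k = np.arange(N)
X_predicted = X * np.exp(-2j * np.pi * k * shift / N)

print(np.max(np.abs(X.imag)))               # rounding-error level
print(np.max(np.abs(X_shifted.imag)))       # clearly non-zero
print(np.allclose(X_shifted, X_predicted))
```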

If yes, then how can we interpret this?

I think it makes sense for you to just continue to follow these lectures. I don't want to spoil too much, but:

Only time-symmetric (to be exact, time-Hermitian, but since audio signals are real, that's identical to symmetry) signals have a real spectrum. If you see a non-real spectrum, you can simply tell the signal wasn't Hermitian (symmetric).

Also, it tells you that if you want to deal with the spectra of signals, you always have to consider both amplitude and phase to represent a signal. And neither of these two aspects can be defined on the real part of the spectrum alone! The first is the square root of the sum of the squares of the real and imaginary parts (so the imaginary part is important), and the second is the arctangent of the ratio of the imaginary to the real part. By the way, for harmonic signals, a class of signals very important for audio, phase "contains" the time shift, measured in full signal periods.
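To illustrate, here is a short sketch (amplitude, bin and phase of the test cosine are assumed values) that recovers amplitude and phase from a DFT bin. It uses `np.arctan2`, the quadrant-aware form of the arctangent of the ratio, and also shows that the real part alone underestimates the amplitude.

```python
import numpy as np

N = 64
n = np.arange(N)
amp, k0, phase = 0.8, 5, 0.6                # assumed test values
x = amp * np.cos(2 * np.pi * k0 * n / N + phase)

X = np.fft.fft(x)

magnitude = np.sqrt(X.real**2 + X.imag**2)  # root of the sum of squares
angle = np.arctan2(X.imag, X.real)          # arctangent of imag / real

# For a real cosine exactly on bin k0, X[k0] = (amp * N / 2) * exp(j * phase)
est_amp = 2 * magnitude[k0] / N
est_phase = angle[k0]

# Using only the real part gives amp * cos(phase), not amp
real_only_amp = 2 * abs(X.real[k0]) / N

print(est_amp, est_phase, real_only_amp)
```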

Isn’t the value of the input supposed to represent change in air pressure?

Yes. But that's the time-domain signal, not the frequency-domain signal.

So if for an input signal x, say x[0] was 2 + 3j, what would this tell us about the pressure on the diaphragm at that point of time?

You're confusing a real time-domain signal with its Fourier transform. A single scalar value, like the momentary air pressure, is real. There's no $+3j$; there can't be, in the time domain. However, in the frequency domain, there might be an imaginary part.

Make sure that you understand this:

A signal is just a changing value over time, or some other axis.
The most intuitive representation hence is one along that axis, for example, an audio signal might be a change in pressure over time.

However, there are other ways of expressing the same signal. The Fourier transform of a signal describes the same signal, but is a different function than the time signal. We often speak of "frequency domain" (after FT) and "time domain".
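To make "same signal, different function" concrete, here is a minimal NumPy sketch (the size and the random test signal are assumptions) that writes the DFT as a matrix product, checks it against `np.fft.fft`, and recovers the real time samples from the complex coefficients.

```python
import numpy as np

N = 8
n = np.arange(N)

# DFT matrix: entry (k, n) is exp(-j 2 pi k n / N)
W = np.exp(-2j * np.pi * np.outer(n, n) / N)

rng = np.random.default_rng(3)              # arbitrary real time-domain signal
x = rng.standard_normal(N)

X = W @ x                                   # frequency-domain coordinates
matches_fft = np.allclose(X, np.fft.fft(x))

# The columns of W are mutually orthogonal: W^H W = N * I
is_orthogonal = np.allclose(W.conj().T @ W, N * np.eye(N))

# Going back is the conjugate transpose scaled by 1/N
x_back = (W.conj().T @ X) / N

print(matches_fft, is_orthogonal, np.allclose(x_back, x))
```

The complex numbers live only in `X`; the recovered `x_back` has a negligible imaginary part.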

I don't know your background, but early on in my studies, we were taught about vector spaces, bases and changes of basis.

"Time steps" is just one basis with which you can describe a signal; "Fourier coefficients" is another one. The same signal has different coefficients (i.e. representations as vectors of numbers) with respect to these two bases. And the discrete FT is really, really, really nothing but a plain, old, boring change-of-basis matrix with a few nice properties!
