Reconstructing audio from a given spectrogram and phase data

Question

I have two separate files called spectrogram and phases. The spectrogram file contains the magnitude information of the audio, and the phases file contains the phase information of each windowed segment. I need to reconstruct the audio from this information, yet I am completely lost. Can you point me in the right direction? How can I do this?

Laurent Duval · Answer

A spectrogram is generally an overcomplete and complex representation of a real signal. Overcomplete means that you get more "independent" complex coefficients than samples in the signal. This redundancy tells us that there are different ways to reconstruct a signal from spectrogram magnitude and phase information.
Let us assume that the windows have even length $2\kappa$. A spectrogram of a real signal then consists of $N_\kappa$ frames of $2\kappa$ complex samples each. To keep all the information on a signal of length $N$, we should have $N \le 2\kappa \times N_\kappa$.
With Hermitian symmetry, this can be shrunk to $(\kappa+1) \times N_\kappa$ magnitude and phase values. Normally, the first and the last coefficients of each frame should be real. You can therefore check whether the sizes of your magnitude/phase files fit. In other words, if you can spot zero-valued phases in your phase file (or phases equal to $\pi$, for negative real values), you will have hints on how the spectrogram was obtained (window length and overlap). The overlap is very often 50%, 75% or 25%, which gives another clue to check against the original signal size.
Once you have the window length (that could be in the header of your binary files), you can unveil the way the original signal was chunked into overlapping pieces.
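The size and phase checks described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the phases are taken to be stored in radians as an array of shape (bins, frames), only the one-sided spectrum is stored, and the frames are assumed to tile the signal without padding. The helper names `infer_window_length`, `real_bin_fraction` and `estimate_hop` are made up for this example.

```python
import numpy as np

def infer_window_length(n_bins):
    """Even window length 2*kappa implied by kappa + 1 one-sided bins."""
    return 2 * (n_bins - 1)

def real_bin_fraction(phases, tol=1e-6):
    """Fraction of frames whose first / last bin is real (phase 0 or +-pi)."""
    def fraction_real(row):
        wrapped = np.abs(np.angle(np.exp(1j * row)))  # wrap into [0, pi]
        return float(np.mean((wrapped < tol) | (np.pi - wrapped < tol)))
    return fraction_real(phases[0]), fraction_real(phases[-1])

def estimate_hop(n_samples, n_frames, win_len):
    """Hop size if the frames tile the signal without padding (assumption)."""
    return (n_samples - win_len) / (n_frames - 1)

# Synthetic stand-in for the phase file: 5 frames of length 16, hop 8.
rng = np.random.default_rng(0)
signal = rng.standard_normal(48)
frames = np.array([signal[s:s + 16] for s in range(0, 33, 8)])
phases = np.angle(np.fft.rfft(frames, axis=1)).T  # shape (bins, frames)

win_len = infer_window_length(phases.shape[0])
hop = estimate_hop(len(signal), phases.shape[1], win_len)
dc_real, nyquist_real = real_bin_fraction(phases)
overlap = 1 - hop / win_len
print(win_len, hop, overlap, dc_real, nyquist_real)
```

If the first and last rows are not (almost) always real, the file probably uses a different layout, for instance a two-sided spectrum or an odd window length, and the guess has to be revised.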
But you still don't know how the samples are weighted, i.e. the shape of the overlapping window that you have to invert. Good news:

first: using a Hamming or Hann window is quite traditional. So if you don't have the information on the window shape, they are safe bets.
second: being overcomplete, there are many potential faithful inverses. So even if you don't know the exact window shape, make a guess, and the overcomplete or redundant complex coefficients can compensate for each other to recover a nice-looking signal.
third: as many inverses are possible, one can seek an optimized inverse, with perfect signal reconstruction or noise removal. A shameless reference? Of course:
Optimization of Synthesis Oversampled Complex Filter Banks
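As a rough illustration of the window-guessing idea, here is a minimal weighted overlap-add inverse in NumPy. It is a sketch, not the method of the referenced paper: it assumes a one-sided spectrum of shape (bins, frames), no padding, and a known hop size, and it uses one common least-squares synthesis (multiply each inverted frame by the window, then divide by the summed squared windows). The functions `stft` and `istft_ola` are hypothetical helpers written for this example.

```python
import numpy as np

def stft(x, win, hop):
    """Windowed one-sided FFT frames, returned with shape (bins, frames)."""
    n = len(win)
    starts = range(0, len(x) - n + 1, hop)
    return np.array([np.fft.rfft(win * x[s:s + n]) for s in starts]).T

def istft_ola(spec, win, hop):
    """Weighted overlap-add inverse using the (guessed) window `win`."""
    n = len(win)
    n_frames = spec.shape[1]
    out = np.zeros(n + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    for m in range(n_frames):
        frame = np.fft.irfft(spec[:, m], n=n)
        s = m * hop
        out[s:s + n] += frame * win   # re-apply the window to each frame
        norm[s:s + n] += win ** 2     # accumulate the squared window
    covered = norm > 1e-8             # skip samples with no window weight
    out[covered] /= norm[covered]
    return out

rng = np.random.default_rng(0)
n, hop = 64, 16                       # 75% overlap
hann = np.hanning(n + 1)[:-1]         # periodic Hann window
x = rng.standard_normal(n + hop * 20)
spec = stft(x, hann, hop)

# Correct window guess: error measured away from the signal edges.
x_hat = istft_ola(spec, hann, hop)
err_matched = np.max(np.abs(x_hat[n:-n] - x[n:-n]))

# Wrong guess (Hamming instead of Hann): compare shapes via correlation.
x_guess = istft_ola(spec, np.hamming(n), hop)
corr_guess = np.corrcoef(x_guess[n:-n], x[n:-n])[0, 1]
print(err_matched, corr_guess)
```

With the matching window the interior of the signal is recovered up to rounding error. With a mismatched guess the result is no longer exact in general, but the two printed numbers give a quick way to judge how much the guess matters for a given overlap.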

user51411 · Answer

In general terms, you can convert the magnitudes and phases into complex numbers. See this reference. Once that's done, you can take the Inverse Discrete Fourier Transform to attempt to recover the audio.
You haven't indicated whether or not you know how these files were made, so there is no guarantee that applying such a direct method will work. In the absence of such information, it doesn't cost anything to try.
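A minimal sketch of that direct method for a single frame, assuming the magnitudes are linear, the phases are in radians, and only the one-sided spectrum of a real frame is stored (hence `irfft` rather than a full inverse FFT). The helper `to_complex` is a name chosen for this example.

```python
import numpy as np

def to_complex(magnitude, phase):
    """Combine magnitude and phase (radians) into complex coefficients."""
    return magnitude * np.exp(1j * phase)

# Stand-in for one frame of the files: analyse a known frame, then invert it.
frame = np.array([0.0, 1.0, 0.5, -0.25, 0.0, 0.75, -1.0, 0.3])
spectrum = np.fft.rfft(frame)
mag, ph = np.abs(spectrum), np.angle(spectrum)

recovered = np.fft.irfft(to_complex(mag, ph), n=len(frame))
print(np.max(np.abs(recovered - frame)))
```

For a full spectrogram this is applied frame by frame, and the overlapping frames still have to be recombined, which is where the window and overlap come in.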
I hope this helps.

P2000 · Answer

A spectrogram is a graphical time-scroll representation of the frequency components in a signal. Sometimes it's called a "waterfall", because that's what it looks like as it is computed and plotted in real time. The core math behind this is the FFT.
A spectrogram generally shows the strength (or magnitude) of a range of frequency components in a signal (e.g. DC to 20kHz). The FFT has a complex output, one block of values for each block of input. The magnitude of these complex numbers is likely (somehow) stored in the "spectrogram" file, and the phase is stored in the "phases" file.
The trick to reconstructing the original signal from a spectrogram is to invert the spectrum, and the math behind that is the IFFT.
You will need the magnitude and the phase data from the files to recreate the complex spectrum and then apply it to the IFFT to get the time domain signal back.
You will have to find out:

what math was applied to calculate the spectrogram: FFT block size, any log-scale (dB?), any windowing, any sub-sampling etc... and how the data is formatted and organized in the file.
how the phase is calculated and stored (block size, degrees/radians, number format etc...)
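Two of the items above, a log scale on the magnitudes and the phase unit, can be undone with small conversions once you know (or guess) what was applied. The sketch below assumes the common conventions of amplitude dB as 20·log10(|X|/ref) and phase in either degrees or radians. All function names are hypothetical, and `guess_phase_unit` is only a crude heuristic, not a reliable detector.

```python
import numpy as np

def db_to_amplitude(mag_db, ref=1.0):
    """Invert 20*log10(|X| / ref)."""
    return ref * 10.0 ** (np.asarray(mag_db, dtype=float) / 20.0)

def power_to_amplitude(power):
    """Invert |X|**2 when the file stores linear power."""
    return np.sqrt(np.asarray(power, dtype=float))

def to_radians(phase, unit):
    """Convert stored phase values to radians."""
    phase = np.asarray(phase, dtype=float)
    if unit == "degrees":
        return np.deg2rad(phase)
    if unit == "radians":
        return phase
    raise ValueError(f"unknown phase unit: {unit!r}")

def guess_phase_unit(phase):
    """Crude heuristic: values beyond +-2*pi cannot be wrapped radians."""
    too_large = np.max(np.abs(phase)) > 2 * np.pi + 1e-9
    return "degrees" if too_large else "radians"

# Made-up values standing in for one frame of the two files.
mag_db = np.array([0.0, 20.0, -6.0])
phase_raw = np.array([0.0, 90.0, -170.0])

unit = guess_phase_unit(phase_raw)
spectrum = db_to_amplitude(mag_db) * np.exp(1j * to_radians(phase_raw, unit))
print(unit, np.abs(spectrum))
```

A file that stores unwrapped radians would defeat the unit heuristic, so it is worth looking at a histogram of the phase values as well.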

As a commenter pointed out, you will have to find out all these details from whoever created the file. But beware that even then it might not be possible to recreate the original signal.
In that case, if you can specify how to generate the spectrogram with different parameters (no windowing, no sub-sampling etc...), and then (re)generate these files, you may stand a chance.
Good luck. Since this is a bit like forensics, you will need it.
