TransWikia.com

Advantage of applying a Window Function in Analysis and Synthesis of STFT?

Signal Processing Asked by micropyre on October 24, 2021

I’m studying the short-time Fourier transform (STFT), and I learned that if you apply a window other than the rectangular one, you need to overlap the frames (for example, 50% overlap for the Hann/Hamming family) so that the shifted windows add up to a constant, ideally 1, and the signal can be reconstructed properly. In that setup, I only need to apply the window function during the analysis part, and I can still reconstruct the signal properly during synthesis.

But in some sources (like Audacity’s Noise Reduction) I saw methods that apply a second window function during the synthesis part and use a 75% overlap-add. What’s the advantage of applying the window function twice (analysis and synthesis) with 75% overlap compared to applying it once (analysis only) with 50% overlap? I can’t seem to find any explanation of why one of these two windowing techniques would be preferred over the other.

Edit: I also tried implementing the 75% overlap with a Hann window applied in both the analysis and synthesis parts, and the output came out at 1.5 times the original signal’s amplitude. Did I do it correctly?

2 Answers

Your question seems to be about the purpose of overlapping during synthesis in addition to overlapping during analysis.

In analysis, as perhaps you know, the overlap allows better capturing of temporal effects that span consecutive analysis blocks.

For instance, rapid transitions in low pass systems have long time domain responses, and how that response splits between subsequent analysis blocks affects how each block represents it spectrally.

If the transformed signal is not processed, and the synthesis is performed by simply inverting each block individually and stitching the blocks back together, then the time domain signal will be perfectly recovered and perfectly represent the original (quantization aside). This holds regardless of any effects that may span multiple blocks, or of how the blocks align with effects in the signal.
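As a minimal sketch of this unprocessed round trip, assume a periodic Hann analysis window at 50% overlap and plain overlap-add as the "stitching" (the frame length, hop and function names below are illustrative choices, not taken from the answer):

```python
import numpy as np

def periodic_hann(length):
    # Periodic (DFT-even) Hann: copies shifted by length / 2 sum to exactly 1.
    n = np.arange(length)
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * n / length)

def ola_roundtrip(x, frame_len=256, hop=128):
    """Window, FFT, inverse FFT and overlap-add, with no synthesis window."""
    w = periodic_hann(frame_len)
    y = np.zeros(len(x))
    for start in range(0, len(x) - frame_len + 1, hop):
        spectrum = np.fft.rfft(w * x[start:start + frame_len])              # analysis
        y[start:start + frame_len] += np.fft.irfft(spectrum, n=frame_len)   # synthesis
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(2048)
y = ola_roundtrip(x)

# The first and last half-frames are covered by only one window,
# so the comparison is restricted to the interior.
interior = slice(128, 2048 - 128)
max_error = np.max(np.abs(y[interior] - x[interior]))
```

In the interior, `max_error` should sit at floating-point rounding level, since every sample there is weighted by two window values that sum to 1.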

However, if the transformed signal is processed in the frequency domain, and independently on consecutive blocks, then the effect of a rapid transition on the processing is different depending on where in the block this transition happens.

But it shouldn't be: an artifact in the signal (in the sound) should be identified, extracted and processed regardless of how the artifact lines up with the block timing, right?

To somewhat ameliorate these unwanted block effects during analysis, blocks are overlapped and windowed.

The same goes for synthesis. When processing a signal block-by-block in a transformed domain (e.g. in the frequency domain) the result of the detection and transform in one block may be different from the result in another block.

And if, over time, there is a block-by-block change in the processing (e.g. a gain or a tuning parameter is adapted based on a frequency analysis), we would want this adjustment to be gradual in the output, and not audible/discernible at block boundaries.

To prevent discontinuities at block boundaries, after processing, you would overlap the blocks at synthesis.
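A small sketch of this effect, under assumed conditions: a constant input, a hypothetical gain that alternates between 1.0 and 0.5 from block to block, and a comparison between contiguous rectangular blocks and 50% overlapped Hann-windowed blocks. None of these values come from the answer; they are only chosen to make the boundary steps easy to measure.

```python
import numpy as np

def process(x, gains, frame_len, hop, window):
    """Scale each frame's spectrum by its own gain, then overlap-add the frames."""
    y = np.zeros(len(x))
    for k, start in enumerate(range(0, len(x) - frame_len + 1, hop)):
        spectrum = np.fft.rfft(window * x[start:start + frame_len])
        spectrum *= gains[k % len(gains)]   # block-by-block change in processing
        y[start:start + frame_len] += np.fft.irfft(spectrum, n=frame_len)
    return y

frame_len = 256
x = np.ones(4096)       # constant input makes any step in the output easy to see
gains = [1.0, 0.5]      # hypothetical gains alternating from block to block

# Contiguous rectangular blocks: the hop equals the frame length, no overlap.
y_blocks = process(x, gains, frame_len, frame_len, np.ones(frame_len))

# 50% overlap with a periodic Hann window, whose shifted copies sum to 1.
n = np.arange(frame_len)
hann = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / frame_len)
y_overlap = process(x, gains, frame_len, frame_len // 2, hann)

# Largest sample-to-sample step, ignoring the partially covered edges.
interior = slice(frame_len, len(x) - frame_len)
jump_blocks = np.max(np.abs(np.diff(y_blocks[interior])))
jump_overlap = np.max(np.abs(np.diff(y_overlap[interior])))
```

With contiguous blocks the output steps by the full gain difference at each boundary, whereas the overlapped version cross-fades between the two gains over half a frame, so `jump_overlap` is much smaller than `jump_blocks`.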

If you listen carefully to "auto-tuned" singers with poor pitch, you can sometimes hear the block-based chopping of the sound. This is usually prevented with sufficient overlap during analysis (to detect and correct) and during synthesis (after re-tuning).

The same goes for speeding up video without raising the audio pitch. It is the overlap during synthesis that prevents block-based chopping. Usually it works well, but with too much speed-up you might find that the chosen overlap is not sufficient to keep these artifacts from being noticed.

(You can observe this on YouTube at 1.75x playback speed with classical music that has drawn-out notes.)

Answered by P2000 on October 24, 2021

The STFT is an instance of an Analysis/Synthesis (A/S) system. If we restrict ourselves to windows of finite length, it amounts to windowing a block of length $L$, taking its DFT, then hopping by $K$ samples and repeating the same operation. An A/S system is said to be invertible, or perfect, if you can recover the original signal from the blocks of DFT coefficients.

If $K>L$, some samples are missed, there are holes between blocks, and generally some information is lost (unless you have additional information).

If $K=L$, blocks are contiguous and do not overlap. Here, the transformation is critical, or non-redundant: you have as many Fourier coefficients as samples. In that case, you can invert it even with non-rectangular windows. It suffices that the window never vanishes: a window without zero values can be undone by pointwise division, and the DFT is itself invertible. There is only one inverse in that case. This answers your first question.
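A minimal sketch of this critical case, assuming a symmetric Hamming window (whose smallest value is 0.08, so it never vanishes) and hypothetical helper names `analyze` and `synthesize`:

```python
import numpy as np

def analyze(x, window):
    """Split x into contiguous blocks (hop K equals length L), window them, take the DFT."""
    L = len(window)
    blocks = x[: len(x) // L * L].reshape(-1, L)
    return np.fft.fft(blocks * window, axis=1)

def synthesize(coeffs, window):
    """Invert each DFT, then undo the window by pointwise division."""
    blocks = np.fft.ifft(coeffs, axis=1).real
    return (blocks / window).reshape(-1)

L = 64
window = np.hamming(L)      # symmetric Hamming: minimum value 0.08, never zero
rng = np.random.default_rng(1)
x = rng.standard_normal(10 * L)

x_rec = synthesize(analyze(x, window), window)
max_error = np.max(np.abs(x_rec - x))
```

The division is exact only when the coefficients are left untouched. Any modification made in the frequency domain gets amplified near the block edges, where the window is small, which is one practical reason to prefer overlapping frames.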

If you allow overlap $K<L$, then several things may happen. I shall come back later, but basically:

  • a proper synthesis window can make a system with a vanishing analysis window invertible;
  • it can reduce artifacts that result from processing performed after the analysis stage (frequency removal, thresholding, etc.).

In the continuous time domain, if you have an analysis window $h(t)$, then for any synthesis window $g(t)$ such that

$$\int_{-\infty}^{+\infty} g(t)\,h^*(t)\,\mathrm{d}t = 1$$

then the continuous STFT is invertible. Several observations:

  • These are very mild conditions to satisfy. This should not be surprising: the STFT is extremely redundant, so the information is spread everywhere in the time-frequency plane.
  • Yes, you need a second window for synthesis from the time-frequency plane (but it can be the same as the analysis window).
  • The windows $h$ and $g$ are complementary in a second-order sense. Many possible $g$ exist, and in particular:
  • $g=h$, provided that $\int_{-\infty}^{+\infty} h(t)\,h^*(t)\,\mathrm{d}t = 1$, which is easy to arrange by rescaling if the window does not have zero energy.
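As a sketch of why this single condition is enough, assume the following STFT convention (the answer does not state one, so the sign and normalization here are an assumption):

$$X(\tau,f)=\int_{-\infty}^{+\infty} x(s)\,h^*(s-\tau)\,e^{-2\pi\mathrm{i}fs}\,\mathrm{d}s$$

Synthesizing with $g$ and integrating over frequency first (which yields a Dirac delta $\delta(t-s)$), then over the time shift, gives

$$\int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} X(\tau,f)\,g(t-\tau)\,e^{2\pi\mathrm{i}ft}\,\mathrm{d}f\,\mathrm{d}\tau = x(t)\int_{-\infty}^{+\infty} g(t-\tau)\,h^*(t-\tau)\,\mathrm{d}\tau = x(t)\int_{-\infty}^{+\infty} g(u)\,h^*(u)\,\mathrm{d}u = x(t)$$

where the last equality uses the condition on $g$ and $h$. If the integral of $g\,h^*$ equals some other nonzero constant, the reconstruction is simply scaled by that constant.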

In the discrete-time/discrete-frequency domain, this is far more complicated in the general case. Yet, for power-of-two frame lengths, certain windows (triangle, Hann) and certain overlaps ($1/4$, $1/2$, $3/4$), there exist direct inverses that are easily computed from the analysis window. One must compensate in amplitude (as in your case) by a factor related to the overlap and the redundancy. Those cases are often treated as standard, as they are quite easy.
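The amplitude factor can be checked numerically. The sketch below assumes a periodic Hann window used for both analysis and synthesis at 75% overlap (hop of a quarter frame), which matches the setup described in the question's edit; the frame length and the name `wola_roundtrip` are illustrative choices.

```python
import numpy as np

def wola_roundtrip(x, frame_len=256, hop=64):
    """Hann-windowed analysis and synthesis with overlap-add (75% overlap by default)."""
    n = np.arange(frame_len)
    w = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / frame_len)   # periodic Hann
    y = np.zeros(len(x))
    for start in range(0, len(x) - frame_len + 1, hop):
        spectrum = np.fft.rfft(w * x[start:start + frame_len])   # analysis window
        frame = np.fft.irfft(spectrum, n=frame_len)
        y[start:start + frame_len] += w * frame                   # synthesis window
    return y, w

rng = np.random.default_rng(2)
x = rng.standard_normal(4096)
y, w = wola_roundtrip(x)

# Overlap-added gain of the squared window: sum over the four shifts by frame_len / 4.
gain = np.sum((w ** 2).reshape(4, -1), axis=0)

# Samples covered by all four overlapping frames.
interior = slice(256, 4096 - 256)
max_error = np.max(np.abs(y[interior] / 1.5 - x[interior]))
```

Here `gain` is constant at 1.5, because the squared Hann window expands to $3/8 - \tfrac{1}{2}\cos\theta + \tfrac{1}{8}\cos 2\theta$ and the cosine terms cancel over the four shifts. Dividing the output by 1.5 therefore recovers the input in the interior, so the factor observed in the question is the expected one.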

However, there are generalized inverses that allow for less redundant decompositions at a low cost.

Answered by Laurent Duval on October 24, 2021
