Skip to content

Conversation

@singalsu
Copy link
Collaborator

@singalsu singalsu commented Oct 15, 2025

No description provided.

@singalsu singalsu force-pushed the stft_process branch 2 times, most recently from ae6f2f9 to 1b25adc Compare October 21, 2025 17:00
@singalsu
Copy link
Collaborator Author

Improved audio quality with 32 bit window functions and switch to Hann window type.

@lgirdwood
Copy link
Member

@singalsu can you also add a Readme.md to the module directory describing usage/tuning. Thanks !

#define FFT_MULTI_TWIDDLE_SIZE 2048

/* in Q1.31, generated from cos(i * 2 * pi / FFT_SIZE_MAX) */
const int32_t multi_twiddle_real_32[FFT_MULTI_TWIDDLE_SIZE] = {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this cold data with needed parts (depends on FFT size) copy to fast RAM.

@singalsu
Copy link
Collaborator Author

singalsu commented Dec 5, 2025

Update: The audio quality is now good, transparent for 16 bit audio like the CD format. This is the output from running test process_test('stft_process_1536_240_', 32, 32, 48000, 1, 1);

Screenshot from 2025-12-05 15-32-40

@singalsu singalsu marked this pull request as ready for review December 5, 2025 13:37
@singalsu singalsu requested a review from ranj063 as a code owner December 5, 2025 13:37
Copilot AI review requested due to automatic review settings December 5, 2025 13:37
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new STFT (Short-Time Fourier Transform) processing module for SOF, including comprehensive test infrastructure for FFT operations and configuration files for audio topology.

  • Adds UUID registration and module initialization for the new stft_process component
  • Implements topology configuration files for STFT processing with different window sizes (1024x256 and 1536x240)
  • Adds extensive FFT/IFFT test infrastructure with reference data for multiple FFT sizes
  • Updates build configuration to include the new STFT process module

Reviewed changes

Copilot reviewed 56 out of 59 changed files in this pull request and generated no comments.

Show a summary per file
File Description
uuid-registry.txt Registers UUIDs for math_fft and stft_process modules
tools/topology/topology2/include/components/stft_process/*.conf Configuration files for STFT processing with Hann window parameters
tools/topology/topology2/include/bench/stft_process*.conf Benchmark configuration files for STFT processing with different sample formats
tools/topology/topology2/development/tplg-targets-bench.cmake Build system updates to include STFT process module targets
tools/topology/topology2/cavs-benchmark-*.conf Integration of STFT process into benchmark topologies
tools/testbench/utils_ipc4.c Adds STFT process module initialization
tools/rimage/config/tgl*.toml Module configuration for STFT process component
test/cmocka/src/math/window/window.c Updates window function constant name
test/cmocka/src/math/fft/ref_*.h Reference test data for FFT/IFFT operations across multiple sizes
test/cmocka/src/math/fft/*.c Test implementations for FFT, IFFT, and DFT3 operations
test/cmocka/src/math/fft/*.m MATLAB/Octave scripts for generating FFT test reference data
src/math/fft/tune/README.md Documentation for generating twiddle factors

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

cfg.common_path = [cfg.tools_path 'tune/common'];
cfg.tplg_ver = 2;
cfg.ipc_ver = 4;
cfg.channel = 0;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add description of channel, e.g. 0 for process all channels, 1 .. max channels, select a mono channel

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update, currently this is ignored but could be supported later if e.g. want to process one selected channel as mono from stereo stream.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth a comment, unless you can have all the documentation in place for next release. Can be done incrementally.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will update the configuration blobs to support the formats the frequency domain models need. This likely still changes. Also I'm not sure if the models are multi-channel. It may need also stereo to mono conversion and maybe some double mono output option.


/* set up the bit reverse index */
for (i = 1; i < size; ++i)
bit_reverse_idx[i] = (bit_reverse_idx[i >> 1] >> 1) | ((i & 1) << (len - 1));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what this is meant to do, but it isn't using initial bit_reverse_idx[] values from the second half - above len / 2 - is that correct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It reverses order of bits, e.g. 0b10100000 would become 0b00000101. I didn't invent that, it was in original code and I have better code targets to improve for performance. This isn't a hot code part.

plan->bit_reverse_idx[i] = (plan->bit_reverse_idx[i >> 1] >> 1) |

What code snippet do you suggest to replace this? I can test if you propose something improved.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all good now, thanks for an offline explanation! A comment, explaining that bit_reverse_idx[i] contains i with its bits reversed would help, but maybe it's obvious to most

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, true it doesn't hurt to add it.

int fft_padded_size;
int fft_hop_size;
int fft_buf_size;
int half_fft_size;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsigned int or size_t if in bytes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not count of bytes but count of samples. Could use "length" as well.

I don't get the C experts' point of favoring unsigned when possible. The int numbers range is sufficient and it's always safe to combine with arithmetic while unsigned types aren't. The computer ALU is signed 2's complement and there is no native unsigned computation in them. Unsigned is good for bitfields in my opinion. If incrementing, decrementing, comparing (=subtract), the CPU computes it as signed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@singalsu hm, interesting, thanks. Being obviously very respectful of Mr. Stroustrup, I see some of his points, but I still couldn't quite understand all of them. He's saying, that mixing signed and unsigned types is error-prone. Of course it is. That's why it either shouldn't be done or only while carefully checking values. E.g. he's giving an example of auto a = area(height1-height2, length1-length2); which can bite if height1 < height2. Of course it can. But that would happen regardless of the type. You have to check ranges regardless of the type. Your check would just look different and with unsigned I find it would look more natural. E.g. if you know that your valid values are nonnegative and below a MAX_VAL, with unsigned you just check x < MAX_VAL while with signed you also have to check for nonnegative. He's emphasising that "unsigned is nonnegative with modular arithmetic". Sure, but isn't signed int the same and maybe even more confusingly so? Where you can add two large positive numbers and get a negative one? So how are signed integers "always safe to combine with arithmeti?" And no, not "when possible," but when makes sense. Also - I think more importantly - using unsigned for variables which should never be negative also serves as documentation.
This isn't critical, I'm certainly not even attempting to make this a blocker issue, having a variable of a signed type tells me that -1 is a valid value for it, while in your use cases it supposedly isn't.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will think more the config blob format, there the e.g. channels might be better a channels mask as unsigned to select a specific mic channel for mono processing. I'm right now adding multiple channels processing to this. It could go to this PR or later as incremental. But these FFT sizes (math literature talks about size, but should I say length?) are used in equations, so int is natural for them.

int32_t *window; /**< fft_size */
int source_channel;
int prev_data_size;
int sample_rate;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also unsigned?

size_t frame_bytes;
int source_channel;
int max_frames;
int channels;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good @singalsu ! Lot of helpful comments in the code, making this easier to follow. Nothing major in the review. It seems you have some todos left in code and commits, so I gather you will be updating this series still.

//for (j = 0; j < plan->num_ffts; j++)
// for (i = 0; i < plan->fft_size; i++)
// fprintf(fh1, "%d %d\n",
// plan->tmp_o32[j][i].real, plan->tmp_o32[j][i].imag);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding a "#ifdef DEBUG_DUMP_TO_FILE" or some such and keeping these in code is ok (and better than just having these commented out).

@singalsu
Copy link
Collaborator Author

New version, with stereo processing. It was dual mono for 2ch. The review comments are addressed abd I hope I didn't miss essential. I don't agree with unsigned int usage preference for data sizes (as frames or samples) and channels count, so I didn't change those.

@singalsu
Copy link
Collaborator Author

Looks very good @singalsu ! Lot of helpful comments in the code, making this easier to follow. Nothing major in the review. It seems you have some todos left in code and commits, so I gather you will be updating this series still.

Still missing from this as functionality the log magnitude spectrum domain between FFT and IFFT. And the all the twiddle factors can be made cold. It can be incremental if this is merged, or I can keep updating this.

@lgirdwood
Copy link
Member

@singalsu can you check CI, one build failure for plugin. Thanks !

@singalsu
Copy link
Collaborator Author

@singalsu can you check CI, one build failure for plugin. Thanks !

There's a mess with 16 bit FFT build triggered by MFCC but needing 32 bit FFT parts. I need to separate better the 16 bit and 32 bit FFT stuff for build success.

@singalsu
Copy link
Collaborator Author

@singalsu can you check CI, one build failure for plugin. Thanks !

There's a mess with 16 bit FFT build triggered by MFCC but needing 32 bit FFT parts. I need to separate better the 16 bit and 32 bit FFT stuff for build success.

Need to fix the cmocka unit test fail that I caused with the kconfig changes.


/* step 1: re-arrange input in bit reverse order, and shrink the level to avoid overflow */
inu = AE_LA64_PP(inx);
for (i = 1; i < size; ++i) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder to myself: Need to fix this also for 16 bit HiFi FFT. Currently there is no test for it. Need to check translating the tests to ztest with xt-xcc or xt-clang build.


/* set up the bit reverse index */
for (i = 1; i < size; ++i)
bit_reverse_idx[i] = (bit_reverse_idx[i >> 1] >> 1) | ((i & 1) << (len - 1));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, true it doesn't hurt to add it.

cfg.common_path = [cfg.tools_path 'tune/common'];
cfg.tplg_ver = 2;
cfg.ipc_ver = 4;
cfg.channel = 0;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will update the configuration blobs to support the formats the frequency domain models need. This likely still changes. Also I'm not sure if the models are multi-channel. It may need also stereo to mono conversion and maybe some double mono output option.

STFT_PAD_START = 2,
};

enum sof_stft_process_fft_window_type {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the blob that is embedded to topology sets all the STFT parameters like window type and length.

int16_t frame_shift; /**< samples, e.g. 160 for 10 ms @ 16 kHz */
int16_t reserved_16;
enum sof_stft_process_fft_pad_type pad; /**< Use PAD_END, PAD_CENTER, PAD_START */
enum sof_stft_process_fft_window_type window; /**< Use RECTANGULAR_WINDOW, etc. */
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's an ABI. Enum is same size as int. Should I use e.g. int16_t and type cast it to the nicer looking enums when applied? I'd like to do it incrementally since these parameters will still change/add/delete when the models to be used with this are known better.

The module versions of the functions mod_fft_plan_new()
and mod_fft_plan_free() should be used instead. Since all
previous usage of the functions has been updated these are
safe to remove.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
The bit reverse order for index zero is the same but the scaling
and copy can't be bypassed for inb[0]. The for loop needs to
start from index zero.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch contains fixes and improvements for the FFT
implementation. The aligned loads and stores can't be
used for generic pointer arithmetic jumps, the pointer
increment/decrement must be done by the instruction.

Therefore the aligned operations are converted to
non-aligned with requirement to align for 64 bits the
FFT input and output buffers.

The bit reverse ordering of data must start from zero,
the start from one looks like a mistake. The change
improves the accuracy when compared to reference FFT in
Octave.

In IFFT the output needs to be made complex conjugate. It
was missing from the last scaling loop.

These fixes also saved about 2 MCPS in STFT module usage.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch updates the headers document style to preferred
style in SOF. The similar partial documentation comments
are removed from window.c.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch adds 32 bit Q1.31 format window functions for better
precision in audio processing. The supported window types are
rectangular, Blackman and Hamming.

The constant define for Blackman is suffixed with Q15 to help
avoid using it with new 32 bit window and it's similar
constant.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch adds the Hann window. It provides higher attenuation
of side lobes vs. similar family Hamming window. Hann window
is suitable for use in STFT overlap-add procedure.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch adds function mod_fft_multi_plan_new() and
fft_multi_execute_32() to execute other than a power of
two FFT size. The FFT sizes such 1536 that is 3*512
or 3072 that is 3*1024 are supported by this change.

The procedure to compute this consists of doing three power
of two FFTs, multiply the outputs with twiddle factors and
doing DFT of size for the outputs. The produce is wrapped
by the new functions those are used similarly as previous
FFT functions. The fft-multi function can be also used for
power of two FFTs.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch adds tests for the new functions with tests fft_multi
and dft3. The reference test vectors data is created with
Octave scripts ref_fft_multi.m and ref_dft3.m.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch adds script to create configuration blobs for STFT
process module. A few blobs are created to be used with topology
to test STFT.

This is WIP. The blob format is not yet final.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch adds build of topologies to test the STFT process
module. The topologies initialize the processing for 512, 768,
1024 and 1536 size FFTs.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This module provides analysis and synthesis filters for
frequency domain processing. The used technique is short term
Fourier transform (STFT) and inverse STFT. The FFT length,
hop, and window type are configured with the bytes control
configuration blob.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
@singalsu
Copy link
Collaborator Author

Added topology & blob with 1 ms FFT hop for LL pipeline processing example.

@lgirdwood lgirdwood merged commit 02ff2c7 into thesofproject:main Dec 22, 2025
41 of 45 checks passed
@lgirdwood
Copy link
Member

@singalsu I've merged since functional CI has passed but pls do fix the doxygen warnings in incremental PR

[0/2] cd /home/runner/work/sof/sof/docbuild && doxygen sof.doxygen
/home/runner/work/sof/sof/src/include/sof/math/fft.h:114: warning: Found unknown command '@x'
/home/runner/work/sof/sof/src/include/sof/math/fft.h:115: warning: Found unknown command '@y'
/home/runner/work/sof/sof/src/include/sof/math/fft.h:114: warning: Found unknown command '@x'
/home/runner/work/sof/sof/src/include/sof/math/fft.h:115: warning: Found unknown command '@y'

@singalsu
Copy link
Collaborator Author

@singalsu I've merged since functional CI has passed but pls do fix the doxygen warnings in incremental PR

[0/2] cd /home/runner/work/sof/sof/docbuild && doxygen sof.doxygen
/home/runner/work/sof/sof/src/include/sof/math/fft.h:114: warning: Found unknown command '@x'
/home/runner/work/sof/sof/src/include/sof/math/fft.h:115: warning: Found unknown command '@y'
/home/runner/work/sof/sof/src/include/sof/math/fft.h:114: warning: Found unknown command '@x'
/home/runner/work/sof/sof/src/include/sof/math/fft.h:115: warning: Found unknown command '@y'

Yep, I'll do right now a small PR for this. It takes still a while before the next cold twiddle factors data PR is ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants