-
Notifications
You must be signed in to change notification settings - Fork 349
Audio: STFT Process: Add new SOF module #10306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ae6f2f9 to
1b25adc
Compare
|
Improved audio quality with 32 bit window functions and switch to Hann window type. |
|
@singalsu can you also add a Readme.md to the module directory describing usage/tuning. Thanks ! |
1b25adc to
355f15e
Compare
| #define FFT_MULTI_TWIDDLE_SIZE 2048 | ||
|
|
||
| /* in Q1.31, generated from cos(i * 2 * pi / FFT_SIZE_MAX) */ | ||
| const int32_t multi_twiddle_real_32[FFT_MULTI_TWIDDLE_SIZE] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this cold data with needed parts (depends on FFT size) copy to fast RAM.
355f15e to
8ad9eeb
Compare
8ad9eeb to
a7407a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds a new STFT (Short-Time Fourier Transform) processing module for SOF, including comprehensive test infrastructure for FFT operations and configuration files for audio topology.
- Adds UUID registration and module initialization for the new
stft_processcomponent - Implements topology configuration files for STFT processing with different window sizes (1024x256 and 1536x240)
- Adds extensive FFT/IFFT test infrastructure with reference data for multiple FFT sizes
- Updates build configuration to include the new STFT process module
Reviewed changes
Copilot reviewed 56 out of 59 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| uuid-registry.txt | Registers UUIDs for math_fft and stft_process modules |
| tools/topology/topology2/include/components/stft_process/*.conf | Configuration files for STFT processing with Hann window parameters |
| tools/topology/topology2/include/bench/stft_process*.conf | Benchmark configuration files for STFT processing with different sample formats |
| tools/topology/topology2/development/tplg-targets-bench.cmake | Build system updates to include STFT process module targets |
| tools/topology/topology2/cavs-benchmark-*.conf | Integration of STFT process into benchmark topologies |
| tools/testbench/utils_ipc4.c | Adds STFT process module initialization |
| tools/rimage/config/tgl*.toml | Module configuration for STFT process component |
| test/cmocka/src/math/window/window.c | Updates window function constant name |
| test/cmocka/src/math/fft/ref_*.h | Reference test data for FFT/IFFT operations across multiple sizes |
| test/cmocka/src/math/fft/*.c | Test implementations for FFT, IFFT, and DFT3 operations |
| test/cmocka/src/math/fft/*.m | MATLAB/Octave scripts for generating FFT test reference data |
| src/math/fft/tune/README.md | Documentation for generating twiddle factors |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
a7407a0 to
9f6246f
Compare
| cfg.common_path = [cfg.tools_path 'tune/common']; | ||
| cfg.tplg_ver = 2; | ||
| cfg.ipc_ver = 4; | ||
| cfg.channel = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add description of channel, e.g. 0 for process all channels, 1 .. max channels, select a mono channel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update, currently this is ignored but could be supported later if e.g. want to process one selected channel as mono from stereo stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be worth a comment, unless you can have all the documentation in place for next release. Can be done incrementally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will update the configuration blobs to support the formats the frequency domain models need. This likely still changes. Also I'm not sure if the models are multi-channel. It may need also stereo to mono conversion and maybe some double mono output option.
|
|
||
| /* set up the bit reverse index */ | ||
| for (i = 1; i < size; ++i) | ||
| bit_reverse_idx[i] = (bit_reverse_idx[i >> 1] >> 1) | ((i & 1) << (len - 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what this is meant to do, but it isn't using initial bit_reverse_idx[] values from the second half - above len / 2 - is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It reverses order of bits, e.g. 0b10100000 would become 0b00000101. I didn't invent that, it was in original code and I have better code targets to improve for performance. This isn't a hot code part.
Line 61 in 0855423
| plan->bit_reverse_idx[i] = (plan->bit_reverse_idx[i >> 1] >> 1) | |
What code snippet do you suggest to replace this? I can test if you propose something improved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all good now, thanks for an offline explanation! A comment, explaining that bit_reverse_idx[i] contains i with its bits reversed would help, but maybe it's obvious to most
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, true it doesn't hurt to add it.
| int fft_padded_size; | ||
| int fft_hop_size; | ||
| int fft_buf_size; | ||
| int half_fft_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unsigned int or size_t if in bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are not count of bytes but count of samples. Could use "length" as well.
I don't get the C experts' point of favoring unsigned when possible. The int numbers range is sufficient and it's always safe to combine with arithmetic while unsigned types aren't. The computer ALU is signed 2's complement and there is no native unsigned computation in them. Unsigned is good for bitfields in my opinion. If incrementing, decrementing, comparing (=subtract), the CPU computes it as signed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@singalsu hm, interesting, thanks. Being obviously very respectful of Mr. Stroustrup, I see some of his points, but I still couldn't quite understand all of them. He's saying, that mixing signed and unsigned types is error-prone. Of course it is. That's why it either shouldn't be done or only while carefully checking values. E.g. he's giving an example of auto a = area(height1-height2, length1-length2); which can bite if height1 < height2. Of course it can. But that would happen regardless of the type. You have to check ranges regardless of the type. Your check would just look different and with unsigned I find it would look more natural. E.g. if you know that your valid values are nonnegative and below a MAX_VAL, with unsigned you just check x < MAX_VAL while with signed you also have to check for nonnegative. He's emphasising that "unsigned is nonnegative with modular arithmetic". Sure, but isn't signed int the same and maybe even more confusingly so? Where you can add two large positive numbers and get a negative one? So how are signed integers "always safe to combine with arithmeti?" And no, not "when possible," but when makes sense. Also - I think more importantly - using unsigned for variables which should never be negative also serves as documentation.
This isn't critical, I'm certainly not even attempting to make this a blocker issue, having a variable of a signed type tells me that -1 is a valid value for it, while in your use cases it supposedly isn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will think more the config blob format, there the e.g. channels might be better a channels mask as unsigned to select a specific mic channel for mono processing. I'm right now adding multiple channels processing to this. It could go to this PR or later as incremental. But these FFT sizes (math literature talks about size, but should I say length?) are used in equations, so int is natural for them.
| int32_t *window; /**< fft_size */ | ||
| int source_channel; | ||
| int prev_data_size; | ||
| int sample_rate; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also unsigned?
| size_t frame_bytes; | ||
| int source_channel; | ||
| int max_frames; | ||
| int channels; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
kv2019i
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very good @singalsu ! Lot of helpful comments in the code, making this easier to follow. Nothing major in the review. It seems you have some todos left in code and commits, so I gather you will be updating this series still.
src/math/fft/fft_common.c
Outdated
| //for (j = 0; j < plan->num_ffts; j++) | ||
| // for (i = 0; i < plan->fft_size; i++) | ||
| // fprintf(fh1, "%d %d\n", | ||
| // plan->tmp_o32[j][i].real, plan->tmp_o32[j][i].imag); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding a "#ifdef DEBUG_DUMP_TO_FILE" or some such and keeping these in code is ok (and better than just having these commented out).
9f6246f to
151059e
Compare
|
New version, with stereo processing. It was dual mono for 2ch. The review comments are addressed abd I hope I didn't miss essential. I don't agree with unsigned int usage preference for data sizes (as frames or samples) and channels count, so I didn't change those. |
Still missing from this as functionality the log magnitude spectrum domain between FFT and IFFT. And the all the twiddle factors can be made cold. It can be incremental if this is merged, or I can keep updating this. |
|
@singalsu can you check CI, one build failure for plugin. Thanks ! |
There's a mess with 16 bit FFT build triggered by MFCC but needing 32 bit FFT parts. I need to separate better the 16 bit and 32 bit FFT stuff for build success. |
151059e to
a4c6d32
Compare
Need to fix the cmocka unit test fail that I caused with the kconfig changes. |
a4c6d32 to
38c085e
Compare
|
|
||
| /* step 1: re-arrange input in bit reverse order, and shrink the level to avoid overflow */ | ||
| inu = AE_LA64_PP(inx); | ||
| for (i = 1; i < size; ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder to myself: Need to fix this also for 16 bit HiFi FFT. Currently there is no test for it. Need to check translating the tests to ztest with xt-xcc or xt-clang build.
|
|
||
| /* set up the bit reverse index */ | ||
| for (i = 1; i < size; ++i) | ||
| bit_reverse_idx[i] = (bit_reverse_idx[i >> 1] >> 1) | ((i & 1) << (len - 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, true it doesn't hurt to add it.
| cfg.common_path = [cfg.tools_path 'tune/common']; | ||
| cfg.tplg_ver = 2; | ||
| cfg.ipc_ver = 4; | ||
| cfg.channel = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will update the configuration blobs to support the formats the frequency domain models need. This likely still changes. Also I'm not sure if the models are multi-channel. It may need also stereo to mono conversion and maybe some double mono output option.
| STFT_PAD_START = 2, | ||
| }; | ||
|
|
||
| enum sof_stft_process_fft_window_type { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, the blob that is embedded to topology sets all the STFT parameters like window type and length.
| int16_t frame_shift; /**< samples, e.g. 160 for 10 ms @ 16 kHz */ | ||
| int16_t reserved_16; | ||
| enum sof_stft_process_fft_pad_type pad; /**< Use PAD_END, PAD_CENTER, PAD_START */ | ||
| enum sof_stft_process_fft_window_type window; /**< Use RECTANGULAR_WINDOW, etc. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's an ABI. Enum is same size as int. Should I use e.g. int16_t and type cast it to the nicer looking enums when applied? I'd like to do it incrementally since these parameters will still change/add/delete when the models to be used with this are known better.
38c085e to
de72839
Compare
The module versions of the functions mod_fft_plan_new() and mod_fft_plan_free() should be used instead. Since all previous usage of the functions has been updated these are safe to remove. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
The bit reverse order for index zero is the same but the scaling and copy can't be bypassed for inb[0]. The for loop needs to start from index zero. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
de72839 to
fea4ba4
Compare
This patch contains fixes and improvements for the FFT implementation. The aligned loads and stores can't be used for generic pointer arithmetic jumps, the pointer increment/decrement must be done by the instruction. Therefore the aligned operations are converted to non-aligned with requirement to align for 64 bits the FFT input and output buffers. The bit reverse ordering of data must start from zero, the start from one looks like a mistake. The change improves the accuracy when compared to reference FFT in Octave. In IFFT the output needs to be made complex conjugate. It was missing from the last scaling loop. These fixes also saved about 2 MCPS in STFT module usage. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch updates the headers document style to preferred style in SOF. The similar partial documentation comments are removed from window.c. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch adds 32 bit Q1.31 format window functions for better precision in audio processing. The supported window types are rectangular, Blackman and Hamming. The constant define for Blackman is suffixed with Q15 to help avoid using it with new 32 bit window and it's similar constant. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch adds the Hann window. It provides higher attenuation of side lobes vs. similar family Hamming window. Hann window is suitable for use in STFT overlap-add procedure. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch adds function mod_fft_multi_plan_new() and fft_multi_execute_32() to execute other than a power of two FFT size. The FFT sizes such 1536 that is 3*512 or 3072 that is 3*1024 are supported by this change. The procedure to compute this consists of doing three power of two FFTs, multiply the outputs with twiddle factors and doing DFT of size for the outputs. The produce is wrapped by the new functions those are used similarly as previous FFT functions. The fft-multi function can be also used for power of two FFTs. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch adds tests for the new functions with tests fft_multi and dft3. The reference test vectors data is created with Octave scripts ref_fft_multi.m and ref_dft3.m. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
fea4ba4 to
5243312
Compare
This patch adds script to create configuration blobs for STFT process module. A few blobs are created to be used with topology to test STFT. This is WIP. The blob format is not yet final. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch adds build of topologies to test the STFT process module. The topologies initialize the processing for 512, 768, 1024 and 1536 size FFTs. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This module provides analysis and synthesis filters for frequency domain processing. The used technique is short term Fourier transform (STFT) and inverse STFT. The FFT length, hop, and window type are configured with the bytes control configuration blob. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
5243312 to
bb5e20a
Compare
|
Added topology & blob with 1 ms FFT hop for LL pipeline processing example. |
|
@singalsu I've merged since functional CI has passed but pls do fix the doxygen warnings in incremental PR |
Yep, I'll do right now a small PR for this. It takes still a while before the next cold twiddle factors data PR is ready. |

No description provided.