@singalsu
Collaborator

No description provided.

The direct form I (DF1) filter is compatible with the direct form II
transposed (DF2T) filter. The filter type is changed because DF1 has
better potential for SIMD optimization.

On a HiFi5 platform this change saves 0.8 MCPS with a two-band
crossover, from 10.02 MCPS to 9.26 MCPS. The saving will be larger
in higher-order filter banks, such as in the multiband DRC component.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
The direct form I (DF1) filter is compatible with the direct form II
transposed (DF2T) filter. The filter type is changed because DF1 has
better potential for SIMD optimization.

In a build for a HiFi5 platform, this patch and the previous patch
for the crossover filter bank together give a saving of 6.1 MCPS
with a three-band DRC, from 96.5 MCPS to 90.4 MCPS.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
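
As an illustration of the compatibility mentioned in the commit messages, here is a minimal sketch of one biquad in both forms. It assumes double-precision arithmetic and the sign convention shown in the comment; the actual SOF filters are fixed-point, and all struct and function names here are hypothetical. With the same coefficients and zero initial state, both forms should produce the same output up to rounding.

```c
#include <stddef.h>

/* Illustrative convention (an assumption, not the SOF blob layout):
 * y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
 */
struct biquad_coef {
	double b0, b1, b2, a1, a2;
};

/* DF1 keeps two past inputs and two past outputs. */
struct df1_state {
	double x1, x2, y1, y2;
};

/* DF2T keeps two accumulator states. */
struct df2t_state {
	double s1, s2;
};

static double biquad_df1(const struct biquad_coef *c, struct df1_state *s, double x)
{
	double y = c->b0 * x + c->b1 * s->x1 + c->b2 * s->x2
		 - c->a1 * s->y1 - c->a2 * s->y2;

	s->x2 = s->x1;
	s->x1 = x;
	s->y2 = s->y1;
	s->y1 = y;
	return y;
}

static double biquad_df2t(const struct biquad_coef *c, struct df2t_state *s, double x)
{
	double y = c->b0 * x + s->s1;

	s->s1 = c->b1 * x - c->a1 * y + s->s2;
	s->s2 = c->b2 * x - c->a2 * y;
	return y;
}

/* Run both forms over the same input from zero state and return the
 * largest absolute difference between their outputs.
 */
static double df1_df2t_max_diff(const struct biquad_coef *c, const double *x, size_t n)
{
	struct df1_state s1 = { 0.0, 0.0, 0.0, 0.0 };
	struct df2t_state s2 = { 0.0, 0.0 };
	double max_diff = 0.0;
	size_t i;

	for (i = 0; i < n; i++) {
		double d = biquad_df1(c, &s1, x[i]) - biquad_df2t(c, &s2, x[i]);

		if (d < 0.0)
			d = -d;
		if (d > max_diff)
			max_diff = d;
	}
	return max_diff;
}
```

In the DF1 form the output is a single sum of five products over stored samples, with no dependency between state updates, which is one plausible reason it maps more easily to SIMD multiply-accumulate instructions.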
@singalsu singalsu marked this pull request as ready for review January 31, 2025 17:01
Contributor

@johnylin76 johnylin76 left a comment

LGTM, thanks

@singalsu
Collaborator Author

singalsu commented Feb 3, 2025

Note: The optimization will continue with the IIR DF1 core. I'm thinking of adding a conversion of the coefficient blob to a layout that is compatible with 128-bit loads, to make it more efficient. The blobs in user space would remain the same.
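
One possible shape for such a conversion is sketched below. This is only an assumption about how it could look, not the planned implementation: it supposes 7 32-bit words per biquad in the blob (5 coefficients, a shift, and a gain) and pads each biquad to 8 words so it can be read with two aligned 128-bit loads. The names `biquad_vec` and `biquad_blob_to_vec` and both word counts are hypothetical.

```c
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

#define BLOB_WORDS_PER_BIQUAD 7	/* assumed: 5 coefficients + shift + gain */
#define VEC_WORDS_PER_BIQUAD  8	/* padded to two 128-bit (4 x int32) loads */

/* Hypothetical runtime layout: 32 bytes per biquad, 16-byte aligned. */
struct biquad_vec {
	alignas(16) int32_t w[VEC_WORDS_PER_BIQUAD];
};

/* Copy n biquads from the packed blob into the padded runtime layout.
 * The blob is only read, so its user space format stays unchanged.
 */
static void biquad_blob_to_vec(const int32_t *blob, int n, struct biquad_vec *out)
{
	int i;

	for (i = 0; i < n; i++) {
		/* Clear the padding word, then copy the 7 blob words. */
		memset(out[i].w, 0, sizeof(out[i].w));
		memcpy(out[i].w, blob + i * BLOB_WORDS_PER_BIQUAD,
		       BLOB_WORDS_PER_BIQUAD * sizeof(int32_t));
	}
}
```

The conversion would run once when the coefficients are set up, so the extra copy would not add to the per-sample processing cost.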

Here are the top 10 results from profiling before and after the change. The IIR remains the top function:

Pipeline MCPS:  96.50

Flat profile:
                                           self      total          
       cumulative       self             cycles     cycles          
  %        cycles     cycles    calls     /call      /call  name    
             (K)        (K)                (K)        (K)           
39.01     54861.72   54861.72  1096720      0.05       0.05  iir_df2t
10.12     69090.62   14228.90     1430      9.95      94.57  multiband_drc_s32_default
 7.51     79653.12   10562.50     6426      1.64       2.71  drc_update_detector_average
 7.42     90084.45   10431.32     6429      1.62       2.65  drc_compress_output
 7.28    100327.70   10243.26   205635      0.05       0.24  multiband_drc_s32_process_drc
 5.41    107936.20    7608.49    68545      0.11       0.91  multiband_drc_process_emp_crossover
 4.97    114927.79    6991.59   137090      0.05       0.35  crossover_generic_split_3way
 4.68    121511.08    6583.30   205728      0.03       0.03  sofm_lut_sin_fixed_16b
 3.56    126514.62    5003.54    35834      0.14       0.14  sofm_exp_int32
 2.24    129670.89    3156.27   139259      0.02       0.02  memcpy

Pipeline MCPS:  90.36

Flat profile:
                                           self      total          
       cumulative       self             cycles     cycles          
  %        cycles     cycles    calls     /call      /call  name    
             (K)        (K)                (K)        (K)           
34.95     46087.96   46087.96  1096720      0.04       0.04  iir_df1
10.79     60316.86   14228.90     1430      9.95      88.44  multiband_drc_s32_default
 8.01     70879.36   10562.50     6426      1.64       2.71  drc_update_detector_average
 7.91     81310.68   10431.32     6429      1.62       2.65  drc_compress_output
 7.77     91553.94   10243.26   205635      0.05       0.24  multiband_drc_s32_process_drc
 5.77     99162.44    7608.49    68545      0.11       0.80  multiband_drc_process_emp_crossover
 5.30    106154.03    6991.59   137090      0.05       0.30  crossover_generic_split_3way
 4.99    112737.32    6583.30   205728      0.03       0.03  sofm_lut_sin_fixed_16b
 3.79    117740.86    5003.54    35834      0.14       0.14  sofm_exp_int32
 2.39    120897.13    3156.27   139259      0.02       0.02  memcpy

This was done by running scripts/sof-testbench-helper.sh -x -m drc_multiband -p profile-drc32_multiband.txt

Edit: The next step is a simplified IIR core for the crossover_generic_process_lr4() function. The checks and the outer loop can be removed for a fixed 4th-order (2 biquads) calculation.
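
A rough sketch of that idea, under stated assumptions: double-precision arithmetic instead of the fixed-point SOF code, the convention y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2], and hypothetical names throughout. The generic function validates its arguments and loops over the sections, while the fixed variant computes the two biquads back to back. In a DF1 cascade the output history of the first biquad is also the input history of the second, so the fixed variant stores it only once.

```c
#include <stddef.h>

struct biquad_coef {
	double b0, b1, b2, a1, a2;
};

struct df1_state {
	double x1, x2, y1, y2;
};

/* Generic cascade: argument checks plus a loop over the sections. */
static double iir_df1_generic(const struct biquad_coef *c, struct df1_state *s,
			      int num_sections, double x)
{
	int i;

	if (!c || !s || num_sections <= 0)
		return x;

	for (i = 0; i < num_sections; i++) {
		double y = c[i].b0 * x + c[i].b1 * s[i].x1 + c[i].b2 * s[i].x2
			 - c[i].a1 * s[i].y1 - c[i].a2 * s[i].y2;

		s[i].x2 = s[i].x1;
		s[i].x1 = x;
		s[i].y2 = s[i].y1;
		s[i].y1 = y;
		x = y;	/* output of this section feeds the next one */
	}
	return x;
}

/* Fixed 4th-order state: input, middle, and output histories.
 * The middle history is shared by both biquads.
 */
struct lr4_state {
	double x1, x2, m1, m2, y1, y2;
};

/* Fixed two-biquad cascade: no checks and no loop over sections. */
static double iir_df1_lr4(const struct biquad_coef c[2], struct lr4_state *s, double x)
{
	double m = c[0].b0 * x + c[0].b1 * s->x1 + c[0].b2 * s->x2
		 - c[0].a1 * s->m1 - c[0].a2 * s->m2;
	double y = c[1].b0 * m + c[1].b1 * s->m1 + c[1].b2 * s->m2
		 - c[1].a1 * s->y1 - c[1].a2 * s->y2;

	s->x2 = s->x1;
	s->x1 = x;
	s->m2 = s->m1;
	s->m1 = m;
	s->y2 = s->y1;
	s->y1 = y;
	return y;
}
```

Starting from zero state and using the same coefficients, both functions should return the same samples, so the generic version can serve as a reference when testing the fixed one.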

@kv2019i kv2019i merged commit f3ac6ed into thesofproject:main Feb 3, 2025
44 of 48 checks passed