
fix: support batched inference in solve_euler by removing hardcoded batch size #1827

Open
Mr-Neutr0n wants to merge 1 commit into FunAudioLLM:main from Mr-Neutr0n:fix/solve-euler-batch-size

Conversation

@Mr-Neutr0n

Summary

The solve_euler() method in cosyvoice/flow/flow_matching.py hardcodes the batch dimension to 2 when allocating CFG (Classifier-Free Guidance) tensors. This assumes a single-sample batch (1 conditional + 1 unconditional = 2), which means batched inference with multiple samples silently produces incorrect results — only the first sample's conditioning is used.
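For concreteness, a condensed sketch of the pattern being fixed. The tensor names follow this description, and the shapes are illustrative, not copied verbatim from flow_matching.py:

```python
import torch

# Illustrative shapes: (batch, mel_channels, frames).
bsz, mels, frames = 3, 80, 10
mu = torch.randn(bsz, mels, frames)

# Hardcoded CFG allocation: one conditional slot ([0]) and one
# unconditional slot ([1]), regardless of the actual batch size.
mu_in = torch.zeros(2, mels, frames)
mu_in[0] = mu[0]  # per the description above, only the first sample's
                  # conditioning is applied; samples 1..bsz-1 are dropped
```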

What changed:

  • Replaced hardcoded 2 with 2 * bsz (where bsz = x.size(0)) in all CFG tensor allocations (x_in, mask_in, mu_in, t_in, spks_in, cond_in)
  • Updated the indexing from [0]/[1] to [:bsz]/[bsz:] so that all samples in the batch get their conditioning applied correctly
  • The unconditional half ([bsz:]) remains zero-initialized as before, which is the expected behavior for CFG

This fix is backward-compatible: for single-sample inference (bsz=1), the behavior is identical to the original code.
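A minimal sketch of the batched allocation described above, assuming (batch, channels, frames) mel-style tensors; t_in (shape [2 * bsz]) follows the same pattern and is omitted for brevity:

```python
import torch

def solve_euler_cfg_batched(x, mu, spks, cond, mask):
    """Sketch of the batched CFG allocation this PR describes. Names
    mirror the PR text (x_in, mu_in, ...); shapes are assumed, not
    verified against flow_matching.py."""
    bsz = x.size(0)
    two_b = 2 * bsz  # conditional half [:bsz] + unconditional half [bsz:]
    x_in = torch.zeros(two_b, x.size(1), x.size(2), device=x.device, dtype=x.dtype)
    mask_in = torch.zeros(two_b, 1, x.size(2), device=x.device, dtype=x.dtype)
    mu_in = torch.zeros_like(x_in)
    spks_in = torch.zeros(two_b, spks.size(1), device=x.device, dtype=x.dtype)
    cond_in = torch.zeros_like(x_in)
    # Both halves see the same state and mask; only the conditioning differs.
    x_in[:bsz] = x
    x_in[bsz:] = x
    mask_in[:bsz] = mask
    mask_in[bsz:] = mask
    mu_in[:bsz] = mu      # conditional half gets real conditioning
    spks_in[:bsz] = spks
    cond_in[:bsz] = cond  # unconditional half [bsz:] stays zero (CFG null)
    return x_in, mask_in, mu_in, spks_in, cond_in
```

Downstream, the estimator output of shape (2 * bsz, ...) would then be split into the conditional half [:bsz] and the unconditional half [bsz:] before applying the CFG guidance scale, matching the updated indexing.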

Test plan

  • Verify single-sample inference produces identical results to before the change
  • Test batched inference with bsz > 1 and confirm all samples receive correct conditioning
  • Check that CFG guidance scaling works correctly across the full batch
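A rough shape of the first two checks, reusing the allocation sketch above (a hypothetical harness, not the repository's test suite):

```python
import torch

torch.manual_seed(0)
bsz, mels, frames = 2, 80, 10
x, mu, cond = (torch.randn(bsz, mels, frames) for _ in range(3))
spks = torch.randn(bsz, mels)
mask = torch.ones(bsz, 1, frames)

batched = solve_euler_cfg_batched(x, mu, spks, cond, mask)
for i in range(bsz):
    single = solve_euler_cfg_batched(x[i:i+1], mu[i:i+1], spks[i:i+1],
                                     cond[i:i+1], mask[i:i+1])
    # Sample i's conditional slots in the batched run must match its
    # single-sample run (indices 2 and 4 are mu_in and cond_in).
    assert torch.equal(batched[2][i], single[2][0])
    assert torch.equal(batched[4][i], single[4][0])
```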

