Skip to content

Feature extraction corrections #1

Open
bmascaro wants to merge 24 commits intomainfrom
beats_prediction_beforePF
Open

Feature extraction corrections #1
bmascaro wants to merge 24 commits intomainfrom
beats_prediction_beforePF

Conversation

@bmascaro
Copy link
Copy Markdown
Collaborator

@bmascaro bmascaro commented Mar 7, 2026

Major corrections to align c++ output data with BeatNet Python outputs

  • Framed Signal

  • Spectrum

  • Log Spectrogram

  • Spectrogram difference

Copy link
Copy Markdown
Owner

@pasquale90 pasquale90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some changes that need to be made first to get consistent results. Once these are done, we can then test again to see if the C++ and Python results align. If not, we may need another round of adjustments.

Comment thread onnx/framedSignal.cpp
Comment on lines +11 to +36
{
int nMax = ((nFrames -1) * hopSize) + frameSize;
padded_signal.assign(nMax, 0.0f);

{
auto s0 = original_signal.begin();
auto sEnd = original_signal.end();
auto destination = padded_signal.begin() + frameSize / 2;

int i = frameSize / 2;

std::copy_if(s0, sEnd, destination,
[&i, nMax](float x)
{
return i++ < nMax;
});
}

for (int iFrame = 0, index = 0; iFrame < nFrames; iFrame++, index += hopSize)
{
auto i0 = padded_signal.begin() + index;

std::vector<float> signal(i0, i0 + frameSize);
frames.push_back(signal);
}
}
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementing the framing logic on the constructor results in the inputSignal variable being lost, because the object of the class is simply destructed after the scope of the caller (the audio callback) reach to the end of execution. The only way to maintain the information of it would be to declare inputSignal as a static object in the header, but that s not the best design IMO for that particular implementation.

This logic should be transferred to a process function that collects input buffers in each call, and utilizes them to create the frames that will feed the model. Take a look here

bool process(const std::vector<float>& input, std::vector<float>& frame_out);

There's an input buffer coming in, in each call, and an output frame coming out, while the function returns true if a valid frame is produced (simply because the first buffers wont be enough in length to produce a full frame)

Comment thread onnx/BeatNet.cpp

// slice original signal to Frames
const int nFrames = 4;
FramedSignal framedSignal{ resampledSignal , nFrames, FRAME_LENGTH, HOP_SIZE };
Copy link
Copy Markdown
Owner

@pasquale90 pasquale90 Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FramedSignal framedSignal should be declared on the header first. This will create an member object of the BeatNet class. Declaring it here does not make sense for many reasons.

At first, because the framedSignal object resides in the scope of the function, which means that its lifetime has automatic duration and at the end of each function call, the object gets destroyed and defined in the next call. So the concept of accumulating buffers for creating frames to feed the model, is in this way violated.

You should declare the object on the header, initialize it on the BeatNet constructor, and then call framedSignal.process() function from withing preprocess function.

Comment thread onnx/BeatNet.cpp
Comment on lines -142 to -145
if (!valid_frame) {
// std::cout<<"invalid frame and will be invalid for the first ~"<<FRAME_LENGTH/resampled.size()-1<<" frames"<<std::endl;
return false;
}
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably keep this, to make sure that during the first calls of the function, where the first frame that is currently under formation while collecting the first buffers, will return false, aborting the inference of the model. Simply put, in such case, there is not yet a valid input signal to pass to the model.

Comment thread onnx/framedSignal.h
Comment on lines +13 to +14
std::vector<float> getOriginalSignal();
int get_nFrames();
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions are not used. Should they be removed?

Comment thread onnx/BeatNet.cpp
std::vector<float> resampledSignal = resampler.resample(raw_input);

// slice original signal to Frames
const int nFrames = 4;
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer defining hyperparameters outside functions. Especially since this is a fixed value, you can either define it using #define ( i.e. #define FRAMED_SIGNAL_NFRAMES 4) in BeatNet.h along with the rest of them, or either in the top of the framedSignal.h. I believe that would make things clearer and more maintainable in case of you need to experiment with its value in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants