Conversation
…; load BeatNet cpp model
…withing results dir
chg vals ch val change
stack features log spectrum and spectral diff newspectral diff change
… (22050 samples/sec) and for current audiowave file
…T_SOURCE_DIR in the download location variable
pasquale90
left a comment
There was a problem hiding this comment.
There are some changes that need to be made first to get consistent results. Once these are done, we can then test again to see if the C++ and Python results align. If not, we may need another round of adjustments.
| { | ||
| int nMax = ((nFrames -1) * hopSize) + frameSize; | ||
| padded_signal.assign(nMax, 0.0f); | ||
|
|
||
| { | ||
| auto s0 = original_signal.begin(); | ||
| auto sEnd = original_signal.end(); | ||
| auto destination = padded_signal.begin() + frameSize / 2; | ||
|
|
||
| int i = frameSize / 2; | ||
|
|
||
| std::copy_if(s0, sEnd, destination, | ||
| [&i, nMax](float x) | ||
| { | ||
| return i++ < nMax; | ||
| }); | ||
| } | ||
|
|
||
| for (int iFrame = 0, index = 0; iFrame < nFrames; iFrame++, index += hopSize) | ||
| { | ||
| auto i0 = padded_signal.begin() + index; | ||
|
|
||
| std::vector<float> signal(i0, i0 + frameSize); | ||
| frames.push_back(signal); | ||
| } | ||
| } |
There was a problem hiding this comment.
Implementing the framing logic on the constructor results in the inputSignal variable being lost, because the object of the class is simply destructed after the scope of the caller (the audio callback) reach to the end of execution. The only way to maintain the information of it would be to declare inputSignal as a static object in the header, but that s not the best design IMO for that particular implementation.
This logic should be transferred to a process function that collects input buffers in each call, and utilizes them to create the frames that will feed the model. Take a look here
Line 10 in 1985e3d
There's an input buffer coming in, in each call, and an output frame coming out, while the function returns true if a valid frame is produced (simply because the first buffers wont be enough in length to produce a full frame)
|
|
||
| // slice original signal to Frames | ||
| const int nFrames = 4; | ||
| FramedSignal framedSignal{ resampledSignal , nFrames, FRAME_LENGTH, HOP_SIZE }; |
There was a problem hiding this comment.
FramedSignal framedSignal should be declared on the header first. This will create an member object of the BeatNet class. Declaring it here does not make sense for many reasons.
At first, because the framedSignal object resides in the scope of the function, which means that its lifetime has automatic duration and at the end of each function call, the object gets destroyed and defined in the next call. So the concept of accumulating buffers for creating frames to feed the model, is in this way violated.
You should declare the object on the header, initialize it on the BeatNet constructor, and then call framedSignal.process() function from withing preprocess function.
| if (!valid_frame) { | ||
| // std::cout<<"invalid frame and will be invalid for the first ~"<<FRAME_LENGTH/resampled.size()-1<<" frames"<<std::endl; | ||
| return false; | ||
| } |
There was a problem hiding this comment.
You should probably keep this, to make sure that during the first calls of the function, where the first frame that is currently under formation while collecting the first buffers, will return false, aborting the inference of the model. Simply put, in such case, there is not yet a valid input signal to pass to the model.
| std::vector<float> getOriginalSignal(); | ||
| int get_nFrames(); |
There was a problem hiding this comment.
These functions are not used. Should they be removed?
| std::vector<float> resampledSignal = resampler.resample(raw_input); | ||
|
|
||
| // slice original signal to Frames | ||
| const int nFrames = 4; |
There was a problem hiding this comment.
Prefer defining hyperparameters outside functions. Especially since this is a fixed value, you can either define it using #define ( i.e. #define FRAMED_SIGNAL_NFRAMES 4) in BeatNet.h along with the rest of them, or either in the top of the framedSignal.h. I believe that would make things clearer and more maintainable in case of you need to experiment with its value in the future.
Major corrections to align c++ output data with BeatNet Python outputs
Framed Signal
Spectrum
Log Spectrogram
Spectrogram difference