-
Notifications
You must be signed in to change notification settings - Fork 404
add option to provide audio samples for prediction #153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
add option to provide audio samples for prediction #153
Conversation
marypilataki
commented
Nov 20, 2024
- User can either provide path of audio file or an array of audio samples for prediction. Can be useful when we want to transcribe an excerpt of an audio file or when audio loading is done in some other part of our code.
marlonwq
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems good. Suggestion: You could comment the code.
hyperc54
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi!
Thanks for the PR, and sorry for the delay in replying
It looks great and would also resolve some asks made in #55
I only added a couple nits but this looks great overall, I'd also appreciate if you could add a unit-test before we merge anything.
I appreciate that given we took a long time to reply you might not be available anymore to iterate on this. I'm happy to help resolving the suggestions I made, just let me know!
| def predict( | ||
| audio_path: Union[pathlib.Path, str], | ||
| audio_path_or_array: Union[pathlib.Path, str, np.ndarray], | ||
| sample_rate: int = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use Optional[int] type
| melodia_trick: bool = True, | ||
| debug_file: Optional[pathlib.Path] = None, | ||
| midi_tempo: float = 120, | ||
| verbose: bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It's a good idea, although basic pitch is already pretty verbose by default. I think it is fine in this PR to add a few logs lines without needing to control these with a new verbose parameter. We can think of controlling the verbosity in future PRs in my opinion :)
| Args: | ||
| audio_path: File path for the audio to run inference on. | ||
| audio_path_or_array: File path for the audio to run inference on or array of audio samples. | ||
| sample_rate: Sample rate of the audio file. Only used if audio_path_or_array is a np array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| sample_rate: Sample rate of the audio file. Only used if audio_path_or_array is a np array. | |
| sample_rate: Mandatory if audio_path_or_array is a np array. it should represent the sample rate of the provided array. Ignored if `audio_path_or_array` is a string |
nit
| def run_inference( | ||
| audio_path: Union[pathlib.Path, str], | ||
| audio_path_or_array: Union[pathlib.Path, str, np.ndarray], | ||
| sample_rate: None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use Optional[int] type
| Args: | ||
| audio_path: File path for the audio to run inference on. | ||
| audio_path_or_array: File path for the audio to run inference on or array of audio samples. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It would be great to add information on the expected input shape, and enforce it right at the beginning of the method. It looks like you're merging channels if multiple are provided, it could be worth adding a note about that in the docstring
| audio_path_or_array: The audio to run inference on. Can be either the path to an audio file or a numpy array of audio samples. | ||
| sample_rate: Sample rate of the audio file. Only used if audio_path_or_array is a np array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the same docstring comments apply here too
| with no_tf_warnings(): | ||
| print(f"Predicting MIDI for {audio_path}...") | ||
| if isinstance(audio_path_or_array, np.ndarray) and verbose: | ||
| print("Predicting MIDI ...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| print("Predicting MIDI ...") | |
| print("Predicting MIDI for input audio array of shape XX") |