Skip to content

Conversation

@marypilataki
Copy link

  • User can either provide path of audio file or an array of audio samples for prediction. Can be useful when we want to transcribe an excerpt of an audio file or when audio loading is done in some other part of our code.

Copy link

@marlonwq marlonwq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems good. Suggestion: You could comment the code.

Copy link
Collaborator

@hyperc54 hyperc54 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi!
Thanks for the PR, and sorry for the delay in replying
It looks great and would also resolve some asks made in #55

I only added a couple nits but this looks great overall, I'd also appreciate if you could add a unit-test before we merge anything.

I appreciate that given we took a long time to reply you might not be available anymore to iterate on this. I'm happy to help resolving the suggestions I made, just let me know!

def predict(
audio_path: Union[pathlib.Path, str],
audio_path_or_array: Union[pathlib.Path, str, np.ndarray],
sample_rate: int = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use Optional[int] type

melodia_trick: bool = True,
debug_file: Optional[pathlib.Path] = None,
midi_tempo: float = 120,
verbose: bool = False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It's a good idea, although basic pitch is already pretty verbose by default. I think it is fine in this PR to add a few logs lines without needing to control these with a new verbose parameter. We can think of controlling the verbosity in future PRs in my opinion :)

Args:
audio_path: File path for the audio to run inference on.
audio_path_or_array: File path for the audio to run inference on or array of audio samples.
sample_rate: Sample rate of the audio file. Only used if audio_path_or_array is a np array.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sample_rate: Sample rate of the audio file. Only used if audio_path_or_array is a np array.
sample_rate: Mandatory if audio_path_or_array is a np array. it should represent the sample rate of the provided array. Ignored if `audio_path_or_array` is a string

nit

def run_inference(
audio_path: Union[pathlib.Path, str],
audio_path_or_array: Union[pathlib.Path, str, np.ndarray],
sample_rate: None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use Optional[int] type

Args:
audio_path: File path for the audio to run inference on.
audio_path_or_array: File path for the audio to run inference on or array of audio samples.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It would be great to add information on the expected input shape, and enforce it right at the beginning of the method. It looks like you're merging channels if multiple are provided, it could be worth adding a note about that in the docstring

Comment on lines +290 to +291
audio_path_or_array: The audio to run inference on. Can be either the path to an audio file or a numpy array of audio samples.
sample_rate: Sample rate of the audio file. Only used if audio_path_or_array is a np array.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same docstring comments apply here too

with no_tf_warnings():
print(f"Predicting MIDI for {audio_path}...")
if isinstance(audio_path_or_array, np.ndarray) and verbose:
print("Predicting MIDI ...")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print("Predicting MIDI ...")
print("Predicting MIDI for input audio array of shape XX")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants