The percentage correct should be calculated over the total audio duration. However, silence tokens are currently omitted from the output (so that it has the same number of tokens as the annotation). Therefore, silence at the beginning and end is not in the list of detected tokens.
As a workaround, the total duration considered is the one from the first annotated token to the last annotated token.
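The workaround can be illustrated with a minimal sketch. It assumes that each token is a `(start, end)` pair in seconds, that annotated and detected tokens are paired one to one in order, and that "correct" means the time during which a detected token overlaps its annotated counterpart. The function name `percentage_correct` and this data layout are hypothetical and are not taken from the actual code.

```python
def percentage_correct(annotated, detected):
    """Return the percentage of correctly aligned duration.

    annotated, detected: equal-length lists of (start, end) tuples in seconds
    (assumed layout). Silence tokens are absent from both lists.
    """
    if not annotated:
        raise ValueError("at least one annotated token is required")
    if len(annotated) != len(detected):
        raise ValueError("annotated and detected must have the same length")

    # Workaround: the total duration runs from the start of the first
    # annotated token to the end of the last annotated token, because
    # leading and trailing silence is not present in the token lists.
    total = annotated[-1][1] - annotated[0][0]
    if total <= 0:
        raise ValueError("annotated span must have positive duration")

    correct = 0.0
    for (a_start, a_end), (d_start, d_end) in zip(annotated, detected):
        # Overlap between the annotated and detected interval, if any.
        overlap = min(a_end, d_end) - max(a_start, d_start)
        correct += max(0.0, overlap)

    return 100.0 * correct / total


# Example with made-up values: two tokens, boundary detected 0.5 s late.
annotated = [(1.0, 2.0), (2.0, 4.0)]
detected = [(1.0, 2.5), (2.5, 4.0)]
result = percentage_correct(annotated, detected)
print(f"{result:.1f}%")
```

Note that with this denominator any silence between annotated tokens still counts toward the total duration, while silence before the first and after the last annotated token does not.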