Notation Software Users Forum

Sherry C · #7 04-26-2016, 03:43 PM

Hi,

Speech recognition is tougher than a few notes, but that's not what we're talking about here.

I used to work in hearing research (Kresge Hearing Research Institute at the University of Michigan in Ann Arbor, MI, USA), and research on speech recognition has been going on for decades, with a lot of funding from the government.

Given what I know of speech recognition as well as music, I would have to disagree that there is a difference in complexity. Both are quite complex analyses - pretty much 4D. With both you have pitch (essential in some languages and definitely essential to music), duration, loudness, and timbre (formants in speech; timbre is essential to distinguishing between instruments and even different pitches from the same instrument).

Speech recognition is a much more universal application, and also more lucrative in terms of financial reward. Ergo a lot more time and effort have been spent there. Not so much (unfortunately) with musical analysis and transcription. You'll also notice that speech recognition technology typically requires some 'training' period for the software, as well as clearly spoken words by a single person. This is analogous to a single-note melody line for a clearly played (e.g., no vibrato, no reverb, etc.) instrument. Try singing to a speech recognition program some time and add vibrato and fermatas and other musical embellishments - the results will not be the same as with regular speech.

As I mentioned previously, the Melodyne Editor actually does a phenomenal job of separating audio signals, but it still does not put out satisfactory MIDI for notation purposes unless it is a fairly uncomplicated piece (i.e., a single instrument with a "clean" recording). It doesn't separate out the audio analysis on a per-instrument basis for multi-instrument pieces - that's a job that still requires a human to discern.

However, it is heartening that folks are still working on the problem.

I hope this helps explain part of the complexity and comparison of the technologies. Maybe it was more than you wanted to know (my kids tell me I do that often).

ttfn,
Sherry