Audio to midi (score)

Notation Software Users Forum » (V2) Requests for New Features » Audio to midi (score)


Clyde (clyde)
Senior Forum User
Username: clyde

Post Number: 698
Registered: 12-2002
Posted on Sunday, May 22, 2005 - 8:56 pm

Hi Sherry,

I've taken the liberty to start a new thread on this, so we don't clog up Mark's 'new features' section.

Thanks for the explanation of how our hearing works - it was most helpful. You seem to be confirming what I thought - that the steps to analysing sound are far more complex than just looking for sine waves (as we do with current mathematical techniques).

I think there is a lot to be gained by trying to understand what the output is from the mechanical part of the ear (I think they call that biometrics). It is that output that contains the detail (in some form) of the sounds that are currently being heard. Later stages of the process add experience and interpretation (based on experience) to the message set.

Can you point me to some web pages that may give me more details about this first stage (not real basic stuff), and the messages that are sent out from the mechanical section of the ear?

with thanks... Clyde

Mark Walsen (markwa)
Notation Software Developer
Username: markwa

Post Number: 1648
Registered: 7-2003
Posted on Monday, May 23, 2005 - 12:33 am

Hi Clyde,

Off the top of my head, I don't know what web sites specialize in the problem of audio transcription.

I do remember reading, about 20 years ago, part of a book named "Sound Colors". It relates to this subject pretty well. One of the key ideas in the book is that the human ear has tuned itself particularly well to recognizing the subtleties of sound produced by the human voice. The book analyzed various sounds in language, such as vowels and consonants in English, and related those sounds to similar sounds produced by musical instruments, especially woodwind and brass instruments that move air through a pipe just as a human voice moves air from a pipe-like chamber.

This relationship of human perception of instrument sounds to language made a lot of sense to me. I had studied harmony, which had a mathematical appeal to me. I studied orchestration, which seemed like cookbook recipes to me, rather than science. But this "Sound Colors" book offered scientific insight into orchestration that I found satisfying. Well, if it wasn't scientific, it was at least a sweet explanation. And now that I think about it, the ideas in this "Sound Colors" book might be quite useful for software engineers working on audio transcription. Solving the problem of music audio transcription might need to borrow a lot from speech recognition. To think of the problem of music audio transcription as just a pitch recognition problem is probably a gross over-simplification of the problem.

I spent a few minutes looking for the book on the web. I think the "Sound Color" book was probably the one with that title written by Wayne Slawson. The publication date was 1985, which is consistent with my memory that I read the book about 20 years ago; and it was a fresh book then. I might have to hunt for a copy of this book, and also find out what else this author has worked on in the last 20 years.

Cheers
-- Mark

Clyde (clyde)
Senior Forum User
Username: clyde

Post Number: 699
Registered: 12-2002
Posted on Monday, May 23, 2005 - 1:26 am

Hi Mark,
Thanks for the lead about the concept of the book on 'Sound Colours'.

I totally agree with your statement that "To think of the problem of music audio transcription as just a pitch recognition problem is probably a gross over-simplification of the problem." The evidence for this is the failure of the FFT approach with music. There are many great mathematical minds working on this, and they have not solved the problem using traditional maths.

I feel the problem needs a fresh approach - and while I don't class myself as in any way qualified to do this, I take heart that ordinary people have often contributed significantly to solving life's problems.

I think we need to think about sound in different terms from the traditional pitch, amplitude, phase etc. I'm reminded of how cats see - they don't necessarily process all the colours and understand every item in the picture - but they notice things that move in the picture, and they focus on the movement. There may be some other feature like 'sound colour' that is the key to hearing.

Anyway, it's a challenging intellectual exercise, and one that has lots of applications to many areas of life.

I'll go and look for 'sound colours' (or should I say, 'sound colors').

Cheers ... Clyde

Sherry Crann (sherry)
Senior Forum User
Username: sherry

Post Number: 740
Registered: 1-2004
Posted on Monday, May 23, 2005 - 12:22 pm

Howdy Clyde,

I'll have to take a look for websites regarding processing. Most of my info has come from books and journal papers, but I'm sure that much of that is on the web now.

ttfn,
Sherry

Clyde (clyde)
Senior Forum User
Username: clyde

Post Number: 700
Registered: 12-2002
Posted on Monday, May 23, 2005 - 8:22 pm

Hi Sherry,
Thanks for the detailed explanation of your work in the earlier discussion at: http://www.notation.com/cgi-bin/discus/show.cgi?26209/26210.

Recognition of instruments, sound color and certain basic sounds in speech is something our brains do well. But what does the mechanical part of the hearing system provide to the brain to make this determination?

Sure enough, the actual method is a complicated set of messages in the nervous system - but what information do those messages contain? For example:

(a) Does the ear determine the pitch, amplitude and phase of each piece of the sound?
(b) If it determines 'pitch' (or frequency), is that the frequency of a particular sort of wave: a sine wave, a square wave, etc.?
(c) Does it measure something else out of the sound wave?

I can't help but think that our obsession with sine waves is causing us to 'bark up the wrong tree'. Sine waves certainly fit the maths of FFT, and many industrial applications of noise detection are looking for repetitious sine waves.

But is that what our ears do? I know I keep saying this, but I feel there must be another way of looking at the waveform that will give us access to new information about the underlying sounds.

Cheers ... Clyde

Mark Walsen (markwa)
Notation Software Developer
Username: markwa

Post Number: 1651
Registered: 7-2003
Posted on Tuesday, May 24, 2005 - 12:00 am

Sherry and Clyde,

Sherry, you mentioned in the other thread:

quote:

sensory cells called hair cells (these are the cells I was particularly interested in during my research career).


I think this might be an area that Clyde will be particularly interested in. If I remember my high-school biology correctly, doesn't each of these hairs tune into an approximate frequency? If so, then these "hair cells" are actually performing the function of Fourier analysis -- converting waves into frequency spectrums. If you're not familiar with Fourier analysis, I'm sure Clyde will be glad to give you an overview.

Clyde, if you're searching for an analogy between the mechanics of hearing and software-based audio transcription, then this might actually bring you closer to, not further away from, frequency analysis.

However, one of the nastiest things about FFT is that one needs several repetitions of a wave to determine with much confidence what the frequencies are that add up to make the wave. But, unfortunately, the frequencies/harmonics are constantly in flux over the duration of the note. Somehow, the software needs to do some fancy pattern recognition-- not just static frequency analysis, but analysis of dynamically changing frequency patterns.
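The trade-off described above can be made concrete with a small sketch. It is plain Python with a naive DFT rather than a real FFT library, and the 8000 Hz sample rate, the 440 Hz test tone and the function names are all assumptions chosen for illustration. The spacing between frequency bins is the sample rate divided by the window length, so a window holding only a few cycles cannot place the tone more finely than about 125 Hz, while a window of 44 cycles lands on it exactly but would smear any change in the note during that tenth of a second.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Naive DFT: magnitudes of bins 0..N/2 (slow, but dependency-free)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def peak_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest non-DC bin."""
    mags = dft_magnitudes(samples)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return k * sample_rate / len(samples)


def tone(freq, sample_rate, count):
    """`count` samples of a pure sine wave."""
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(count)]


SAMPLE_RATE = 8000  # assumed for the example

# About 3.5 cycles of A440: bins are 8000 / 64 = 125 Hz apart.
short_estimate = peak_frequency(tone(440.0, SAMPLE_RATE, 64), SAMPLE_RATE)

# 44 full cycles of A440: bins are 8000 / 800 = 10 Hz apart.
long_estimate = peak_frequency(tone(440.0, SAMPLE_RATE, 800), SAMPLE_RATE)

print("short window:", short_estimate, "Hz")
print("long window: ", long_estimate, "Hz")
```

The short window can only report one of the 125 Hz steps on either side of the true pitch, which is far too coarse to tell neighbouring notes apart.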

At first glance, it might seem that attempting to identify the different pitches of simultaneously performing instruments might be much more difficult, and less successful, if a dynamic analysis approach is taken. But I suspect there is a chance that dynamic analysis is what will make the task easier and more successful. I suspect our auditory system is, in effect, doing such dynamic analysis of changing frequencies/harmonics, and that this actually makes the job easier, not more difficult, for our auditory system. Perhaps the essence of our auditory system is not that it is good at exactly sorting out sine waves as FFT does but, rather, that our auditory system is good at recognizing constantly moving patterns of pitches that have "signatures", such as the voice of one's child off at a distance in a crowd of hundreds of people.

Said another way, maybe the audio transcription software should not be trying to decompose complex waves into sine waves but, rather, to decompose complex waves into remembered patterns of wave sequences.

Perhaps a good way to get started on this approach would be to generate some polyphonic music using recorded sound samples, such as those of GPO. Then see if you can "reverse-engineer" what sound samples were used, and added together, to create the polyphonic sound segment. If someone gave me a 10 million dollar budget to tackle this problem, I might start with this approach and hire some digital signal processing (DSP) engineers. But first, I'd read a lot about what other attempts have been made to try to solve audio transcription, which is what you (Clyde) have been smartly doing.
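As a toy version of that "reverse-engineering" experiment, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the "sample library" is four synthetic tones (not GPO samples), each a fundamental plus one weaker overtone, and the mix is built by adding two of them with identical timing and loudness. Under those generous conditions, correlating the mix against each stored sample is enough to tell which ones were used.

```python
import math

SAMPLE_RATE = 8000   # assumed
LENGTH = 2000        # a quarter of a second

# Hypothetical "sample library": one synthetic tone per note, each with a
# fundamental and one weaker overtone. Real sampled instruments are far richer.
NOTE_FREQS = {"C4": 261.63, "E4": 329.63, "G4": 392.00, "A4": 440.00}


def make_sample(freq):
    """Synthetic stand-in for one recorded note."""
    return [
        math.sin(2 * math.pi * freq * t / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 2 * freq * t / SAMPLE_RATE)
        for t in range(LENGTH)
    ]


LIBRARY = {name: make_sample(freq) for name, freq in NOTE_FREQS.items()}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def match_strength(mix, sample):
    """How much of `sample` appears in `mix`: near 1.0 if present, near 0.0 if not."""
    return dot(mix, sample) / dot(sample, sample)


def find_samples(mix, threshold=0.5):
    """Names of library samples whose match strength exceeds the threshold."""
    return [name for name, sample in LIBRARY.items()
            if match_strength(mix, sample) > threshold]


# Build a two-note "polyphonic" segment, then try to recover its ingredients.
mix = [c + g for c, g in zip(LIBRARY["C4"], LIBRARY["G4"])]
print(find_samples(mix))
```

This only works so cleanly because the stored samples are known exactly, start at the same instant, and share no overtones. Notes an octave apart, different attack times, or different loudness would all need something more than a single correlation per sample, which is presumably where the DSP engineers come in.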

Cheers
-- Mark

Sherry Crann (sherry)
Senior Forum User
Username: sherry

Post Number: 741
Registered: 1-2004
Posted on Tuesday, May 24, 2005 - 12:21 am

Howdy,

Yes, Mark, OHCs (outer hair cells - they're not "hairs", but they do have hair-like protrusions on the apical end) are roughly "pitch sensitive" - it's actually rather fun to watch a microvideo of a portion of the OHCs responding to a pure tone! But as I also mentioned, and you surmised/reiterated, there is a definite dynamic component to their stimulation/transduction. So it's not as easy (and I don't use that term lightly :-) ) as "simple" Fourier analysis.

You can check out http://www.nlm.nih.gov/ and explore various areas of interest using the search engine. It's not as extensive as MedLine, but you have to have a subscription to ML, and since I don't "consult" anymore, we have to do the best we can :-)

Your idea of dynamic pattern analysis is a good one, and it's an idea that was being investigated when I "retired". I haven't really kept up with that area of research, but it could be fun to see what's been happening lately, and how that may be applicable to the audio to midi aspect that's obviously of interest here.

It may be helpful also, if you're interested, to find out what has been going on with cochlear implant research, as that type of analysis is of interest to them, and would probably be "linked" research. In essence, what they're trying to do is quite a similar problem - take an audio waveform and figure out how to decode/process it and then send an intelligible signal to the brain. But in Clyde's case, the 'brain' is the computer.

ttfn,
Sherry

Clyde (clyde)
Senior Forum User
Username: clyde

Post Number: 701
Registered: 12-2002
Posted on Tuesday, May 24, 2005 - 12:44 am

Hi Sherry & Mark,

I'm fascinated by, and thoroughly enjoying, this discussion on hearing and its application to music. It's great to have someone who understands the ear, as I think our best lead in understanding this problem will be in there somewhere.

This idea of the 'hairs' being tuned to a frequency (or that is how I perceive it - maybe wrongly) is, I think, significant. We have the same thing in most musical instruments, particularly with the sustain pedal on a piano, which allows other strings to vibrate in sympathy with the notes being played.

If we explore that further, the question then becomes: how do I construct a 'tuned circuit' in a computer program? And having gotten one, all I then need is to have a 'tuned circuit' for each musical pitch, and the system should (?) work.
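One way this might look in software: the Goertzel algorithm is a standard DSP technique that behaves much like a single tuned circuit, responding strongly only when the input contains energy near one chosen frequency. The sketch below builds one such resonator per equal-tempered pitch and reports which one responds most to a test tone. The function names, the 8000 Hz sample rate and the note range are assumptions for illustration, and a single pure tone is of course the easiest possible input.

```python
import math


def goertzel_power(samples, sample_rate, freq):
    """Energy of `samples` near `freq`, computed by a two-term recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def midi_to_freq(note):
    """Equal-tempered pitch with A4 (MIDI note 69) at 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def strongest_note(samples, sample_rate, low=48, high=84):
    """MIDI note in [low, high] whose resonator responds most strongly."""
    return max(range(low, high + 1),
               key=lambda n: goertzel_power(samples, sample_rate, midi_to_freq(n)))


SAMPLE_RATE = 8000  # assumed
a440 = [math.sin(2 * math.pi * 440.0 * t / SAMPLE_RATE) for t in range(2000)]
print(strongest_note(a440, SAMPLE_RATE))  # MIDI note 69 is A at 440 Hz
```

A bank like this still inherits the window-length trade-off of any frequency analysis: the longer each resonator listens, the sharper its tuning, but the slower it reacts to a note starting or stopping.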

Obtaining a mathematical model of a tuned circuit may not be that difficult, as tuned circuits are used in lots of things, particularly radio. However, they tend to be at high frequencies (around 455 kHz), whereas in music we deal with much lower frequencies. To increase the sensitivity in radio tuning, heterodyning is used.

This heterodyning process is where you add a known frequency to the incoming signal, and a new frequency is produced which is the difference between the incoming frequency and the added one. You then need only one very sensitive tuned circuit for that difference frequency.

To put it in musical terms, if I have a signal of note A at 440 cps and add a signal of 445 cps to it, I will get a signal that pulses at 5 cps.

(Incidentally, I think this is how piano tuners work in getting a 5th - they count the beats due to the heterodyne effect).

I actually wrote a program to do the above, but got stuck on identifying the 'intermediate' frequency (to use a radio term). I could visually see the pulse on the wave graphs, but could not accurately identify it mathematically (i.e., I didn't have a mathematical calculation for a tuned circuit).
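One possible way past that sticking point, offered as a sketch rather than a reconstruction of the program described above: simply adding 440 and 445 cps makes the loudness pulse, but the sum itself still contains only those two frequencies, so there is no 5 cps component to find. Squaring the summed signal (a crude "detector", as in a radio) does produce one, because (sin a + sin b) squared contains the term cos(a - b). After that, each candidate beat rate can be tested by correlating against a sine/cosine pair at that rate. The sample rate, duration, candidate range and function names are assumptions for the example.

```python
import math

SAMPLE_RATE = 8000  # assumed
DURATION = 1.0      # seconds; long enough to hold several beats


def two_tones(f1, f2):
    """Sum of two equal-strength sine waves."""
    n = int(SAMPLE_RATE * DURATION)
    return [math.sin(2 * math.pi * f1 * t / SAMPLE_RATE)
            + math.sin(2 * math.pi * f2 * t / SAMPLE_RATE)
            for t in range(n)]


def strength_at(samples, freq):
    """Squared correlation of `samples` with a sine/cosine pair at `freq`."""
    re = sum(x * math.cos(2 * math.pi * freq * t / SAMPLE_RATE)
             for t, x in enumerate(samples))
    im = sum(x * math.sin(2 * math.pi * freq * t / SAMPLE_RATE)
             for t, x in enumerate(samples))
    return re * re + im * im


def beat_frequency(samples, candidates=range(1, 21)):
    """Strongest slow pulsation (whole Hz) in the loudness of `samples`."""
    squared = [x * x for x in samples]        # square-law "detector"
    mean = sum(squared) / len(squared)
    envelope = [s - mean for s in squared]    # drop the constant offset
    return max(candidates, key=lambda f: strength_at(envelope, f))


print(beat_frequency(two_tones(440.0, 445.0)))  # the 5 cps beat
```

With real instrument tones the two signals would have overtones and unequal strengths, so the squared signal would contain many more difference frequencies than this clean example shows.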

Perhaps one of our forum members has expertise in this tuned circuit area.

Cheers ... Clyde
