Talk:Speech Recognition: Difference between revisions

Latest revision as of 03:55, 5 August 2011

About the People

Seems strange that although there is an External Link to the interview with Kurzweil, there is no mention of him in the article.

While we are talking about people, surely Jo Lernout and Pol Hauspie should rate a mention? --Graham Proud 09:55, 5 August 2011 (UTC)

Consistency of terms

Right now, we have articles dealing with "spoken language" rather than "speech", at least definitions of musical-quality and telephone-quality voice, etc. Shall we try to get some consistency before too much is embedded?

I wonder if this would better be titled "voice recognition", both to be consistent with VoIP, and also for using individual voice for biometric identification rather than communications? Howard C. Berkowitz 15:17, 25 July 2008 (CDT)

This subject is a new area for me. In the research that I have done so far, the professionals divide it up as "speech recognition" for what I am talking about, and "speaker recognition" for biometric identification. No expert appears to use "voice recognition", though that is the term that I was originally going to use. Perhaps it is a Brontosaurus/Apatosaurus sort of divide.

Just to add to the confusion, AI people talk about natural language processing, which appears to be the equivalent of what I am calling "computer speech technology": speech recognition, plus responding to the recognized speech, plus speech synthesis.

As for "speech" versus "spoken language", I cannot think of a distinction between the two, so I would prefer the shorter term. I do see a distinction between speech and voice: I would consider voice to be a broader term encompassing all the sounds that a human could generate. Donna Summer singing "I Feel Love" is voice, but not really speech.

Samuel C. Smith 14:37, 26 July 2008 (CDT)

On doing some reference checking, the situation appears to be very confused. You're right that the specific person identification term is "speaker". On checking some recent literature on products that do "mouth noise recognition" for computer input, DragonDictate and NaturallySpeaking (http://www.ddwin.com/overview.htm) seem to use "speech" rather than "voice". You seem to be right that "speech" is the current term in computer-based recognition, since the last document I have on VoiceNavigator (the maker's website being down) calls it "speech recognition".

On the other hand, the standard term for sending "mouth noise" over an IP-based telephone system is definitely "voice over IP". The International Telecommunication Union still appears to use "voice" for the signal from which the Mean Opinion Score is calculated: http://www.slac.stanford.edu/comp/net/wan-mon/tutorial.html#mos.

With respect to Donna Summer, voice is definitely the musical term. In analog telephony, we'd speak of "toll quality" voice as a 4 kHz bandwidth channel, whereas a dedicated line for an FM radio station to pick up a concert for live broadcast has 16 kHz of bandwidth.

Perhaps we need a speech-vs.-voice article. I've already put up a disambiguation page for voice as in VoIP, as opposed to the musical context. Howard C. Berkowitz 18:56, 26 July 2008 (CDT)

Howard, good thought about the confusion of terms. I have put redirects to this article at Voice Recognition and Voice recognition. There is already a redirect from Speech recognition. Pat Palmer 11:00, 29 August 2008 (CDT)

Linguisticky edits

I've made these edits (http://en.citizendium.org/wiki?title=Speech_Recognition&diff=100403294&oldid=100387364). The most notable changes are probably to the introduction, which I rewrote to clarify that articulation isn't just from the vocal cords, and that child language acquisition and speech recognition are still controversial subjects (e.g. speech is variable, and everyone hears different things, but all kids seem to zoom in on the right abstract rules and phonological patterns). There is also some clarification of the difference between written and spoken language in the second section (five or six vowels in writing, 20+ in speech). I also changed the last bit there to point out that discourse particles are meaningful, but not in the way that regular words are. Finally, I mentioned generative phonology in the third section. John Stephenson 08:20, 29 October 2008 (UTC)

This article was started in July, 2008, as part of the Eduzendium Project. We will complete the article by August 14, 2008.
