
Vista: Speech Recognition

March 22nd, 2006 by Jabez Gan [MVP]

Here are some FAQs that users have about Vista’s Speech Recognition.

[Overview]

Q: Speech recognition has been significantly upgraded in Windows Vista; could you give me an overview about what it took since Windows XP to make this major breakthrough?

A: One major difference was our attention to users. In the past I think we thought more about this as a technology, and less about it as a useful tool for people.

For example, we had the ability to spell words in Office XP, but it wasn’t a complete experience. You could say “Alpha, Bravo, Charlie”, or “A B C” but sometimes you’d get “A. B. C.”. And, if one of those letters was wrong, there was no way to go back and correct the letters, other than saying “Press left arrow”, “Press left arrow” … 

By starting with users’ requests, and doing careful usability testing all along the way, we think we’ve done a much better job delivering what users want in a speech recognition package than we ever have before.

Q: As a user who has not yet used speech recognition, what do you see as its main uses? Will I be able to use the computer without a keyboard and/or mouse?

A: Speech recognition can be used to make your interactions with your PC more natural. Some things are best left to the mouse, like drawing in Photoshop, or interacting with some games, like Minesweeper.

Having said that, you should still be able to use those programs by voice if you choose, but if you can use a mouse and a keyboard, that might be better in some applications.

However, if you are a slow typist, or just want to lean back in your chair, Speech Recognition could be a good solution. 

It’s also ideal for people who have mobility impairments such as repetitive strain injury (RSI).

Q: Who is the main target for speech recognition in Vista? Is there any special attention given to certain users based on the flavor of the OS?

A: Vista is shipping with features that could be enabled in the future, once we get a chance to provide developers with tools to support specialized languages such as medical and legal terminology. This is still in the planning stage, but the OS will support it. Microsoft does not plan to provide the vocabularies needed; that would likely be left to ISVs.

Note: Questions are edited for clarity.

Source: MS Private Beta Chat (Vista) – Permission given on blogging 

[Using Speech Recognition]

Q: Speech recognition seems to require certain words to be said in an American dialect, or in an American way. Any comments on this?

A: We train our acoustic models for American English with a large speech corpus designed to represent the range of pronunciations found in American English. We provide two features to make SR better for each individual user:

1) Speaker adaptation. The speaker-independent engine shipped with the OS rapidly adapts to an individual’s accent, and becomes a speaker-dependent engine.
2) Adding pronunciations. Users can add new words and new pronunciations to the system.

Q: Are there any plans to integrate Text-To-Speech natively in common Windows controls such as text boxes? Such a feature already exists in Apple Mac OS.

A: For speech output, Microsoft has included Narrator in all versions of Windows for several years now. It will speak the name of controls as they are selected and/or focused.

Q: Will you need to address the voice recognition feature somehow? Or, from a different perspective, will talking nearby affect or activate the voice recognition feature?

A: The user is always in control. Once you have Speech Recognition running, you can say ‘stop listening’ and the Speech Recognition application goes into sleep mode. It can be awakened by saying ‘Start listening’. If you have a mute button on your microphone, you can also use that to control when the speech application is listening.

Q: Can you bring voice training files from XP into Vista? What about training files from Dragon?

A: For an XP to Vista upgrade, only user-added words will be used for the new profile. Users will have to redo training. Dragon training files will be transferred on an XP to Vista upgrade, but they will not be used by our recognizer. If you are doing a clean install, then Dragon should have some kind of profile and data transfer tool/system.

Q: What are the languages (besides English, German and Japanese) that will be supported by Speech Recognition, and how can users ‘install’ definition files for other languages?

A: To install other languages supported by Speech Recognition, you will need to install MUI packs and switch the OS to that language locale. Then you will be able to start Speech Recognition for that language.

[Vista Speech Recognition’s Limitation]

Q: Is there a limit on the total number of words you can add to the vocabulary? Can you add a list of words (an ASCII listing) directly to the vocabulary?

A: 1000 words max, currently. We don’t have UI to add words in bulk at the moment, but it’s a good feature suggestion. An application could easily be written by a 3rd party to do this.

Q: Is there a point to using speech in a single-room office with many people around?

A: Three things come to mind here:

1) The main challenge of that scenario is capturing sufficient quality audio from the multiple speakers, which takes quality microphones and a setup that provides a reasonable signal to the SR.
2) The speech recognizer adapts to each user to improve recognition accuracy, and links the adaptation information to that user’s profile. A profile for one user may or may not work as well for another user.
3) Multiple speakers on a single audio feed produce overlapping waveforms. SR is designed for a single speech stream.

I can envision a future when multiple parties in a meeting could each have their own mobile device capturing their speech (speaker-adapted recognition, good quality audio input), which is then synched up with the rest of the meeting!

[Bluetooth]

Q: My understanding is that Bluetooth has a lower quality output than other wireless methods, so would not be suitable for voice recognition. Is that correct?

A: The Bluetooth standard currently only allows recording at 8 kHz (vs. the 16 kHz typically used for speech recognition on the desktop). This was mainly geared towards mobile phone use scenarios. So we would expect some degradation in accuracy when using a Bluetooth microphone due to the lower quality signal.

Q: Will Bluetooth headsets be supported?

A: Hi Daron, if the headset has a Bluetooth audio profile, it should work with Vista. However, BT headsets are currently limited to the 8 kHz range, and thus the quality of the recognition decreases. This is something that we have improved on Windows Mobile devices, but not yet with Vista.
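
As a rough illustration of why the 8 kHz limit matters: by the Nyquist sampling theorem, audio sampled at a given rate can only represent frequencies up to half that rate. The short Python sketch below does that arithmetic for the two rates quoted above. The function name and labels are made up for this example and are not part of any Windows or Bluetooth API.

```python
def nyquist_bandwidth_hz(sample_rate_hz: float) -> float:
    """Return the highest frequency representable at the given sampling rate."""
    return sample_rate_hz / 2.0


# Rates quoted in the answers above (labels are illustrative only).
BLUETOOTH_RATE_HZ = 8_000   # Bluetooth headset recording
DESKTOP_RATE_HZ = 16_000    # typical desktop speech recognition input

for label, rate in (
    ("Bluetooth headset", BLUETOOTH_RATE_HZ),
    ("Desktop microphone", DESKTOP_RATE_HZ),
):
    bandwidth = nyquist_bandwidth_hz(rate)
    print(f"{label}: sampled at {rate} Hz, captures audio up to {bandwidth:.0f} Hz")
```

So an 8 kHz Bluetooth link carries roughly the band up to 4 kHz, while 16 kHz capture reaches 8 kHz. The extra band holds much of the energy of consonants such as “s” and “f”, which is one plausible reason for the accuracy loss described above.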

[Application Compatibility]

Q: What applications will be speech enabled? Office 2007? Outlook?

A: In Vista most applications should be accessible via Speech. We take advantage of some of the accessibility framework of the operating system to discover menu items, etc. You should also be able to dictate into most text boxes.

Q: What programs work with speech?

A: Windows Speech Recognition works with most applications written for Windows. There are some exceptions. If an application exposes itself properly to the OS, we will work well with it.

Q: Will speech recognition work over remote desktop?

A: Not currently.

Q: Is it possible to manually move the mouse, then issue a voice command like “right-click” or “double-click” without some extra context to the voice command?

A: You can say “Mousegrid”. This will divide the screen into 9 squares. You can move the mouse to the center of a square by saying its number. It will then divide that square into 9 squares again, and you can say another number, and so on until you get where you need the mouse to be. You can actually say a string of numbers at once if you already know the sequence that gets you to a particular spot.

[Accessibility]

Q: Is this something that might be useful for the blind? Will there be books read by the home computer, and computer commands available by voice?

A: Speech recognition is useful for people who need alternative means of input. Some people who are blind also have those needs. However, the other tools that Microsoft provides, such as Narrator and Magnifier, are more suited for visual impairments. Be aware, though, that these tools are *very basic* and are meant to bootstrap users into the platform. There are many companies devoted to providing a much better experience for people who have low vision or are totally blind. See http://microsoft.com/enable/ for more information.

[Additional reading]

Q: Is there a guide to use for trying speech recognition, perhaps a URL, to get us started?

A: Try this:
http://www.microsoft.com/technet/windowsvista/library/c208e792-e591-455a-82d9-a98264324e0d.mspx
and this:
http://blogs.technet.com/chenley/archive/2006/02/21/420136.aspx
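
For readers curious how the “Mousegrid” command described under Application Compatibility narrows down a screen position, here is a minimal Python sketch of the idea: each spoken digit picks one of 9 squares, and the next digit subdivides that square again. It is only an illustration, not how Windows implements it. The function name is hypothetical, and the numbering of squares (1 to 9, left to right, top to bottom) is an assumption that the chat does not state.

```python
def mousegrid_target(width, height, digits):
    """Return the (x, y) center of the cell reached by a sequence of grid digits.

    Assumes cells are numbered 1-9, left to right and top to bottom.
    """
    left, top = 0.0, 0.0
    cell_w, cell_h = float(width), float(height)
    for ch in digits:
        n = int(ch)
        if not 1 <= n <= 9:
            raise ValueError(f"grid digits must be 1-9, got {ch!r}")
        row, col = divmod(n - 1, 3)
        # Shrink to one ninth of the current region, then shift to the chosen cell.
        cell_w /= 3
        cell_h /= 3
        left += col * cell_w
        top += row * cell_h
    # The pointer lands in the middle of the final cell.
    return (left + cell_w / 2, top + cell_h / 2)


# On a hypothetical 900 x 900 screen:
print(mousegrid_target(900, 900, "5"))   # (450.0, 450.0), the center square
print(mousegrid_target(900, 900, "19"))  # (250.0, 250.0), square 9 inside square 1
```

Each digit cuts the region to a third of its width and height, so a string of four digits already narrows a 900-pixel-wide screen to a cell about 11 pixels across.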



This entry was posted on Wednesday, March 22nd, 2006 at 2:11 pm and is filed under MS News.




