Talking Computers: Microsoft’s Long-Range Plan to Thwart the Feds

Hal Plotkin, Special to SF Gate
Wednesday, June 7, 2000

URL: http://www.sfgate.com/cgi-bin/article.cgi?file=/technology/archive/2000/06/07/voicerec.dtl

You’ve probably seen the ads yourself.

A comfortably dressed, relaxed-looking Bill Gates is talking directly into the camera. Blithely, he shrugs off the charges against his company, promising that Microsoft’s “best days are yet to come.”

The first time I saw it, I wondered — what could the man possibly be thinking?

His company is about to be torn asunder by the federales, rivals are attacking him from every quarter, and no one can even say for sure what business or businesses he may or may not be running just a few months from now.

Someone even hit him in the face with a pie not long ago.

I thought maybe Gates was in danger of becoming certifiably delusional.

The alternative, which on reflection seems much more likely, is that he knows something we don’t.

I’m betting it has to do with speech recognition.

Although most pundits haven’t been paying much attention, Bill Gates has been talking about speech recognition in nearly every recent public appearance.

It’s an area Microsoft is focusing on with ferocious intensity. And for very good reason.

The question of whether Microsoft will continue to dominate the market for computer operating systems, and by extension the entire computer industry, won’t be resolved by either the Justice Department or the federal courts, regardless of the outcome of the current, well-publicized anti-trust case.

Instead, it will be resolved in high-tech research and development labs that are focused on developing next-generation speech recognition software technologies.

Microsoft’s hammerlock on the computer industry will be extended for at least another generation or so if the company succeeds with its ambitious plan to build a voice-activated operating system that works with its legacy applications.

It’s possible the technology could be ready within a few years, conveniently right around the time Microsoft is expected to exhaust the appeals process in the current anti-trust case.

The company is pushing the voice recognition envelope hard. In addition to its own efforts, Microsoft has invested heavily in other firms working on the opportunity, including Belgium-based Lernout & Hauspie, which many acknowledge to be leading the technical charge.

Meanwhile, Microsoft is quietly scooping up many of the best thinkers in the field and has released a demo version of the speech-activated OS technology now in development.

Gates is nothing if not an excellent chess player, always thinking five or six moves ahead.

So let’s play the game along with him.

A speech-driven Windows operating system would not only preserve Microsoft’s monopoly over desktop operating environments into the foreseeable future; it would also erase the viability of Linux as a competitive operating system, much the way graphical user interfaces such as the Mac OS and Windows eclipsed earlier, more cumbersome text-based operating systems such as DOS and CP/M.

After all, who wants to type instructions into a computer if you can get the same jobs done just by talking?

Anyone who saw Stanley Kubrick’s classic film, “2001: A Space Odyssey,” or even just one episode of “Star Trek,” knows that’s how computers were always supposed to work.

Several companies, including Microsoft, are already offering early voice-activated software programs. Although fun to play with, most of them still leave a lot to be desired.

I know I felt a little like astronaut Dave of “2001” fame when I tried out the speech recognition program that Microsoft first bundled with some versions of its Windows 95 operating system.

True, it was remarkable, even a bit thrilling, to see my computer quickly follow the simple verbal instructions it was preprogrammed to handle, such as switching between active windows.

Unfortunately, though, for some reason Microsoft’s software kept mistaking the sound of passing diesel engines for a signal that we were finished for the day. I finally disabled it after getting tired of stopping my computer from trying to take too many unauthorized siestas.

Progress has been made since then, but the really big improvements are still on the horizon.

Ever faster, more powerful hardware, now becoming widely available, will better handle the incredibly complex data crunching that speech recognition software requires.

Speech recognition software programs now in development will have increased power to adapt to individual voices and pronunciations, keeping a record of each time the software has been corrected by an individual user and then drawing on those corrections to avoid future mistakes.
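To make the idea of learning from corrections concrete, here is a deliberately simplified sketch. It assumes the software's output is plain text and that adaptation means remembering which word a user substituted for a misheard one. The class and method names are hypothetical, and real products would adapt far richer acoustic and language models than this word-level lookup.

```python
from collections import Counter, defaultdict


class CorrectionMemory:
    """Hypothetical per-user record of corrected misrecognitions."""

    def __init__(self):
        # misheard word -> tally of the words the user replaced it with
        self._fixes = defaultdict(Counter)

    def record(self, heard, corrected):
        """Store one correction the user made."""
        self._fixes[heard.lower()][corrected.lower()] += 1

    def apply(self, transcript):
        """Swap each word for its most frequent past correction, if one exists."""
        out = []
        for word in transcript.split():
            fixes = self._fixes.get(word.lower())
            out.append(fixes.most_common(1)[0][0] if fixes else word)
        return " ".join(out)


memory = CorrectionMemory()
memory.record("wreck", "recognize")
memory.record("wreck", "recognize")
memory.record("wreck", "wreak")
print(memory.apply("please wreck speech"))
```

Each correction raises a tally, so the substitution a particular user makes most often is the one applied the next time the same mistake appears.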

The experts say speech recognition programs will also require a lot of work on the part of individual users before error rates are reduced sufficiently.

Judging from the direction of current approaches, you’ll probably need to teach the software how to accurately interpret your individual speaking style and diction, helping it learn to understand your accent, cadence, and any speech impediments or other verbal peculiarities.

The good news, though, is that once a user has developed and stored a set of personal linguistic reference patterns, it should be possible to transfer those same patterns to more powerful versions of the software as they become available.

So you’ll probably have to go through that time-consuming drill just once.

All this is very good news for Microsoft.

The natural home for speech recognition technology is in the computer operating system, not in individual software applications.

It makes no sense at all to develop and support different speech recognition technologies for each individual software application.

Individual software application vendors simply couldn’t afford to do it even if they tried.

It would be like forcing each automaker to maintain its own network of roads and bridges for the exclusive use of customers who buy its cars.

A voice-activated operating system, on the other hand, would be able to translate verbal commands into the text and graphical actions that current and future versions of software applications can understand.

If the operating system understands what you want to do and is able to pass that information along to your applications, the apps themselves won’t need to be voice-activated. They’ll be taking their commands directly from the OS, which will already speak their language.
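A rough sketch of that division of labor might look like the following. It assumes speech has already been converted to a text phrase and that each application exposes a function accepting a text command. The names `VoiceShell`, `register`, and `dispatch` are invented for illustration and do not describe any actual Microsoft interface.

```python
class VoiceShell:
    """Hypothetical OS-level layer that routes recognized phrases to applications."""

    def __init__(self):
        # application name -> function that accepts a text command
        self._handlers = {}

    def register(self, app_name, handler):
        """Let an application announce the function that takes its commands."""
        self._handlers[app_name.lower()] = handler

    def dispatch(self, phrase):
        """Route a phrase of the form '<app> <command...>' to the named application."""
        app_name, _, command = phrase.strip().partition(" ")
        handler = self._handlers.get(app_name.lower())
        if handler is None:
            return f"no application named {app_name!r}"
        return handler(command)


def editor(command):
    # Stand-in for an ordinary application with no speech support of its own.
    return f"editor ran: {command}"


shell = VoiceShell()
shell.register("editor", editor)
print(shell.dispatch("editor open report"))
```

The application in this sketch never handles audio. It receives the same kind of text command it always has, which is the point of putting the recognition layer in the operating system.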

The only thing that will matter is who makes the best voice-activated operating system that integrates seamlessly with the most popular software applications.

Which brings us back to Microsoft.

Microsoft currently owns 94 percent of the market for word processors and spreadsheet applications and more than 80 percent of the market for personal database applications.

If upheld, the federal government’s proposal to break up Microsoft would require the company to make known certain features about the application programming interfaces (APIs) of its Windows operating system so that other companies can write programs that work as well as Microsoft’s own applications.

But the feds have not proposed forcing Microsoft to disclose all the technical details about each of its applications, including any APIs they have that could be used by a voice-activated operating system.

The only way to get around the problem would be to force Microsoft to reveal every single detail about the inner workings of all its software applications as well as its OS.

But it’s unlikely that will happen, since such an order would leave little, if anything, of the company’s trade secrets. Given the difficulty the feds are likely to have keeping their current breakup plan for Microsoft intact during the appeals process, the likelihood of an even more extreme solution emerging seems very remote at this stage.

That’s why it doesn’t matter if the feds bust Microsoft up into two or three companies, or even turn it into a taco stand. One of those entities will eventually develop a voice-activated OS.

It could be Microsoft’s OS entity. Or it could even be Microsoft’s applications entity, which could just as easily bundle the technology into its Web browser, turning Internet Explorer into a de facto operating system that is voice-activated.

If either approach works, users will flock to it in droves.

All of this has not escaped the attention of the open source community, which brought Linux to the fore. Unfortunately, the Linux types are, for the most part, much more focused these days on other, less technically advanced areas, such as trying to port current text- and graphics-based applications to Linux, developing substitutes for those applications, or chasing the holy grail of open-source-enabled online commerce.

There’s a small group working on turning Linux into a voice-activated OS but, by their own admission, traffic on their site remains insignificant.

For years, the accurate rap against Microsoft was that the company was more thief and bully than innovator. The government’s entire case against the company revolves around Microsoft’s practice of using the leverage of its OS to dominate other markets.

Microsoft is all but certain to use voice recognition technology to do the same thing, either by developing it more fully in-house or by finding it someplace else.

And that means virtually everything going on in Washington right now vis-à-vis the Microsoft case might in the end turn out to be just a sideshow.

The real action is in the lab.

CC BY 4.0
This work is licensed under a Creative Commons Attribution 4.0 International License.