Sunday, September 7, 2008

The value of predictions

Many organizations and individuals make predictions about all sorts of things. Some of these turn out to be on target and many do not. We have become quite good at predicting things we can make good mathematical models for, such as the weather. By quite good, I mean that we can forecast a few days into the future. When it comes to forecasting the outcomes of complex systems (which covers nearly everything else we care about), the results are typically poor. We tend to hear about the successful predictions and not the ones that are wrong, so our perspective is focused on the fact that some people do get it right. Unfortunately we don’t know who those people are, and we also don’t know which predictions they will get right and which they will get wrong. Most forecasters get lucky once in a while, but few of them are right consistently, so it's hard to pick someone to follow with any confidence.
The bottom line is that when we make a prediction we are making a guess based on whatever facts we have at hand, which models we are using, our personal intuition and the exact distribution of some tea leaves in the bottom of a cup. The scientific and engineering professions have made some fantastic leaps over the last century or so but few of the significant ones were predicted. We just don’t know what we'll be able to do in five or ten years. Only time will tell.

Outcomes from a neurological interface

If we develop a good neurological interface that can be controlled by a conscious mind in a reliable and precise way and that can also be used to provide input from a computer, there are many different applications for this technology. Some of these are trivial, some complex. Some are of obvious benefit to mankind while others are perhaps not.

Here are some to consider:

Technology requirements for human-computer vocal interactions



From a trivial point of view, the technology required increases with the complexity of the task, but at some point it becomes perhaps more of a conceptual battle than a mechanical one. Computer power is increasing at a steady rate and we have applications like 3D computer games that are ready and waiting to consume that power. But some problems are not computationally hard; rather, they are limited by our understanding of the physics, chemistry, physiology and so on involved in the problem. For some years, handwriting recognition made no progress at all. Then, around 1990, a different approach started to appear, based more on a mathematical analysis of the overall shapes than on any kind of line following or simple image matching algorithm. Once that leap had been taken, the quality of handwriting recognition improved very quickly, so we might expect to see this effect with speech recognition and the AI required for the vocal conversations we want to have with computer systems. In the technology vs. time plot in Figure 2, I'm suggesting that the timeline is very wide for when we might get good results but the technology span is flatter. In other words, we will have the computing power way before we know how to really solve the problem.

Human-computer vocal interaction timeline


We already have speech synthesis that's pretty good, so I'd expect it to be perfected in the next two to five years. Speech recognition is getting better, but it's still not very reliable even in controlled environments. Given the progress so far, though, it seems reasonable that this might also be perfected in the next five to ten years.


The issue of when we will be able to hold a reasonable conversation with a computer has been debated for a long time, and nothing so far is even close to passing a Turing test. But there are applications where we don’t need an abstract conversation; we just need to tell a car where we want to go or find a good place to have dinner. Once we have good speech recognition, many of these simpler interactive applications become viable very quickly. So I'm going to say that in the next 10 to 20 years we'll have the ability to interact with a computer using voice in a way that is useful and reliable.

Tuesday, September 2, 2008

Beyond Speech

I am fairly confident that we will see further advances in speech recognition and synthesis, certainly to the point where a computer speaking is indistinguishable from a human and hopefully to where a computer can understand what any of us is saying. I don’t mean understand in the sense of comprehension but in the sense of being able to transcribe spoken words into text. I am less confident about the development of AI to the point where we can hold a conversation with a computer, and much less confident now about a computer passing the Turing test any time soon. But for the purpose of discussion, let's say that we might reasonably expect to be able to dictate a letter to a computer in the next five years, with the computer having the ability to ask for clarification of anything it could not transcribe with a high probability of being correct. In other words, we've reached the point where we can all have a personal secretary.
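
To make the "ask for clarification" idea a little more concrete, here is a minimal sketch in Python. It assumes a hypothetical recognizer that hands back each word with a confidence score between 0.0 and 1.0; the Word class, the words_needing_clarification function and the 0.9 threshold are all made up for illustration and do not come from any real speech product.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    confidence: float  # hypothetical recognizer score, 0.0 to 1.0


def words_needing_clarification(words, threshold=0.9):
    """Return the text of every word whose confidence is below the threshold."""
    return [w.text for w in words if w.confidence < threshold]


# Made-up scores for a dictated opening line; a real recognizer would supply these.
dictated = [
    Word("Dear", 0.99),
    Word("Nigel", 0.62),
    Word("thank", 0.97),
    Word("you", 0.98),
]

for text in words_needing_clarification(dictated):
    print(f"Did you say '{text}'?")
```

Where to set the threshold is the whole game: too high and the secretary interrupts constantly, too low and the mistakes go straight into the letter.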

A number of questions arise at this point. The two I'm interested in here are: would we like that and make use of it? And: would we want to go further and develop a thought interface?

There are obvious applications of this sort of technology for the handicapped. I'm fairly sure that Stephen Hawking would be delighted to try out such a system since it has the potential to vastly improve the speed at which he can communicate. But is that right? Would Hawking want to communicate faster than he does now? Perhaps he has adapted to his disability and the time it takes to dictate a letter using his present system allows him time to think and form his thoughts more clearly. My guess is that the majority of people with any kind of speaking disability would like very much to have a machine that could vocalize their thoughts.

For those of us who tend to blurt out the first thing that comes into our heads, it might not be so wonderful to have our thoughts instantly translated into text. "Wow, nice [body-part]!", for example, might be better left as an internal comment rather than one voiced loudly in public. Imagine sitting in a meeting and inadvertently voicing your opinion that the CEO is an idiot. This goes far beyond our inability to control body language - we're going to come right out and say what we're thinking. Personally, I think this sounds great and I'd like to forcibly apply it to people such as salesmen. Knowing what they are thinking as they try to sell me a car would definitely help my side of the bargaining process.

So let's add a button. You push the button when you want your thoughts translated and the computer picks them up over WiFi (or perhaps Bluetooth if we still have that in five years). That takes care of the gross blunders but does this provide a better interface? For writing a letter, I doubt that it does unless it can question my grammar and make helpful (but not annoying) suggestions for improvements. I'd like it to take dictation and then act as an interactive editor. That sounds pretty good to me. But this is all fairly pedestrian. We are getting pretty close now to being able to dictate with good accuracy and although grammar checkers still need improving, they are way ahead of me.

Instead of thinking about text, let's push things out a bit and think about more abstract things like art. Art is a physical expression of some sort of personal image (in many cases). The artist imagines the outcome and renders it with some physical medium (I'm including computer animation as being a physical tool). But what if you could render a dream? What if I could send you one of my dreams and you could play it back? That raises a lot of questions about dreams and their context. Do you have the right background experiences to feel the true terror I might experience from being in a very particular closed space? Probably not. But it's interesting to think (no pun intended) that we might exchange thoughts in some way.

When Microsoft first produced a speech recognition engine, I tried it out at work. I thought it would be fun to be able to tell the computer to open and close files and do a variety of other mundane things. The results were horrifying. All sorts of misinterpretations occurred, with the result that I was terrified to speak at all. Who knew what files it was moving or deleting? After a bit more tinkering it was just plain funny. We tried it on lots of people in the office with similar results. Unless you had a strong Texan accent, it had no clue and performed some apparently random act. Now let's extrapolate that experience to a beta copy of "Dream Sculptor". I'm assuming by this time we don’t need to get wired up to do thought input to the machine. I'll assume it's got some very sensitive electromagnetic sensors that can produce very accurate 3D data of what impulses are going on inside my head. So we get close to the machine and push the big red "Think Command" button. I'm pretty sure that exactly at that point one of two things will happen. Behind door 'A' is a big, blank, empty space - no thoughts at all. Behind door 'B' is some completely random thought that most definitely does not need to get inserted into my dream sculpting program. Even as I'm typing this (slowly) my mind is wandering off thinking about all sorts of random things that would somehow appear in the output. And even if I do get to craft a decent dream sequence, can I edit it? Can I swap the face of someone I know for a celebrity? I'd like to think so but undoubtedly this won't work. The human mind is far too complex and thoughts (mine at least) are far too obscure to generate any kind of coherent image.

Work has been going on for a long time (certainly since the 1970s) on converting nerve impulses into mechanical actions, the driving concept being the production of better prosthetic limbs. This has turned out to be more complex than was originally thought. It's not at all like turning on a light with a switch on the wall. Getting people to initiate nerve impulses for missing body parts takes a lot of practice, and it takes a lot of signal processing to accurately tell what the intent was. Imagine how hard it is to do the same thing on the scale of tens of millions of firing neurons. For every pattern we might be able to detect at least part of the time, there will be millions of patterns that are similar but which have quite different meanings. So while we might be able to develop a system to translate some very simple thoughts into turning a light on or off, I have great doubts that we can ever develop a thinking interface to an electronic machine. I suspect that a more productive path lies in transferring thoughts to some sort of biological machine, but since I have trouble expressing my thoughts to my loved ones, who have had plenty of experience in interpreting them, I am not optimistic that something grown in a Petri dish will fare any better.

Nigel

Thursday, August 28, 2008

Web Widjets

So here's a good idea that's not quite right in terms of its implementation. Sites like widgetbox (www.widgetbox.com) offer some rather cool add-ons for your blog site. In theory all you need to do is click a few times, and voila, you have a widget in your blog. This is a two-part operation. The providing site generates some JavaScript and/or HTML and the receiving site (your blog) pastes that piece of script into place. An interface between the two helps to automate the process. Or so it seems. I tried several times to get a countdown widget from widgetbox (and from Kristal's blog: http://kristalcs855.blogspot.com/) with no success. What I get is an empty box with the rather sad title of "No widget found". I'm sure this is yet another case where, despite nearly 40 years of programming in various languages, I just don’t get the concept. I'm usually trying too hard to understand how it works rather than how one might use it. Nonetheless, I don’t have a working widget and I don’t know why. As I have done in the past, I right-click the page and take a look at the source. This is, of course, the source generated by the server, which is often not the 'source' code but rather the intermediate result of a load of code on the server and an HTML page template of some kind. In other words, without knowing the intent of the original author, I don’t get any real clues as to why it doesn't work. If I were to ask Kristal, I'm sure she'd say "It worked fine for me". Oh to be the chosen one. But sadly, that is not my lot. I dutifully register, click the buttons and fail - almost every time.
So based on my experience (not yours, mine) I have to say that although the idea is pretty cool and although it seems to work for the chosen ones, it doesn't work for me and I suspect for many others too. And those of us for whom it does not work have no real way to find out why not. And this is a problem because in the brave new world of Web N.0 (where N is a number in a monotonically increasing series) there will be many widgets and all the cool people will have them on their blogs and on their phones and even tattooed into their skin, but I will be left with a pale grey rounded rectangle containing the words "No widget found".
To add insult to the programming injury, the failed widget includes a button: "Get Widget" so that you too can have a failed widget. Or perhaps not. Perhaps you will click the button under my failed widget and it will work for you. Please let me know if it does so I can order a bigger supply of anti-depression meds next time around.

Nigel
(without a widjet to my name)

Saturday, August 23, 2008

the listening computer

Work started in the early 1950s at Bell Laboratories to develop machines which could recognize elements of human speech [1]. Since then there has been great interest in developing systems which can process natural language. Today we find these systems in use primarily in telephone help systems, where a computer and a voice recognition and synthesis engine are used to answer questions. The ability of these systems is still very limited and in most cases they are used to recognize just a few words and numbers. The words must be spoken clearly and such systems are often confused by different accents.
Several companies have produced speech recognition systems as commercial products. The author's personal experience with offerings from IBM and Microsoft is that even after considerable training, these tools are poor at best, and the manuscript resulting from a dictation session requires so much editing that the overall effort is more than would be required to type it in directly. For those of us who cannot touch type and who are prone to spelling errors and other character reversal mistakes, a good speech-to-text interface would be a great help. For people with physical disabilities, a good voice interface to a computer could make dramatic changes in quality of life.
The primary driving force for computers that can understand natural language is probably to reduce costs in call centers associated with large businesses. If computers can understand spoken language accurately and this technology can be combined with Artificial Intelligence then we have the potential for really useful support systems which could be far more effective than a poorly trained human reading from a script.
Combining the recognition of human speech with AI systems is being pursued by several research projects, notably Project Halo, which is funded by Paul Allen's Vulcan Ventures [2]. Project Halo's goal is to produce a "Digital Aristotle" - a teaching tool capable of answering scientific questions. Halo has produced some good results with text input, demonstrating the AI part of the program. In subsequent phases the intent is to include natural language processing and to develop the knowledge base using scientific personnel rather than knowledge base engineers.


[1] "Automatic Speech Recognition – A Brief History of the Technology Development", http://www.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/354_LALI-ASRHistory-final-10-8.pdf
[2] Project Halo, http://www.projecthalo.com/