Desti Natural Search Comes of Age

Over the last couple of years, we’ve been experimenting with various user interfaces for iPad-based (really – keyboard / screen / touch based) natural language interaction. This is (surprisingly?) different from voice-driven interaction, and an extremely effective way to search. My blog post about the evolution and the lessons learned is here:

http://blog.desti.com/index.php/2014/why-is-natural-search-awesome-and-how-we-got-here/

Desti Natural Language Search UI


Google Glass from the Subject’s Perspective

Last week I had the honor and pleasure of being the first ever subject of a press interview conducted using Google Glass – followed up by a very interesting discussion with Robert Scoble. Here are some of the insights we discussed, as well as some subsequent ones.


Photography and Video will be impacted First

Consider how phone-based cameras have changed photography. My eldest daughter is almost 9 years old. We have a few hundred images of her first year, and about 10 short videos. My son is now 18 months old, and as my wife was preparing his first scrapbook album last week, she browsed through several thousand digital photos. On my phone alone, I have dozens of video clips of him doing everything you can imagine a baby doing, and some things you probably shouldn’t. The reason is simple – we had our smartphones with us; they take good photos and store them. And should I mention Instagram?

Google Glass takes this to the extreme. With your smartphone you actually have to reach for your pocket / bag, open the camera app, point and shoot. Google Glass is always there, immediately available, always focused on your subject, and hands-free. Video photography through Google Glass is vastly superior for the simple reason that your head is the most stable part of your body. What all of this comes down to is simply that people will be shooting stills and video all the time. Have you seen those great GoPro clips? Now consider having a GoPro camera on you, ready and available – perpetually. There will be not just a whole new influx of images and video, but new applications for these too. Think Google Street View everywhere, because the mere fact that a human looked somewhere means it’s recorded on some server. In the forest, in your house, and in your bathroom. Not sure about the latter? Check out Scoble’s latest adventures…

Useful Augmented Reality – Less will be more

Having information overlaid on top of your worldview is probably the sexiest feature from the perspective of us geeks. The promise of Terminator-vision / fighter-pilot displays provides an instant rush of blood to the head. And surely overlaying all of the great Google Places info on places, Facebook (well – Google+) info on people, and Google Goggles info on things – will be awesome, right?

Well, my perspective is a little different. After the initial wow effect, most of these will be unwanted distractions. Simply put – too many signals become noise, especially when it’s human perception that is concerned. This lesson has already been learned with similar systems in aerospace settings – and there the user is a carefully selected, highly trained individual, not an average consumer.

The art and science will be figuring out which of the hundreds of visible subjects is actually interesting enough to “augment”. This will require not just much better and faster computer vision (hard!) but a much better and deeper understanding of these subjects – which one is really special for me, given the context of what I’m doing, what makes it so, and when to actually highlight it. Give me too much signal and I will simply tune out – or just take the damn thing off.

Achieving this requires a deeper understanding of both the world and the individual. Deeper, more detailed POI databases (for places), product databases (for objects), and more contextual information about the people around me, what their contexts are – and what mine is. It is almost surprising to what degree this capability is non-existent today.
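To make the “less is more” argument concrete, here is a toy sketch of how an AR system might rank visible subjects and augment only a handful. Every name, weight and threshold is invented for illustration – nothing here reflects a real Glass API.

```python
# Hypothetical sketch: score each visible subject against the user's context,
# then augment only the top few -- too many overlays become noise.

def relevance(subject, user_context):
    """Score how much this subject matters to the user right now (invented weights)."""
    score = 0.0
    if subject["category"] in user_context["current_interests"]:
        score += 0.5  # matches what the user is currently doing
    if subject["id"] in user_context["social_graph"]:
        score += 0.3  # people I actually know
    # Novelty: things seen many times before are probably noise.
    score += 0.2 / (1 + user_context["times_seen"].get(subject["id"], 0))
    return score

def pick_overlays(visible_subjects, user_context, budget=2):
    """Augment at most `budget` subjects, and only those above a threshold."""
    scored = sorted(visible_subjects,
                    key=lambda s: relevance(s, user_context),
                    reverse=True)
    return [s for s in scored[:budget] if relevance(s, user_context) > 0.4]
```

The interesting design choice is the `budget`: even a perfect relevance model should be capped, because human attention is the scarce resource.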

Initially – Vertical Applications Will be Key

Consider the discussion of video photography above. Now put Google Glass on every policeman and consider the utility of simply recording every interaction these people have with the public. Put Google Glass on every trainee driver and have them de-brief using the recorded video. Or just take it with you to your next classroom. Trivial capabilities like being able to tag an interesting point in time and immediately go back to it on replay – how useful is that?

And considering augmented reality – think of simple logistics applications, like searching a warehouse, where objects are tagged with some kind of QR code and a simple scan with your eyes gives you a visual cue to where they are. The simple applications will deliver immense value, driving adoption and experience – and through those, curiosity and new, further-reaching ideas.
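The warehouse use-case boils down to a lookup: a QR payload decoded by the headset becomes an item id, and an inventory index turns it into a cue. The data, names and cue format below are all invented for illustration.

```python
# Hypothetical sketch: QR payload -> inventory lookup -> visual cue text.
# A real system would get `qr_payload` from the headset's scanner.

INVENTORY = {
    "ITEM-0042": {"aisle": 7, "shelf": "C", "name": "hex bolts M6"},
    "ITEM-0117": {"aisle": 2, "shelf": "A", "name": "power supplies"},
}

def locate(qr_payload):
    """Return the cue text to overlay for a scanned QR code."""
    item = INVENTORY.get(qr_payload)
    if item is None:
        return "Unknown item"
    return f"{item['name']}: aisle {item['aisle']}, shelf {item['shelf']}"
```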

And if you stuck around this long – here are my most amazing revelations:

  • Wearing Google Glass grows your facial hair!

Proof:

Sergey Brin Google Glass       Scoble Google Glass         Tim Google Glass

  • Google Glass video makes you photogenic – watch Scoble’s interview of me and compare it to my usual ugliness…

The Case for Siri

Since Siri’s public debut as a key iPhone feature 18 months ago, I keep getting involved in conversations (read: heated arguments) with friends and colleagues, debating whether Siri is the 2nd coming or the reason Apple stock lost 30%. I figure it’d be more efficient to just write some of this stuff down…


Due Disclosure:

I run Desti, an SRI International spin-out that utilizes post-Siri technology. However, despite some catchy headlines, Desti is not “Siri for Travel”, nor do I have any vested interest in Siri’s success. What Desti is, however, is the world’s most awesome semantic search engine for travel, and that does provide me some perspective on the technology.

Oh, and by the way, I confess, I’m a Siri addict.

Siri is great. Honest.

The combination of being very busy and very forgetful means there are at least 20 important things that go through my mind every day and get lost. Not forever – just long enough to stump me a few days later. Having an assistant at my fingertips that allows me to do some things – typically set a reminder, or send an immediate message to someone – makes a huge difference in my productivity. The typical use case for me is driving or walking, realizing there is something I forgot, or thinking up a great new idea and knowing that I will forget all about it by the time I reach my destination. These are linear use cases, where the action only has a few steps (e.g. set a reminder, with given text, at a given time), and Siri’s advantage is simply that it allows me to manipulate my iPhone immediately, hands-free, and complete the action in seconds. I also use Siri for local search, web search and driving directions.

Voice command on steroids – is that all it is?

Frankly – yes. When Siri made its public debut as an independent company, it was integrated with many 3rd-party services, which were scrapped and replaced with deep integration with the iPhone platform when Apple re-launched it. Despite my deep frustration that Siri won’t book hotels these days, for instance (not), I think the decision to do one thing really well – provide a hands-free interface to core smartphone functionality (we used to call it PIM, back in the day) – was the right way to go. Done well, and marketed well, this makes the smartphone a much stronger tool.

But I hate Siri. It doesn’t understand Scottish and it doesn’t tell John Malkovich good jokes

As mentioned, I’ve run into a lot of Siri-bashers in the last year. Generally they break down into two groups: the people who say Siri never understands them, and the people who say Siri is stupid. I’m going to discuss the speech recognition story in a minute (SRI spin-out, right?), but regarding the latter point I have to say two things. First, most people don’t really know what the “right” use cases for Siri are. Somewhere between questionable marketing decisions and too little built-in tutorial, I find that people’s expectations of Siri are often closer to a “talking replacement for Google, Wikipedia and the bible” than to what Siri really is. That is a shame, because the bottom line is that it is under-appreciated by many people who could really put it to good use. Apple marketing is great, but it’s better at drawing a grand vision than it is at explaining specific features (did I mention my loss on AAPL?). While the Siri team has done great work at giving Siri a character, at the end of the day it should be a tool, not an entertainment app (my 8-year-old daughter begs to differ, though).

OK, but it still doesn’t understand ME

First, let me explain what Siri is. Siri is NOT voice-recognition software – Apple licenses that capability from Nuance. Siri is a system that takes voice recognition output – “natural language” – figures out what the intent is (e.g. send an email), and then goes through a certain conversational workflow to collect the info needed to complete that intent. Natural language understanding is a hard problem, and weaving multiple possible intents with all the possible different flows is complex. It is hard because there is a multitude of ways for people to express the same intent, and errors in the speech recognition add complexity. Siri is the first such system to do it well, and certainly the first one to do it well on such a massive scale.
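The pipeline described above – ASR text in, intent out, then a conversational loop collecting whatever is still missing – can be sketched in miniature. The intent names, keywords and slots are all invented; a real system uses statistical parsers, not keyword matching.

```python
# Toy sketch of an intent pipeline: classify the utterance, then determine
# which slots the assistant must still ask about. All data is invented.
import re

INTENTS = {
    "send_message": {"keywords": {"text", "message", "tell"},
                     "slots": ["recipient", "body"]},
    "set_reminder": {"keywords": {"remind", "reminder"},
                     "slots": ["time", "body"]},
}

def classify(utterance):
    """Map ASR output text to an intent name, or None if nothing matches."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    for intent, spec in INTENTS.items():
        if words & spec["keywords"]:
            return intent
    return None

def missing_slots(intent, filled):
    """Which questions must the assistant still ask to complete the intent?"""
    return [s for s in INTENTS[intent]["slots"] if s not in filled]
```

The conversational workflow is then just a loop: while `missing_slots` is non-empty, ask for the next slot, run the answer back through the ASR, and fill it in.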

So what? If it doesn’t understand what I said, it doesn’t help me.

That is absolutely true. If speech is not recognized – garbage in, garbage out. Personally I find that despite my accent Siri usually works well for me, unless I’m expressing foreign names, or there is significant ambient noise (unfortunately, we don’t all drive Teslas). There are however some design flaws that do seem to repeat themselves.

In order to improve the success rate of the automatic speech recognizer (ASR), Siri seems to communicate your address book to it. So names that appear in your address book are likely to be understood, despite the fact that they may be very rare words in general. However, this is often overdone, and these names start dominating the ASR output. One problem seems to be that Nuance treats first and last names as separate words, so every so often I will get “I do not know who Norman Gordon is” because I have a Norman Winarsky and a Noam Gordon as contacts. I believe I see a similar flaw when words from one possible intent’s domain (e.g. sending an email) are recognized mistakenly when Siri already knows I’m doing something else (e.g. looking at movie listings).

This probably says something about the integration between the Nuance ASR and Apple’s Siri software. It looks like there is offline integration – as in transferring my contacts’ names a priori – but no real-time integration – in this case, Siri telling the ASR that “Norman Gordon” is not a likely result. Such integration between the ASR and the natural language understanding software is possible, but often complex, not just for technical reasons but for organizational ones. It requires very close integration that is hard to achieve between separate companies.
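The kind of real-time integration described here can be illustrated with a tiny re-scoring pass: the language layer boosts ASR hypotheses that contain a complete, real contact name, so a first name from one contact glued to the last name of another loses. The contact list, scores and bonus are all invented for illustration.

```python
# Hypothetical sketch: re-score ASR n-best hypotheses using full contact
# names, so "Norman Gordon" (a cross-contact mash-up) is demoted.

CONTACTS = ["Norman Winarsky", "Noam Gordon"]

def rescore(nbest):
    """nbest: list of (hypothesis_text, asr_score). Return the best hypothesis
    after boosting any that contain a complete known contact name."""
    def adjusted(item):
        text, score = item
        bonus = 0.3 if any(name in text for name in CONTACTS) else 0.0
        return score + bonus
    return max(nbest, key=adjusted)[0]
```

The point is architectural, not algorithmic: this only works if the intent layer can talk back to the ASR (or re-rank its output) in real time, which is exactly the integration that seems to be missing between two separate companies.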

So when will it get better?

It will get better, because it has to. Speech control is here to stay – in smartphones as well as TVs, cars and most other consumer electronics. ASRs are getting better, mostly for one reason: ASRs are trained by listening to people, and the biggest hurdle is how much training data they have. In the early days of ASRs, decades ago, this consisted of “listening” to news commentators – people with perfect diction and accent, in a perfect environment. In the last year, more speech sample data was collected through apps like Siri than probably in the two decades prior, and this data is (can be?) tagged with location, context and user information, and is being fed back into these systems to train them. And as this explanation was borrowed from Adam Cheyer, Siri’s co-founder and formerly Siri’s Engineering Director at Apple – you’d better believe it. We are nearing an inflection point, where great speech recognition is as pervasive as internet access.

So will Siri then do everything?

That’s actually not something I believe will happen as such. Siri is a user interface platform that has been integrated with key phone features and several web services. But to assume it will be the front-end to everything is almost analogous to assuming Apple will write all of the iOS apps. That is clearly not the case.

However – Siri as a gateway to 3rd party apps, as an API that allows other apps that need the hands-free, speech-driven UI to integrate into this user interface, could be really revolutionary. Granted – app developers will have to learn a few new tricks, like managing ontologies, resolving ambiguity, and generally designing natural language user experiences. Apple will need to build methodology and instruct iOS developers, and frankly this is a tad more complex than putting UI elements on the screen. Also I have no idea whether Siri was built as a platform this way, and can dynamically manage new intents, plugging them in and out as apps are installed or removed. But when it does, it enables a world where Siri can learn to do anything – and each thing it “learns”, it learns from a company that excels at doing it, because that is that third party’s core business.
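A platform like the one speculated above might look, in miniature, like a registry where apps plug intents in and out as they are installed or removed, and a dispatcher routes each recognized intent to whichever app owns it. The API shape here is entirely invented – nothing like it was public at the time.

```python
# Speculative sketch of Siri-as-a-platform: third-party intent registration.

class IntentRegistry:
    def __init__(self):
        self._handlers = {}  # intent name -> (app name, handler)

    def register(self, intent, app_name, handler):
        """Called when an app is installed: it claims an intent."""
        self._handlers[intent] = (app_name, handler)

    def unregister(self, intent):
        """Called when the app is removed: the intent is forgotten."""
        self._handlers.pop(intent, None)

    def dispatch(self, intent, **slots):
        """Route a recognized intent (with filled slots) to its owner."""
        if intent not in self._handlers:
            return "Sorry, I don't know how to do that."
        app_name, handler = self._handlers[intent]
        return handler(**slots)

# Usage: a hypothetical travel app teaches the assistant a new trick.
registry = IntentRegistry()
registry.register("book_hotel", "HotelApp",
                  lambda city: f"Booking a hotel in {city}...")
```

This is the “each thing it learns, it learns from a company that excels at it” idea: the platform owns the conversation, the app owns the capability.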

… and then, maybe, a great jammy dodger bakery chain can solve the wee problem with Scotland with a Siri-enabled app.

Oh, and by the way – you can learn more about Siri, speech, semantic stuff and AI in general at my upcoming SXSW 2013 Panel – How AI is improving User Experiences. So come on, it will be fun.

Moving Beyond Search: From Information to Knowledge

Link to my guest post on TechCrunch

My Birthday Gift: The Kindle Fire, and Why It’s The First Credible Android Tablet

Over the past 6 months, I’ve been watching, perplexed, as vendor after vendor launched Android tablets into the market with no success. Perplexed for a simple reason – I could not understand how they expected consumers to buy their $559, $499 or even $399 tablets when they could get an iPad 2 for $499 and get the real deal – the TRUE status symbol, with the best content and app eco-system. What were Samsung, Motorola, Dell and Asus thinking, I wondered. Was it the cost or shortage of components that pushed them into that price bracket? Was it protecting the brand at all costs, even failure?

A couple months ago, I asked a question on Quora and the results were staggering – over 20:1 for iPad.

So what has changed? The $199 Kindle Fire. You can get two of those, and still have money for another holiday gift.

Amazon’s Kindle is an ecosystem, not a device. Amazon sees it as a way to make sure you buy all your content – books, music, TV – from Amazon. Just yesterday they announced the streaming deal with FOX TV – more free content for Amazon Prime subscribers. Guess which devices will feature it? Remember Sony’s Howard Stringer’s announcement a few weeks ago – “Apple makes an iPad, but does it make a movie?“. Amazon doesn’t make them, but it sure-as-hell moves them around. In a move right out of Steve Jobs’ playbook, Amazon is tying it all together – device, app store, content store, streaming rights (with free content for Prime members), e-commerce for physical goods, payment options (from one-click to credit cards), cloud storage, even a loyalty program!

Kindle now touches everything Amazon does, and so many other companies. It threatens Netflix streaming – Amazon is securing more content for Prime members and has a sound pay-TV model with a complete eco-system around it. And it obliterates every other Android tablet manufacturer’s volume forecasts for the holiday season (a $200 rival with a strong brand behind it).

And it’s a credible contender for Apple’s eco-system. It is as broad, as far reaching, and goes even further with physical e-commerce embedded.

Probably the only risk is execution. If the software / hardware is good enough (defined as – better than most Android implementations), this will make a huge dent in the market. iPad will become the high-end product, but Android, through Kindle, could be the mass-market. Not so different from iPhones and Androids, actually.

My pre-order is in.

How I Got It All Ass-Backwards, or How Android Got Free Again

Free!

Last week I wrote a piece about the huge cultural gap between Google and Motorola, how Motorola is such a bad fit for the Google organization, and what the acquisition will do to Google’s relationship with Android licensees. I also stated that if Google acquired Motorola for the patent portfolio alone, that’s not such a big deal in the marketplace.

Well, boy, was I wrong. A person who’s very close to the story saw fit to fill me in.

Google’s acquisition of Motorola was indeed all about the patents. But not necessarily because of Google’s own lack thereof – really because of its licensees’. What Google is trying to do to the handset market is what Microsoft did to PCs – cede the hardware market to cheap Chinese / Taiwanese / Korean manufacturers, and thereby own the software platform. The catch? The incumbents – Nokia, Apple, Microsoft (and Motorola) – own restrictive patents. And they sue / charge these manufacturers to the point where they are agnostic between Google’s “free” OS and Microsoft’s “pricey” one. The only player in the Android camp who was relatively safe was Motorola, which owns a nice portfolio developed over many years.

Solution – Google buys Motorola and promises Android licensees a defensive umbrella – it will fight their patent wars for them with its newly acquired arsenal.

Right there and then, Android is free again.

So what is Google to do with the Motorola organization one might ask?

This is where it gets pretty interesting. You see, Motorola is in Illinois – the state a certain president (and his associated mayor) come from. And 2012 is an election year. Who wants to see 10,000 layoffs in Illinois in an election year? Certainly not someone who wants to Do No Evil…

Google acquires Motorola. Say again?!

With so many so-called experts (read: people who use Google and used to have a Motorola RAZR phone) providing different angles on this acquisition, I figured it’s time to chime in. I have a pretty good handle on Motorola (you can Google that!) and think I know something about Google too.

And what I don’t get is the culture clash. Truly. Motorola, like it or not, is an 83-year-old Chicago (well, Schaumburg) company, and no, the split into MMI and MMS did not change that. It is a slow-moving, 18,000-employee corporation, with an organization that takes years to design products, and even under Sanjay Jha that could not change much.
You see, when a company is hit as badly as Motorola Mobility was in 2008-2009 (and by the way, that happened through their complacency over the success of the RAZR), the good, dynamic, innovative people tend to leave. Especially in a market where Google, Facebook and Groupon are snatching up all the good people who’d still like to work for a “safe” company. The culture has not changed all of a sudden, nor has there been a good reason for great people to join lately.

Google is, or aspires to be, a fast-moving Silicon Valley company with a flat hierarchy and a market-driven (really numbers-driven) no-nonsense approach, with little respect for old-world processes. And it wants to retain this culture while growing to 25,000 employees.

See the issue?

So if, as some people have suggested, Google is only after the patents and will spin out Motorola again as a stand-alone device manufacturer, not so much has happened in the market (but congratulations to all the lawyers, accountants, bankers and management consultants who’re going to get the fat checks).

But if Google is truly looking to become the anti-Apple and the Motorola team is its weapon-of-choice… well, good luck with that.

P.S.: I especially like the theory that Microsoft was going to buy Motorola which forced Google to buy them first. It’s just lovely.
