AS Saqib Shaikh enters a restaurant near his office in London, he asks for the menu.
He then slowly takes out his phone and holds it about 30 centimetres above the menu.
A voice assistant app — Seeing AI — guides him to adjust the position of his phone slightly so that the entire menu image can be captured.
Then the app reads out the menu headings, from appetisers to desserts, for Saqib to make his choices.
He then calls the waiter and places his order.
This is quite a feat, considering Saqib is visually impaired.
Saqib lost his eyesight when he was 7 years old. Like millions of other visually-impaired people around the world, he cherishes the thought of being able to see and lead a normal life one day.
However, instead of waiting for a miracle to happen, he created one.
The Seeing AI app that he uses is his own idea, one he then worked on with his team at Microsoft.
“I’ve been working on the app for the past three years in the hope that it will improve the lives of people like me,” he says.
“One of the things that I’ve always dreamt of since my college days was this idea of something that could tell you at any moment what’s going on around you.”
The 35-year-old software engineer has been personally involved in developing an application that combines artificial intelligence, cognitive computing, image recognition and mobile headset technologies.
The image analysis, cognitive reasoning and speech intelligence in the device that Shaikh uses allow him to “see” the world around him in a way that was previously only possible in Hollywood movies.
“The Seeing AI app is an idea I had for many years, based on work by other scientists. So I made a prototype which was previewed at several Microsoft Hackathons. Currently, there are four people in my team developing the app,” says Saqib, who has won the Hackathon for the past three years.
Prior to developing Seeing AI, he had developed various Internet-scale services and software such as Bing, Cortana, Edge, MSN and various mobile apps.
WHAT THE APP CAN DO
The Seeing AI app gives people like Saqib the ability to “see” and walk around, travel and do things without fear.
The ability to read texts — like reading the menu — is just one of its capabilities.
There is also a pair of smart Pivothead sunglasses developed to complement the app.
These give the user the ability to see what is happening around them in real time.
For example, when Saqib puts on the smart sunglasses and presses a small button on their side, they will tell him what is happening in front of him.
The technology disambiguates and interprets data in real-time. In essence, it paints a picture of the world for him audibly instead of visually.
Saqib demonstrates with his special glasses: touching their edge prompts the app to interpret what is happening in front of him and tell him something like “I think it’s a man jumping in the air, doing a trick on a skateboard”.
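Seeing AI is built on Microsoft’s computer-vision services, though the article does not show its code. As an illustration only, a minimal sketch of producing a similar caption with the Azure Computer Vision “Describe Image” REST endpoint might look like the following; the endpoint URL and key are placeholders, and the caption-picking helper is our own, not Seeing AI’s.

```python
# Hypothetical sketch: generating a Seeing AI-style scene caption with the
# Azure Computer Vision v3.2 "describe" endpoint. ENDPOINT and KEY are
# placeholders you would replace with your own Azure resource values.
import json
import urllib.request

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
KEY = "<your-subscription-key>"  # placeholder


def describe_image(image_bytes: bytes) -> dict:
    """POST raw image bytes to the Describe Image API and return the parsed JSON reply."""
    req = urllib.request.Request(
        f"{ENDPOINT}/vision/v3.2/describe",
        data=image_bytes,
        headers={
            "Ocp-Apim-Subscription-Key": KEY,
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def best_caption(reply: dict) -> str:
    """Pick the highest-confidence caption from the service's reply,
    e.g. 'a man doing a trick on a skateboard'."""
    captions = reply["description"]["captions"]
    return max(captions, key=lambda c: c["confidence"])["text"]
```

The service returns several candidate captions scored by confidence; reading out only the top-scoring one is what lets an app speak a single, natural sentence rather than a list of guesses.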
“This way, the blind can experience the world in richer ways, like connecting a noise on the street or even at a meeting,” he says.
“When you are talking to a bigger group, sometimes you continue talking and there’s no response. You’ll be wondering whether everyone is listening or half of them are asleep. By clicking on the smartglasses, the app can tell me things like ‘I see a 40-year-old man with a beard looking surprised’, or ‘a 20-year-old woman is looking happy’,” he adds.
The Seeing AI app can describe the approximate age and gender of the people around him, and what their emotions are.
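Age, gender and emotion are among the attributes Microsoft’s Face API can return for a detected face. As a hedged sketch of how such attributes could be turned into the kind of spoken sentence quoted above: the attribute names below match the Face API’s response shape, but the summarising logic is purely our own illustration, not Seeing AI’s code.

```python
# Hypothetical sketch: turning Azure Face API attributes into a spoken-style
# description like the ones Seeing AI reads aloud. The "faceAttributes"
# structure (age, gender, emotion scores) mirrors the Face API's detect
# response; everything else is illustrative.

# Map the API's emotion names to the adjectives a voice assistant might speak.
SPOKEN = {
    "surprise": "surprised", "happiness": "happy", "sadness": "sad",
    "anger": "angry", "fear": "afraid", "disgust": "disgusted",
    "contempt": "contemptuous", "neutral": "neutral",
}


def summarise_face(face: dict) -> str:
    """Build a sentence such as 'a 40-year-old man looking surprised'."""
    attrs = face["faceAttributes"]
    # The emotion field maps emotion names to confidence scores; take the top one.
    emotion = max(attrs["emotion"], key=attrs["emotion"].get)
    noun = "man" if attrs["gender"] == "male" else "woman"
    return f"a {round(attrs['age'])}-year-old {noun} looking {SPOKEN.get(emotion, emotion)}"
```

As with the captions, the service reports a confidence score per emotion rather than a single label, so an app must itself decide which one to speak.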
It can also be used for shopping: users can scan barcodes, and the app will tell them what the products are and their basic ingredients.
It is also able to recognise currency, even notes that are crumpled.
Today, Saqib can even travel around the world alone with the help of Seeing AI.
To develop Seeing AI, Saqib uses Visual Studio 2017 with screen reader software for writing and debugging code.
The screen reader runs in the background and reads out what is happening on the screen.
There is also the IntelliSense feature, which provides instant context-aware help when writing code in Visual Studio.
With the help of such tools, Saqib can code quickly, sometimes faster than sighted developers.
Currently, the Seeing AI app is available in only nine countries, in English and on the iOS platform. But more enhancements are expected.
FUTURE OF AI
AI has a great future. During a conference where Saqib presented Seeing AI, Microsoft CEO Satya Nadella said that advanced machine learning holds far greater promise than “unsettling headlines” about computers beating humans at games.
“Ultimately, humans and machines will work together, not against one another. Computers may win at games but imagine what’s possible when humans and machines work together to solve society’s greatest challenges like beating disease, ignorance, and poverty,” he says.
Doing so, however, requires a bold and ambitious approach that goes beyond anything that can be achieved through incremental improvements to current technology.
“Now is the time for greater coordination and collaboration on AI,” says Nadella.
He lauded what Saqib has done with AI.
“This approach is a start, but we can go further,” he says.
And for Saqib, apps like Seeing AI will make the world more equal, opening up possibilities not just for the blind but for everyone else as well.
WHAT IT DOES
The program lets users recognise:
Text — speaks text as soon as it appears in front of the camera.
Documents — provides audio guidance to capture a printed page, then recognises the text along with its original formatting.
Products — scans barcodes using audio beeps to guide you. You hear the name and package information where available.
People — allows users to save friends’ faces in their contact list so they can be recognised later.
Scenes (early preview) — hear an overall description of the scene captured.
Images in other apps — just tap “Share” and “Recognise with Seeing AI” to describe images from e-mail, photos, Twitter and more.