Horus is a wearable device for blind and vision-impaired people. It's
basically a headset with cameras connected to a powerful pocket-sized computer
that sits in the user's pocket and processes all the images in real time. One day,
we were in Genova, just in front of the train station. We met a blind person and
he asked us for help to reach the bus stop. That encounter made us think about all
the difficulties that people who have lost their sight, or are losing it, have to
face every day. And we realized that we didn't have to move our focus from computer
vision to another topic; we could bring vision from robotics to people who need it,
not to robots. We have a bone
conduction headset plus a stereo-vision system and a processing unit, which is
powered by the NVIDIA Jetson platform. The headset basically acquires images and
streams them to the processor where the deep learning and computer vision
algorithms run. The current status of our project is that of an advanced prototype.
We've made several iterations, and we now have a custom board with a Tegra processor
and our own software comprising several functionalities. The way it's being used
today is that we have a pool of testers and organizations that are helping us test
the device, the packaging, and everything else. We also have one blind employee who
is testing it every day.
We're working on face recognition, and there we are working both on face
detection and on online learning, which lets the device learn the faces of new
people on the go, while the blind person is using it. Then we have object
recognition, which basically works in a very similar way, although the underlying
algorithms are a little different.
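To make the face-recognition idea concrete, here is a minimal sketch of an embedding-gallery recognizer that can enroll new people at runtime; the embedding function, the similarity threshold, and the class name are hypothetical placeholders for illustration, not Horus's actual implementation.

```python
import numpy as np

class OnlineFaceRecognizer:
    """Toy embedding-gallery recognizer: new people can be enrolled while the device runs."""

    def __init__(self, embed_fn, threshold=0.6):
        self.embed_fn = embed_fn    # hypothetical: face crop -> unit-norm embedding vector
        self.threshold = threshold  # cosine-similarity acceptance threshold (assumed value)
        self.gallery = {}           # name -> list of stored embeddings

    def enroll(self, name, face_crop):
        """Online learning step: add one example of a new (or already known) person."""
        self.gallery.setdefault(name, []).append(self.embed_fn(face_crop))

    def identify(self, face_crop):
        """Return the best-matching enrolled name, or None if nobody is close enough."""
        query = self.embed_fn(face_crop)
        best_name, best_sim = None, self.threshold
        for name, embeddings in self.gallery.items():
            sim = max(float(np.dot(query, e)) for e in embeddings)
            if sim > best_sim:
                best_name, best_sim = name, sim
        return best_name
```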
Then we have text reading: [Female voice] "I cannot fully express my gratitude to
the exceptional team at Doubleday." By text reading, I mean that the device can
recognize printed text, for example in books.
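Purely as an illustration of this kind of printed-text reading, the sketch below chains an off-the-shelf OCR engine with offline text-to-speech; OpenCV, pytesseract, and pyttsx3 are assumptions made for the example, not necessarily the components Horus uses.

```python
import cv2
import pytesseract   # assumed OCR engine for this sketch
import pyttsx3       # assumed offline text-to-speech engine for this sketch

def read_text_aloud(frame_bgr):
    """Recognize printed text in a camera frame and speak it to the user."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    text = pytesseract.image_to_string(gray).strip()
    if text:
        tts = pyttsx3.init()
        tts.say(text)
        tts.runAndWait()
    return text
```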
The device can also help the blind person move around the city or indoor spaces.
To do that, we're leveraging the power of stereo vision: we try to understand
what's in front of the person, such as obstacles and signs, and then create a
sort of audible map of the environment, so that as the person moves around he can
understand what's in front of him and avoid any obstacles he might find along the way.
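A rough sketch of the stereo-vision idea just described: estimate a disparity map, convert it to depth, and summarize the nearest obstacle in a few horizontal sectors that could then drive audio cues. The calibration values and the distance-to-pitch mapping below are invented for illustration and are not the Horus algorithms.

```python
import cv2
import numpy as np

# Hypothetical calibration values for the sketch (focal length in pixels, baseline in meters).
FOCAL_PX, BASELINE_M = 700.0, 0.06

stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=7)

def obstacle_sectors(left_gray, right_gray, n_sectors=3):
    """Return the nearest obstacle distance (m) in each horizontal sector, left to right."""
    disparity = stereo.compute(left_gray, right_gray).astype(np.float32) / 16.0
    valid = disparity > 1.0                               # discard far or invalid pixels
    depth = np.where(valid, FOCAL_PX * BASELINE_M / np.maximum(disparity, 1e-3), np.inf)
    sectors = np.array_split(depth, n_sectors, axis=1)    # split image into vertical strips
    return [float(np.percentile(s[np.isfinite(s)], 5)) if np.isfinite(s).any() else float("inf")
            for s in sectors]

def to_audio_cue(distance_m):
    """Map distance to a simple cue: closer obstacles give a higher pitch (illustrative only)."""
    if distance_m == float("inf"):
        return None
    return 200.0 + 800.0 / max(distance_m, 0.5)           # tone frequency in Hz
```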
We have scene captioning: the person queries the device, through the press of a
button, for a description of what the cameras are seeing, and the device generates
a complete sentence that tries to describe what's in front of the blind person.
All of this is obtained through deep learning. [Female Voice] "A group of people
sitting around the table with laptops."
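As a sketch of that button-press-to-spoken-sentence flow: grab the current frame, run it through a captioning network, and speak the result. The captioning network is represented here by a hypothetical caption_model callable, since the actual model is not described in this talk.

```python
import cv2
import pyttsx3   # assumed text-to-speech backend for this sketch

def describe_scene(camera, caption_model):
    """On a button press: grab the current frame, caption it, and speak the sentence.

    `caption_model` is a hypothetical callable (RGB image -> descriptive sentence);
    in practice this would be a deep-learning image-captioning network.
    """
    ok, frame_bgr = camera.read()                    # e.g. camera = cv2.VideoCapture(0)
    if not ok:
        return None
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    sentence = caption_model(frame_rgb)              # e.g. "a group of people around a table"
    tts = pyttsx3.init()
    tts.say(sentence)
    tts.runAndWait()
    return sentence
```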
We make extensive use of deep learning and computer vision in our product because,
to be useful, you need to gather as much information as possible from the
surrounding environment, and that's not possible using only classical computer
vision approaches.
Our work is also about taking those same algorithms and making them run on our
mobile platform. This is made possible by NVIDIA's Jetson platform, because it
allows for high-performance yet power-efficient computing in a small form factor.
When we release our product, our goal is basically to radically change the lives
of blind and vision-impaired people, because we want to make the world more
accessible to them. We cannot change the accessibility of every city in every
country, but we can create something that bridges that gap, something that makes
what's not accessible, accessible to everybody.