How 4tiitoo’s Eye-Tracking Tech Can Be Used For Touchless Screen Interaction
November 25, 2020 by Dave Haynes
The 16:9 PODCAST IS SPONSORED BY SCREENFEED – DIGITAL SIGNAGE CONTENT
While we all have learned, and mostly remembered, to wash or sanitize our hands after we touch surfaces, the ongoing pandemic has undoubtedly made a lot of people antsy about touching any surfaces unless they really, really need to.
Self-service screens are one of those surfaces that make at least some people jumpy, and things like voice-based ordering or throwing screen controls to the customer’s smartphone have come up as alternatives.
Now 4tiitoo, a German company that mainly does eye tracking for workplace environments, is touting a solution that would make tasks like ordering a burger at a restaurant chain contactless.
A built-in sensor on the kiosk would track and respond to what a customer sees on the screen, all the way from a welcome message and through to order confirmation.
This is not stuff out of a sci-fi movie, but a riff on existing technology that takes endless mouse work out of repetitive office jobs and allows workers with greasy or occupied hands to navigate and update a screen just by looking at it.
Stephan Odörfer, one of the founders of 4tiitoo, walked me through the thinking, and how it all works.
David: All right, Stephan, can you give me a rundown on what your company is all about and what your technology does broadly?
Stephan Odorfer: Sure. At 4tiitoo, we’re all about natural interaction: using your gaze first of all, but in some use cases also combining it with speech and gestures. So basically, the natural interactions that humans can do. What we’re mainly doing nowadays is controlling standard workplaces, from accounting to support centers to engineering, and largely replacing the 50-year-old mouse. That means while your hands are on the keyboard, you start typing, and when you want to select a different input field, for example in your SAP environment, you just look at this field, which you’re doing anyway, and you continue typing, because our software understands what you want to do. It basically predicts your intention and proactively helps you in your daily tasks.
So this is what we have been doing for several years. What we have been doing lately is hands-free, completely touchless interaction in kiosk situations, or in dedicated restaurant situations, where you use your gaze as you do today. You browse through the menu, the list scrolls automatically while you’re reading it, and our system understands that you are currently in reading mode. Obviously, what do you want to do at the end of a page? You want to continue scrolling. So the system automatically scrolls at your reading speed. And if you want to select a salad or a burger, you just look at these items. We developed a special way to trigger these elements, because it’s not about just looking at them and, boom, it happens. That’s not good. It’s rather a way to first select something and then trigger it, just with your gaze.
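The auto-scroll behavior described above can be illustrated with a minimal Python sketch. It is not 4tiitoo’s algorithm: it assumes gaze samples arrive as (time, vertical position) pairs, treats steady downward progress of the gaze as “reading mode”, and scrolls at that estimated reading speed once the gaze enters the lower quarter of the viewport. The thresholds (`trigger_fraction`, `min_speed`) are illustrative guesses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GazeSample:
    t: float  # timestamp in seconds
    y: float  # vertical gaze position in viewport pixels (0 = top edge)


def reading_speed(samples: List[GazeSample]) -> float:
    """Least-squares slope of y over t: how fast (px/s) the reader moves down the page."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(s.t for s in samples) / n
    mean_y = sum(s.y for s in samples) / n
    var_t = sum((s.t - mean_t) ** 2 for s in samples)
    if var_t == 0:
        return 0.0
    cov = sum((s.t - mean_t) * (s.y - mean_y) for s in samples)
    return cov / var_t


def scroll_step(samples: List[GazeSample], viewport_height: float, dt: float,
                trigger_fraction: float = 0.75, min_speed: float = 5.0) -> float:
    """Pixels to scroll in the next frame of length dt (assumed heuristic, not 4tiitoo's)."""
    if not samples:
        return 0.0
    speed = reading_speed(samples)
    reading = speed >= min_speed  # gaze steadily moving down: treat as reading mode
    near_bottom = samples[-1].y >= trigger_fraction * viewport_height
    return speed * dt if reading and near_bottom else 0.0
```

Feeding it a sliding window of recent samples on every frame keeps the scroll rate tied to the reader’s own pace. A real reading detector would also have to handle line returns and re-reading, which this heuristic ignores.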
So this is in a nutshell what we’re doing.
David: The restaurant applications for self-service ordering and so on, was that something that was already an ask from operators prior to COVID or is it because of the pandemic that this is now something that’s being put together?
Stephan Odorfer: It was clearly connected with the pandemic: people being afraid of touching surfaces, or the need in restaurants to continuously make sure that these surfaces are clean. We have seen this in manufacturing environments for a long time already, so we have productive solutions for people on the shop floor wearing gloves, etc., who control their shop floor terminals just with their gaze, so that they can continue doing their actual job with their hands.
Now going into the restaurant business, it’s quite similar, because it’s a clear interface: you have a bunch of options that you can touch or click on, and now basically look at, and it’s not a complex interface. What people are doing there is usually pretty similar, and due to the COVID-19 situation, people were looking for solutions where they don’t have to touch things. What we could offer with this technology is not only replacing the need to touch something. Using our gaze technology, we also see what people are interested in, and based on this information we can predict what the user wants, in this case what he’s looking for, and propose recommendations based on his current view and his gaze history: not the history of other people, but his very own history. Therefore we can not only make his experience more personal, more individual, but also create a potential for upselling and cross-selling for the store.
David: So you’re using machine learning or some kind of computer vision to do that?
Stephan Odorfer: Yes. It’s a local technology, because privacy is an important topic for us. Nothing is transmitted to the cloud; everything that we do in this personalization runs on the local system. What it basically does is understand what you’re looking at and what you’re not looking at, which is very important.
This is an area where we have already filed several patents, because one of the most important things is obviously what you look at and how you look at things, but what is also very interesting is what you’re not looking at, because in your peripheral view, our autopilot works. The autopilot in our brain works like this: if I say two plus two, the answer just comes. But if I say 42 times 17, there are not that many brains where the solution comes up. So we need to focus on that and do the math, but it’s the automatic part of our brain that selects what should be put into focus. So when I’m scrolling, let’s say through a menu in a restaurant that has different pizzas, what I’m doing is quickly running through the page and scanning things, like we do with a Google search. We’re not reading each word; only if our autopilot says, “Hey, this could be an interesting result,” do we put our focus on it, read it and decide. And we understand what you have already discarded with your autopilot, which is very important to create a better funnel and better results in less time.
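In eye-tracking research, the difference between scanning and focusing is commonly made measurable by separating fixations (the gaze resting on a spot) from saccades (fast jumps between spots). The sketch below uses a basic velocity-threshold (I-VT) classifier, a standard textbook technique and not necessarily what 4tiitoo has patented. It assumes samples are `(t, x, y)` tuples in seconds and screen pixels and uses a pixels-per-second threshold for simplicity; research implementations normally work in degrees of visual angle. The `focus_time` cut-off between “focused” and “skimmed” is an assumed rule of thumb.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (t seconds, x px, y px)


@dataclass
class Fixation:
    start: float
    end: float
    x: float  # centroid of the fixation
    y: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def _close_group(group: List[Sample], out: List[Fixation], min_duration: float) -> None:
    """Turn a run of slow samples into a fixation if it lasted long enough."""
    if len(group) >= 2 and group[-1][0] - group[0][0] >= min_duration:
        xs = [s[1] for s in group]
        ys = [s[2] for s in group]
        out.append(Fixation(group[0][0], group[-1][0], sum(xs) / len(xs), sum(ys) / len(ys)))


def detect_fixations(samples: List[Sample], velocity_threshold: float = 1000.0,
                     min_duration: float = 0.1) -> List[Fixation]:
    """I-VT: consecutive samples moving slower than the threshold form one fixation."""
    fixations: List[Fixation] = []
    group: List[Sample] = []
    for prev, cur in zip(samples, samples[1:]):
        dt = cur[0] - prev[0]
        speed = math.hypot(cur[1] - prev[1], cur[2] - prev[2]) / dt if dt > 0 else 0.0
        if speed < velocity_threshold:
            if not group:
                group.append(prev)
            group.append(cur)
        else:  # a saccade ends the current fixation
            _close_group(group, fixations, min_duration)
            group = []
    _close_group(group, fixations, min_duration)
    return fixations


def label(fix: Fixation, focus_time: float = 0.25) -> str:
    """Assumed rule: long fixations suggest focused reading, short ones skimming."""
    return "focused" if fix.duration >= focus_time else "skimmed"
```

Items that only ever receive short fixations, or none at all, are the ones the “autopilot” has passed over.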
David: Okay. So with the original, or maybe not the original, but the most familiar gesture sensors out there, the Kinect sensors that came out maybe 10 years ago, there were attempts to use gestures as an interactive interface for screens, and the objection I always had, and observed, was that there was a big learning curve for people to figure out how to use it. I would imagine that in something like a restaurant environment, where much of the goal is to speed up transactions, speed up decision-making and everything else, that’s a problem. How do you get past walking up to a screen for the first time and facing an unfamiliar interface?
Stephan Odorfer: That’s a very important question, and you’re absolutely right. The older folks out there probably remember the Minority Report scene with Tom Cruise, where he’s doing all these fancy moves and controlling the computer with them. Well, think about doing that yourself. The learning curve you mentioned is also a question of how long you can keep it up before it becomes more of a sports event than operating a screen, right? Your eyes are constantly moving anyway. Before you touch something, gesture, point at something or click a mouse, your eyes are always already at the point that you want to address. And that’s the magic of eye tracking.
So the question is how do we make sure that, in our standard environment, people get familiar with this technology very fast, and on the other hand, in a kiosk mode, people see that it’s a robust environment, a robust operation. It has to be a completely, a hundred percent certain operation, in the sense that the user knows what’s going to happen and what to expect, because that is necessary for people to accept it. What we came up with is a way to confirm things and to predict from the user’s gaze information what he wants to do. We could do this because it’s what we have been focusing on: the company itself is seven years old, but my partner and I, the founding partners, have been focusing on this technology for almost 10 years. If I think back to the first days, months and years, it looked pretty different from how we control computers today.
Let me explain with an example. If you look at a button, what usually happens today is a so-called dwell time: the button triggers once you have looked at it for a specific time, say two seconds. Either that takes too long for a nice interaction or, if you speed it up, it’s too short and you get too many false positives. What we came up with is a way to select something within a split second. Think of a Windows desktop: if you click an icon once, you select it; if you double-click it, you trigger it. But you can also click once to select it and then press Return; that’s basically the same. You select it and then you trigger it, and that’s what we are doing. You look at a burger and you select it, right? Every time you select something, a button pops up, and looking at that button triggers the selection, for example putting the item into the cart. So it’s always a two-step system, and that’s robust enough, but once you know how to deal with the system, you can do it super fast, because our eyes are controlled by the fastest muscles in our body.
So whether you are looking from the lower left to the upper right, you can do it in a split second, while moving your hand would take much longer.
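The two-step “select, then confirm” interaction can be sketched as a small state machine. This is an illustration only: the element ids, the `CONFIRM` button id and all three timings are assumptions, not 4tiitoo’s values. It assumes each gaze sample has already been mapped to the id of the element under the gaze (or `None`).

```python
from dataclasses import dataclass, field
from typing import List, Optional

CONFIRM = "confirm"  # hypothetical id of the pop-up confirm button


@dataclass
class TwoStepSelector:
    """Gaze select-then-confirm state machine (illustrative timings, not 4tiitoo's)."""
    select_time: float = 0.15   # gaze on an item this long selects it
    confirm_time: float = 0.15  # gaze on the confirm button this long triggers it
    vanish_time: float = 0.3    # looking elsewhere this long hides the button again
    selected: Optional[str] = None
    cart: List[str] = field(default_factory=list)
    _target: Optional[str] = None
    _since: float = 0.0

    def update(self, t: float, target: Optional[str]) -> None:
        """Call once per gaze sample with the id of the element under the gaze (or None)."""
        if target != self._target:  # gaze moved to a new element: restart the timer
            self._target, self._since = target, t
        held = t - self._since
        if target == CONFIRM and self.selected is not None:
            if held >= self.confirm_time:  # step 2: trigger the current selection
                self.cart.append(self.selected)
                self.selected = None
        elif target is not None and target != CONFIRM:
            if held >= self.select_time:  # step 1: select, which shows the confirm button
                self.selected = target
        elif self.selected is not None and held >= self.vanish_time:
            self.selected = None  # user looked away: confirm button vanishes
```

Compared with a single dwell timer, the consequential action (adding to the cart) always needs a second, deliberate glance at a button that only exists after the first step, which is why the individual timings can stay short without producing many false positives.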
David: And if you want to select something, are you blinking or doing something like that to confirm?
Stephan Odorfer: Yes, blinking would be a possibility, but blinking is also controlled by the autopilot in our brain. Nobody blinks actively except in specific situations, like when you wink at somebody; that’s a deliberate action. But if you had to blink every time, that would feel awkward. It is possible to control interfaces with blinking, and eye tracking has been around for decades. It came from psychological studies and marketing studies; we all know these heat maps of search results, et cetera. It has also made it possible for people with disabilities to control a computer. For many people it’s a great way, and the only way, to take part in this world of the internet and communication, if they can only use their eye muscles because nothing else is possible anymore. So it’s a great approach, but there are other, better ways to trigger things than blinking.
Another option that might come to mind is nodding, but things like blinking and nodding produce too many false positives. That’s why we came up with this other solution.
David: So walk me through this, if there’s a hamburger chain and you have a self-service ordering kiosk and you’re using your technology, I walk up to this thing, what is it telling me right away?
Is there a message that says you can navigate this whole thing just using your gaze?
Stephan Odorfer: Yes. But it would not be the only way, right? You should always have a fallback option, which in this case would be touch. You don’t want to force people in one direction; you want to offer them a better or different way in the first place.
And then they experience it as a better way. So it would be introduced to folks as, “Hey, you can touch me, but you can also just look at me and I will understand what you want to do.” With this very first approach, the system understands, “Hey, there’s a new customer approaching me,” because it sees that somebody is looking at it, and it can welcome the customer. If it’s a new customer who is not familiar with this technology at all, it’s very simple: he can take a short tour, which is, I would say, something like 5-10 seconds long, just to understand what it is. And I see great potential for viral marketing here: just think about somebody controlling the device with his gaze, and his buddy filming it and putting it on YouTube to show how innovative this solution is, right? So he understands, okay, this is how I can do it. Then, if he goes through the list of burgers, for example, he doesn’t need to learn anything. The system understands that he is reading and scrolls automatically, so there’s nothing to figure out. The only thing you need to understand is that when you look at a burger, for example, you see a little shine around it. That must be integrated into the user interface and the corporate identity of the brand, obviously, and the shine means it’s selected. Then this button pops up, and since this is the first time the user sees it, he will obviously look at it. So that’s the way we say, “Next time, you’re going to look at this button. You’re going to select this burger and put it in the cart, understood? And then you just look at the button.”
And then you’re good to go, because you don’t need to learn anything else. You just know that the system scrolls automatically for you, and you understood this because you experienced it. And if I’m looking at an item, this other button pops up, and once I look at this button, I trigger it. If I’m not looking at the button, it vanishes after a split second.
David: So if I decide, I want the cheeseburger with bacon, a prompt will come up and if I look at it, it will confirm that I want that and put it into my “shopping cart”?
Stephan Odorfer: We have different buttons in such an interface, for example the quantity: I want two burgers or three burgers. So what is the difference if you use gaze control instead of touch control? Normally our eyes are sensors; they sense information and feed it into our brain. What we are doing is not only sensing: we are also making our eyes the actors, so they act actively.
For example, think about driving a car with both hands on the steering wheel when you want to change the radio station. If you’re familiar with the car, you know where to put your hand to turn up the volume or change the station without looking at the center console, because you just know where your hands need to go. But if you need to look at an element, for example a + button that you press again and again, the information has to appear within your peripheral view. If you look at +, the current number has to be displayed right next to it, because you need to see that you are at 3 and can now stop looking at +, since the number goes up gradually. Similarly, if the question is how many burgers you want, you just look at one of five buttons, one, two, three, four, five, because restaurants know what quantities of burgers or salads people usually order. You don’t need to be able to choose 28 salads or something.
David: So you go through that whole ordering process and then you would use more conventional payment systems like credit cards, or maybe even a phone scanner, NFC tap or something like that. Since you’re already using a camera and looking at the retina or the iris of the viewer, could you at some point make payments based on the biometrics of that person’s unique eye characteristics?
Stephan Odorfer: It’s theoretically possible, yes. And, if you look at many Asian countries, it’s already the standard, right? It’s not something sophisticated, that’s already something that they do on a daily basis.
David: But they don’t have GDPR there.
Stephan Odorfer: True. Absolutely true, and that’s exactly why I said it’s theoretically possible. Take biometric identification: the eye tracker that we are using can also be used to log into your Windows system with Windows Hello, and it doesn’t send the data anywhere. It’s basically the same as your iPhone or Android phone, which use the same infrared-based camera technology to identify that it’s you, but it’s only asking: is it you or not? You don’t need a connection to a database. That’s the main difference here.
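Asking “is it you or not?” rather than looking someone up in a database is the standard distinction between biometric verification (1:1) and identification (1:N). The sketch below only illustrates that distinction with made-up feature vectors (“templates”) and an assumed similarity threshold; it says nothing about how Windows Hello or any phone actually encodes or matches biometric data.

```python
import math
from typing import Dict, List, Optional

Template = List[float]  # hypothetical feature vector extracted from an eye or face image


def cosine_similarity(a: Template, b: Template) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(probe: Template, enrolled: Template, threshold: float = 0.9) -> bool:
    """1:1 verification: 'is this the enrolled user?' Needs one template stored on the device."""
    return cosine_similarity(probe, enrolled) >= threshold


def identify(probe: Template, database: Dict[str, Template],
             threshold: float = 0.9) -> Optional[str]:
    """1:N identification: 'who is this?' Needs a database of many people's templates."""
    best = max(database, key=lambda name: cosine_similarity(probe, database[name]), default=None)
    if best is not None and cosine_similarity(probe, database[best]) >= threshold:
        return best
    return None
```

Only `identify` needs the `database` argument at all, which is the privacy-relevant difference: verification can run against a single locally stored template.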
As I said, it’s theoretically possible, but it’s neither a focus of ours nor necessary in this case, because you can pay with a QR code, or with a contactless credit or debit card, so there are many ways of paying contactlessly. What the system furthermore does, and I pointed this out a little earlier already, is this: while you are browsing and looking through the menu, we understand what you are interested in. Such an eye tracker collects data at about 90 hertz, so 90 times a second. We understand where you’re looking, and using this autopilot information we can understand what you’re interested in and what not. So, for example, right before the checkout we can say, “Hey, you were thinking about taking this ice cream dessert.”
So why not offer it again at the checkout? We know what kind of ice cream you looked at and thought about, and based on this gaze pattern, we understand that you really considered it. So it’s not historical information; it’s personal.
And therefore you get a much better conversion rate for upselling and for making the cart size larger.
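A minimal sketch of the checkout suggestion, under strong simplifying assumptions: every 90 Hz sample has already been mapped to the menu item under the gaze, total dwell time stands in as a crude proxy for “interest” (the interview describes something richer), and the rule “longest-viewed item not in the cart, viewed for at least one second” is invented for illustration. It uses only the current customer’s samples, matching the point that recommendations come from the individual’s own gaze rather than other people’s history.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

SAMPLE_RATE_HZ = 90  # approximate tracker rate mentioned in the interview


def dwell_seconds(gaze_targets: Iterable[Optional[str]]) -> Dict[str, float]:
    """Each sample is the menu item under the gaze (None = nothing); convert counts to seconds."""
    counts = Counter(t for t in gaze_targets if t is not None)
    return {item: n / SAMPLE_RATE_HZ for item, n in counts.items()}


def checkout_suggestion(gaze_targets: Iterable[Optional[str]], cart: List[str],
                        min_seconds: float = 1.0) -> Optional[str]:
    """Suggest the item the customer looked at longest but did not order (assumed rule)."""
    dwell = dwell_seconds(gaze_targets)
    candidates = [(secs, item) for item, secs in dwell.items()
                  if item not in cart and secs >= min_seconds]
    return max(candidates)[1] if candidates else None
```

At 90 samples per second, 135 samples on the ice cream correspond to 1.5 seconds of looking, enough to clear the assumed one-second threshold.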
David: If I’m a kiosk manufacturer and this intrigues me, I have QSR or other retail clients who might be interested in this, what are the hardware implications for this? Do you need to add a separate PC that just does that processing? Is there a separate, specific camera that you need? Those sorts of things?
Stephan Odorfer: An eye tracker consists of three parts. The first is the infrared lighting, LEDs like the ones in today’s iPhone and Android phones. Then you have a camera, which is basically the same kind of solution: a special infrared camera. And you have an ASIC, a dedicated chip on board the eye tracker itself that does all the math, so that only X, Y coordinates and X, Y, Z coordinates of your eyes are transmitted through the USB connection. No camera image is transmitted or saved at all; everything is calculated in memory and never stored, which also matters for privacy. This is standard equipment that can easily be built into the hinge of a notebook. It’s really small, so the space you need in your existing kiosk solutions is really tiny.
And that’s the only thing that is necessary. It needs to be placed below the screen so that it can track the whole screen area.
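Because the tracker only delivers coordinates, the kiosk software still has to map each gaze point onto the element being looked at. A minimal hit-testing sketch, assuming normalized gaze coordinates in the range 0 to 1 (a common convention, but check the specific tracker’s SDK) and an invented `padding` tolerance, in pixels, to absorb the tracker’s inaccuracy. The `Element` type and the menu layout in the usage are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Element:
    id: str
    left: float
    top: float
    width: float
    height: float


def hit_test(nx: float, ny: float, elements: List[Element], screen_w: int, screen_h: int,
             padding: float = 30.0) -> Optional[str]:
    """Return the id of the element closest to the gaze point, within `padding` pixels."""
    x, y = nx * screen_w, ny * screen_h  # normalized -> pixel coordinates (assumed format)
    best_id, best_dist = None, math.inf
    for el in elements:
        # distance from the gaze point to the rectangle; 0 when the point is inside it
        dx = max(el.left - x, 0.0, x - (el.left + el.width))
        dy = max(el.top - y, 0.0, y - (el.top + el.height))
        dist = math.hypot(dx, dy)
        if dist <= padding and dist < best_dist:
            best_id, best_dist = el.id, dist
    return best_id
```

Choosing the nearest element rather than the first match keeps a neighboring button from “stealing” the gaze when padded regions overlap.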
David: Okay, so you don’t need a separate PC running an Intel or that sort of thing to make all this happen. It can just happen off of a pretty simple hardware setup?
Stephan Odorfer: You can do that, but it depends on the use case. For example, the intention prediction I was referring to is not computed on the eye tracker itself; it runs in software on the local hardware, so you should have an up-to-date device. This could be an Intel processor, because Intel processors have dedicated deep learning acceleration built in that we can use, so much lower CPU consumption is needed. The commands are built into the hardware itself.
David: So if you had a touchscreen kiosk, it could do everything a touchscreen kiosk normally does, and your technology could run in parallel. You don’t need two separate devices to do all that, as long as you’ve got enough hardware.
Stephan Odorfer: Yes.
David: Okay. There was a big fuss recently up here in Canada, where I live, about a shopping mall using cameras. And even though it was anonymous video analytics, it was misinterpreted and there was all kinds of upset about it, even though there’s really no reason to be. There’s nothing, no privacy invasion happening there.
How do you get past that with customers who worry about it, and with the general public? Even though you’re saying it’s only the eye coordinates, people are going to see cameras and go, “Oh my God, my privacy is being invaded here!”
Stephan Odorfer: It is absolutely important. First of all, it uses a camera, but as a sensor, which means the camera images are not saved anywhere. That’s the first thing. Then, in terms of our company, we are based in Munich, Germany, and Germany has very strict privacy laws; in specific areas they go even further than GDPR requires. For the company itself, and for me and my partners, this is a very important topic, because we want to realize our vision.
Our mission is to make computers understand us humans, not to make us humans understand how our computers operate. For many years, we had to learn how things work. Now it’s time that computers understand and predict how they can serve us, because that’s their duty. To do that, they need a better understanding of how to serve us, and therefore we need data. Without data, it’s like learning to swim in a pool without water: it’s not possible. So you need the data, and that’s why we have it.
We have German privacy certificates, which are probably not familiar outside of Germany: a DEKRA data audit, which checks how we handle data, how we process data, how we delete data, and how we anonymize, pseudonymize and aggregate data, to make sure that nothing in the data we use can be traced back to any individual. That’s very important, because I don’t want a big company to understand what I’m interested in.
The model that we follow for data privacy is basically that something is on one side of a wall, the local part, and something else is on the other side of the wall. To make this more concrete, imagine you’re searching and need to access data online, because this is something that we also do. Say you’re doing an e-commerce search and loading more information, more of the t-shirts you’re looking for. Our approach is that we ask for a hundred new results and locally we only use 10 of them. So the provider can supply a hundred results, but it doesn’t know which ten of those hundred we are actually using. That’s basically how you prevent somebody else from building a model of the person you are serving.
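The “ask for a hundred, use ten” idea can be expressed as a small client-side re-ranking step. This is a sketch under assumptions: the result format (dicts with `id` and `tags` keys), the tag-based scoring and the profile weights are all hypothetical. What it demonstrates is only that the selection happens after the data has arrived, so the request itself reveals nothing about which items were chosen.

```python
from typing import Dict, List


def local_rerank(results: List[dict], interest: Dict[str, float], k: int = 10) -> List[dict]:
    """Pick the k results that best match a private, on-device interest profile.

    `results` is the full batch received from the server (e.g. 100 items), each a dict
    with hypothetical 'id' and 'tags' keys. The profile never leaves the device, so the
    server cannot tell which k items were actually shown.
    """
    def score(result: dict) -> float:
        return sum(interest.get(tag, 0.0) for tag in result["tags"])

    # sorted() is stable (also with reverse=True), so ties keep the server's original order
    return sorted(results, key=score, reverse=True)[:k]
```

The server-side request is identical for every customer; only the local `interest` profile differs, and it is never transmitted.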
That’s one thing. With these kiosk solutions, all the data is on the device anyway, and since nothing is transmitted to a server for the local optimization or personalization, there’s no privacy problem. Furthermore, once the session is done and somebody else appears, we basically start from scratch in terms of data and personalization.
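A minimal sketch of session-scoped personalization, assuming the tracker reports per frame whether a user’s eyes are detected and, optionally, which menu item they are looking at. The five-second absence timeout is a guess and the `KioskSession` class is hypothetical; the point is that the profile lives only in memory and is cleared before the next customer arrives, which is also the moment the kiosk could show its welcome message.

```python
from typing import Dict, Optional


class KioskSession:
    """Per-customer, in-memory personalization discarded when the customer leaves.

    The absence timeout is an assumed value; nothing is written to disk or sent anywhere.
    """

    def __init__(self, absence_timeout: float = 5.0) -> None:
        self.absence_timeout = absence_timeout
        self.last_seen: Optional[float] = None
        self.interest: Dict[str, int] = {}

    def on_frame(self, t: float, user_present: bool, item: Optional[str] = None) -> bool:
        """Feed one tracker frame; returns True when a new customer starts a session."""
        if user_present:
            is_new = self.last_seen is None
            self.last_seen = t
            if item is not None:
                self.interest[item] = self.interest.get(item, 0) + 1
            return is_new  # caller can show the welcome message
        if self.last_seen is not None and t - self.last_seen >= self.absence_timeout:
            self.interest.clear()  # start from scratch for the next person
            self.last_seen = None
        return False
```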
David: Last question. Is this all what we’ve been talking about conceptually, or are you in the field with self-service kiosks that are doing all of this?
Stephan Odorfer: For the past three to four years, what we have been doing is equipping large enterprises for their standard workplaces, right?
That’s about efficiency, ergonomics and benefits. The same goes for the shop floor: we have productive environments running hands-free, touchless interaction, not selecting a burger, but confirming a step in your assembly process, for example. That’s what we have been doing through the pandemic.
We have seen this request from hardware, software and solution providers on the one hand, but also from the customers’ side, since they are looking for other ways to solve their problems. I’m pretty sure that in a few weeks, for example, we’re going to equip a completely touchless QR system for a large company that lets its guests learn more about the company and how to get around, a sort of compass, completely touchless. And that’s pretty much the same, because you have somebody approaching a terminal, the system says “Hello!” when they come up to it, they use it for 1-3 minutes, and then they move on. So that’s very similar, and I’m looking forward to seeing this in kiosk environments.
David: So the interest is absolutely there, but we’re still in fairly early stages of seeing this out in the marketplace?
Stephan Odorfer: Yes.
David: Okay. Very nice to speak with you. Thank you so much for spending some time with me.
Stephan Odorfer: Absolutely. Thank you for the invitation.