CMU researchers show the potential for privacy-preserving activity tracking using radar
Imagine being able to settle (or rekindle) domestic disputes by asking your smart speaker when the room was last cleaned or whether the trash cans have already been taken out.
Or – for an altogether healthier use case – what if you could ask your speaker to count your reps as you do squats and bench presses? Or switch it into full-on “personal trainer” mode, barking orders to pedal faster as you cycle on a dusty old exercise bike (who needs a Peloton!).
And what if the speaker was smart enough to just know that you were having dinner and took care of putting on a little mood music?
Now imagine if all of these activity tracking smarts were available without any connected cameras in your house.
Another piece of fascinating research from Carnegie Mellon University’s Future Interfaces Group opens up these kinds of possibilities – and demonstrates a novel approach to activity tracking that does not rely on cameras as the capture tool.
Of course, installing connected cameras in your home is a terrible privacy risk. Because of this, the CMU researchers investigated the potential of using millimeter-wave (mmWave) Doppler radar as a medium to detect various types of human activity.
The challenge they had to overcome is that while mmWave offers “a wealth of signals close to that of microphones and cameras,” data sets for training AI models to recognize various human activities from RF noise are not readily available (as visual data is for training other types of AI models).
Not to be deterred, they set out to synthesize Doppler data to feed a human activity tracking model.
The results can be seen in this video, where the model correctly identifies a number of different activities, including cycling, clapping, waving, and squatting. It does so purely from its ability to interpret the mmWave signal that the movements generate, having been trained only on public video data.
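To make the idea of deriving Doppler-like data from video more concrete, here is a minimal Python sketch. It is not the researchers' pipeline: it assumes 3D body-joint positions have already been estimated from video by some pose-estimation step, and every name in it (`synthetic_doppler`, `joints`, `sensor_pos`, `n_bins`, `v_max`) is hypothetical. It only illustrates the underlying principle that a Doppler radar responds to how fast parts of the body move toward or away from the sensor.

```python
import numpy as np

def synthetic_doppler(joints, sensor_pos, fps, n_bins=32, v_max=4.0):
    """Build a crude Doppler-like profile from joint trajectories.

    joints:     array of shape (T, J, 3), joint positions in meters
                (assumed to come from a video pose estimator)
    sensor_pos: length-3 position of an imagined radar sensor, in meters
    fps:        video frame rate in frames per second
    Returns (profile, edges): profile has shape (T, n_bins) and counts,
    per frame, how many joints fall into each radial-velocity bin.
    """
    joints = np.asarray(joints, dtype=float)
    sensor_pos = np.asarray(sensor_pos, dtype=float)

    # Distance from the sensor to every joint in every frame: shape (T, J).
    ranges = np.linalg.norm(joints - sensor_pos, axis=-1)

    # Radial velocity = rate of change of that distance over time (m/s).
    radial_v = np.gradient(ranges, 1.0 / fps, axis=0)

    # Histogram the radial velocities frame by frame.
    edges = np.linspace(-v_max, v_max, n_bins + 1)
    profile = np.stack([np.histogram(v, bins=edges)[0] for v in radial_v])
    return profile, edges
```

A real radar return would also depend on things this sketch ignores, such as how strongly each body part reflects the signal and whether it is occluded, so treat it as an illustration of the principle only.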
“We show how this cross-domain translation can be successful through a series of experimental results,” they write. “Overall, we believe that our approach is an important step towards significantly reducing the burden of training such human sensing systems, and that it can help bootstrap applications in human-computer interaction.”
Researcher Chris Harrison confirms that mmWave Doppler radar-based detection doesn’t work for “very subtle things” (like recognizing different facial expressions). But he says it’s sensitive enough to detect less vigorous activity – like eating or reading a book.
The Doppler radar’s motion detection capability is also limited by the need for a line of sight between the subject and the detection hardware. (Aka: “It can’t go around corners yet.” Which will surely sound a little reassuring to those who are concerned about the human recognition abilities of future robots.)
The detection of course requires special sensor hardware. But things are already moving forward on this front: Google has already dipped its toe in via Project Soli, adding a radar sensor to the Pixel 4, for example.
Google’s Nest Hub also integrates the same radar sensor to track sleep quality.
“One of the reasons radar sensors have not been adopted more widely in phones is the lack of convincing use cases (a kind of chicken and egg problem),” Harrison told TechCrunch. “Our research on radar-based activity detection is helping to open up more applications (e.g. smarter Siris that know when you are eating, making dinner, cleaning or exercising, etc.).”
When asked whether he sees greater potential for mobile or fixed applications, Harrison says that there are interesting use cases for both.
“I see use cases in both mobile and non-mobile areas,” he says. “Back to the Nest Hub … the sensor is already in the room. So why not use it to bootstrap more advanced features in a Google smart speaker (e.g. counting your exercise reps)?
“There are also a number of radar sensors already in use in buildings to detect occupancy (but now they could tell, for example, when the room was last cleaned).”
“Overall, the cost of these sensors will drop to a few dollars very soon (some on eBay are already around $1), so you can include them in everything,” he adds. “And as Google shows with a product that goes in your bedroom, the threat of a ‘surveillance society’ is much less of a concern than it is with camera sensors.”
Startups like VergeSense are already using sensor hardware and computer vision technology to conduct real-time analysis of indoor spaces and activities for the B2B market (e.g. measuring office use).
But even with local processing of low-resolution image data, the use of vision sensors can still be perceived as a privacy risk – certainly in consumer environments.
Radar offers an alternative to this kind of visual surveillance, one that could be better suited to privacy-sensitive connected devices like “smart mirrors”.
“If it’s processed locally, would you put a camera in your bedroom? Bathroom? Maybe I’m a prude, but personally I wouldn’t,” says Harrison.
He also points to previous research that underscores the value of adding more types of sensor hardware: “The more sensors, the longer the tail of interesting applications you can support. Cameras cannot capture everything and they also do not work in the dark.”
“Cameras are pretty cheap these days, so it’s difficult to compete there, even if radar is a little cheaper. I think the biggest benefit is privacy,” he adds.
Of course, any visual or other sensor hardware poses potential privacy issues.
A sensor that shows you when the child’s room is occupied can be good or bad, depending on who has access to the data, for example. And all kinds of human activity can generate sensitive information, depending on what is happening. (I mean, do you really want your smart speaker to know when you’re having sex?)
While radar-based tracking may be less invasive than some other types of sensors, that does not mean there are no potential privacy concerns at all.
As always, it depends on where and how the sensor hardware is being used. However, it is hard to argue against the idea that the data generated by radar is likely to be less sensitive than equivalent visual data if it is exposed via a breach.
“Any sensor should, of course, raise the privacy issue – it’s more of a spectrum than a yes/no question,” agrees Harrison. “Radar sensors are usually rich in detail, but unlike cameras, they are highly anonymizing. If your Doppler radar data leaked online, it would be hard to be embarrassed by it. Nobody would recognize you. If cameras from your house are leaked online, well …”
What about the computational cost of synthesizing the training data given the lack of readily available Doppler signal data?
“It’s not turnkey, but there are a lot of large video corpuses (including things like YouTube-8M),” he says. “Downloading video and creating synthetic radar data is orders of magnitude faster than recruiting people to come into your lab to collect motion data.
“Inherently, you spend 1 hour to get 1 hour of quality data. Whereas these days you can pretty easily download hundreds of hours of footage from many excellently curated video databases. For every hour of video we need about 2 hours to process it, but that’s only on a desktop computer we have here in the lab. The key is that you can parallelize this using Amazon AWS or an equivalent and process 100 videos at the same time, so throughput can be extremely high.”
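The arithmetic behind that throughput claim can be sketched in a few lines of Python. The 2:1 processing ratio and the figure of 100 parallel jobs come from Harrison's comments; the function name `processing_hours`, the 300-hour corpus size, and the assumptions that each cloud worker is as fast as the lab desktop and that the work splits evenly are hypothetical simplifications.

```python
def processing_hours(video_hours, cost_ratio=2.0, workers=1):
    """Wall-clock hours needed to turn video into synthetic radar data.

    video_hours: total hours of downloaded footage
    cost_ratio:  processing hours per hour of video (about 2 on the
                 lab desktop, according to Harrison)
    workers:     machines running in parallel (assumed to be equally
                 fast, with the footage split evenly between them)
    """
    return video_hours * cost_ratio / workers


# Hypothetical corpus of 300 hours of footage.
single_desktop = processing_hours(300)            # one machine
parallel_cloud = processing_hours(300, workers=100)  # 100 jobs at once
print(single_desktop, parallel_cloud)
```

Under these assumptions, a corpus that would occupy a single desktop for 600 hours is finished in 6 hours of wall-clock time, whereas collecting the same 300 hours of motion data from volunteers in a lab would take at least 300 hours of recording.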
And while the RF signal is reflected to varying degrees from different surfaces (also known as “multipath interference”), Harrison says that the signal reflected from the user is “by far the dominant signal.” So they didn’t have to model other reflections to get their demo model to work. (He notes, however, that this could be done to further improve capabilities “by extracting large surfaces such as walls / ceiling / floor / furniture with computer vision and inserting them into the synthesis stage”.)
“The [Doppler] signal is actually very high level and abstract, so it is not particularly difficult to process in real time (much fewer ‘pixels’ than a camera),” he adds. “Embedded processors in cars use radar data for things like collision braking and blind spot monitoring, and those are low-end CPUs (no deep learning or anything).”
The research results will be presented at the ACM CHI conference along with another group project called Pose-on-the-Go, which uses smartphone sensors to approximate the user’s full body posture without wearable sensors.
The group’s CMU researchers have previously demonstrated a method for inexpensive indoor smart home sensing (also without the need for cameras), and last year showed how smartphone cameras can be used to give an on-device AI assistant more contextual awareness.
In recent years, they have also explored the use of laser vibrometry and electromagnetic noise to bring better environmental awareness and contextual functionality to smart devices. Other interesting research by the group includes the use of conductive spray paint to turn almost anything into a touch screen, and various methods to expand the interactive potential of wearables – for example, by using lasers to project virtual buttons onto a device user’s arm, or by including another wearable (a ring) in the mix.
The future of human-computer interaction will certainly be much more contextual – even if the current generation of “smart” devices can still trip over the basics and appear more than a little silly.