Audio Sensing

From CS294-84 Spring 2013

Reading Responses

2/4/2013 10:07:31 Ben Zhang, PhD

  • Scratch Input

This paper tries to turn pervasive surfaces into input devices by monitoring the sound of scratching. It is based on an interesting observation about the sound scratches generate, and the idea must have broadened many people's thinking about always-available input.

After proposing the idea, the paper discusses the pros and cons of such an input mechanism. A prototype and some example applications follow, as is typical for an HCI idea. A simple yet useful evaluation (based on a user study) demonstrates the system's accuracy -- roughly 90% for simple gesture recognition.
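The kind of simple gesture recognition described above can be illustrated with a short sketch. This is a hypothetical stand-in, not the authors' implementation: it counts above-threshold bursts in a crude amplitude envelope to tell, say, a single scratch from a double scratch. The function names and all the synthetic signal values are invented for the example.

```python
# Hypothetical sketch of amplitude-based scratch segmentation
# (illustrative only; not the Scratch Input authors' code).

def envelope(samples, window=4):
    """Crude amplitude envelope: max |sample| over a sliding window."""
    return [max(abs(s) for s in samples[i:i + window])
            for i in range(len(samples) - window + 1)]

def count_strokes(samples, threshold=0.5, window=4):
    """Count contiguous above-threshold bursts in the envelope."""
    strokes, inside = 0, False
    for v in envelope(samples, window):
        if v >= threshold and not inside:
            strokes += 1
            inside = True
        elif v < threshold:
            inside = False
    return strokes

# Two synthetic "scratches" separated by silence count as two strokes.
burst = [0.9, -0.8, 0.7, -0.9] * 5
quiet = [0.01, -0.02] * 20
signal = burst + quiet + burst
print(count_strokes(signal))  # -> 2
```

A real system would of course work on microphone samples and need per-surface threshold tuning; the point is only that a single cheap sensor plus lightweight processing suffices for this class of gesture.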

- Scratch Input makes use of an overlooked daily behavior and converts walls and tables into interesting interactive media. Brilliant idea.

- For the sound approach itself, there are many limitations: addressing is difficult (you can differentiate gestures, but who is the initiator and which device is the receiver?). Also, though the amplitude is relatively low, the sound still propagates and nearby people might be affected. (I know some people are disgusted by scratching sounds.)

- Another concern with such input is noise resistance. Either the system must be designed in a more complicated fashion, or random scratches and other noises may interfere. This paper offers little evaluation on this aspect.

- Due to the pervasiveness of sound, similar inexpensive, unpowered, and mobile systems can be designed around sound sensing. The paper is short but informative about the approach to designing such an input system.

  • Skinput

This paper appropriates the human body for acoustic transmission, using a vibration sensor array to capture "tap" responses and performing gesture classification. The observation is that human skin is usually neglected but is actually a natural, always-available input surface. The sound waves generated by taps can be captured and used for sophisticated gestures that explore new user interfaces.

The Skinput system starts with some conjectures about wave propagation in the human body after a tap. Various sensors are considered and the appropriate one - a vibration sensor array - is selected for a proof of concept. With a relatively inefficient machine learning approach, the classification accuracy seems satisfactory. The authors then designed many additional experiments to test the effect of different sensing points, multi-gesture recognition, the correlation between accuracy and participants' body characteristics, and the potential noise when users are walking or jogging. Overall, the system yields roughly 85% accuracy (many factors affect this value), and the example applications in the videos are compelling.
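The pipeline above (per-channel features in, tap location out) can be illustrated with a toy stand-in. The paper's actual model aside, the sketch below substitutes a nearest-centroid rule so it stays self-contained; the labels, feature layout, and all training vectors are fabricated for illustration.

```python
# Toy stand-in for a tap-location classifier: each vector holds one
# mean-amplitude feature per sensor channel. A nearest-centroid rule
# replaces the paper's actual learning model; all data is fabricated.
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def classify(sample, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lbl: math.dist(sample, centroids[lbl]))

# Fabricated training data for two hypothetical tap locations.
train = {
    "wrist":   [[0.9, 0.2, 0.1], [0.8, 0.3, 0.1]],
    "forearm": [[0.2, 0.9, 0.3], [0.3, 0.8, 0.2]],
}
centroids = {lbl: centroid(vs) for lbl, vs in train.items()}
print(classify([0.85, 0.25, 0.1], centroids))  # -> wrist
```

The interesting part of the real system is upstream of this step: getting channel responses that differ reliably between body locations, which is what the sensing experiments probe.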

- Since the paper discusses the different possible waves in the human body when tapping, I would naturally expect this to be used in the later modeling and classification. However, the classification is done in a brute-force fashion.

- Utilizing the sound wave generated by a tap intrinsically depends on the tapping speed (not so much on force, in my opinion). However, the paper says nothing about this. Can the user touch slowly and softly?

- I wonder about the comfort of wearing such a device, since users must keep it in close contact with the arm (either above or below the elbow).

- I love the idea of collecting and classifying taps on the human body so that skin is turned into an input device (with the ability to localize the source). The evaluation in this paper is thorough, though there are always other issues designers might need to consider. Overall, reading it was a great joy, discovering the possibilities such devices bring.

  • Acoustic Barcodes

This is a design for a passive, durable, and inexpensive sound-based tag system. QR-like vision-based tags and RFID tags have been used in many scenarios; the possibility of using sound for tags hadn't been fully explored. This paper uses patterned notches (together with a scratching object) to achieve sound-based tags.

Once proposed, using the patterned sound for tags seems obvious. The paper mainly focuses on the implementation (mostly audio processing and error-correcting encoding design). Some example applications are also discussed as motivation for using these barcodes. The overall accuracy is around 90%, which changes if the scratching object or the bit length changes (obviously).

- I find the acoustic barcode less compelling than vision-based approaches in many scenarios. But as an exploration of the possibility space, I love the idea of considering sound.

2/4/2013 20:59:00 elliot nahman

All three papers by Chris Harrison present new methods of using the sounds produced as a person touches or scratches objects in their proximity. The first, Acoustic Barcodes, looks at how the sound of a finger running over a series of printed lines can be used much like a barcode. The second, Scratch Input, explores how gestures on a surface such as a table can be sensed and interpreted by a computer. The third, Skinput, looks at extending the former method onto the human body, so a person can use gestures on their own skin as commands to their computer. All three push the boundary of how sound and touch are currently used in HCI. Touch through capacitance sensing is the norm, with computer vision as a probable future technology. Yet these ignore the dimension of sound, which can be a great interaction method.

Although acoustic barcodes are conceptually interesting, I question their durability compared with other types of barcodes such as QR codes. They are probably less susceptible to dirt than QR codes, but I imagine the ridges will eventually wear down. Also, a main argument in the paper for why acoustic barcodes are better is that a mic is cheaper than a camera, but all phones today come with a camera and not the specialized mic needed to pick up such a barcode. Given the limited amount of data that can be transmitted via a barcode, a specialized sensor on a phone or device is not justified. Even where such a specialized mic is not required or is already present, visual barcodes remain easier for people to print out. Acoustic barcodes require the ability to remove material, such as with a laser cutter, which is beyond common household technology; printing a QR code is more approachable.

Scratch Input is quite an interesting extension of the acoustic barcodes. The notion of using your everyday surfaces as potential input devices is quite intriguing. It makes me think of the Graffiti-type languages developed for text entry. Although a simple set of commands would be easy to implement, a richer set such as a full alphabet would be quite difficult and would face many of the adoption challenges Graffiti-type languages faced.

With Skinput, the notion of just using one's body as an input device is quite compelling. However, as mentioned in the Questions section of Harrison's talk on Monday, it seems you lose some opportunities for haptic feedback. Harrison made a comment about how people have a good sense of where they touch on their body and, as such, visual cues are sufficient feedback to indicate clicks. However, this goes against some of the notions of always-available interaction from last week's readings, where the focus seemed to be on presenting interactions without the need for visual stimuli, since that can be a problem in situations such as driving. When “clicking” on one's fingers, feedback is probably less of an issue, as they are very specific, limited areas, and for specific applications each finger can be tied to a particular action. For a specific application such as a music player this is probably sufficient, but it makes more general interactions with one's phone more problematic. The forearm and palm examples increase the number of actions one can make, but then require the user's visual attention. Returning to haptic feedback, I am quite a fan of haptic feedback on my devices and tend to think that cues other than just visual ones would still be valuable even when interacting with one's own body.

In the end, I think that these sound input methods are most valuable as part of a suite of interactions, rather than standalone. I find the Tap Sense project Harrison presented to be quite indicative of the potential of combining different interaction methodologies to form a more holistic experience.

2/4/2013 21:58:28 David Burnett

Acoustic Barcodes

This paper presents grooved patterns stamped or molded into solid objects. These grooves, when strummed with a fingernail or similar, produce unique sound pulse patterns. These patterns are picked up by a contact microphone, processed digitally, and used to identify which barcode was strummed.
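One plausible shape for the digital processing step (my sketch, not necessarily the authors' pipeline) is to detect pulse onset times and quantize the gaps between them into bits, normalizing by the smallest gap so the result does not depend on how fast the tag was strummed. The threshold ratio and the onset values below are invented for illustration.

```python
# Hypothetical decoding sketch for a notched acoustic tag: each
# inter-pulse gap becomes one bit (wide gap = 1, narrow gap = 0),
# normalized by the smallest gap so strum speed cancels out.

def decode_gaps(onsets, wide_ratio=1.5):
    """Turn pulse onset times into bits based on relative gap widths."""
    gaps = [b - a for a, b in zip(onsets, onsets[1:])]
    unit = min(gaps)  # narrowest gap defines the time unit
    return [1 if g / unit >= wide_ratio else 0 for g in gaps]

# The same tag strummed at two different speeds yields the same bits.
slow = [0, 10, 30, 40, 60]   # gaps: 10, 20, 10, 20
fast = [0, 5, 15, 20, 30]    # gaps:  5, 10,  5, 10
print(decode_gaps(slow))  # -> [0, 1, 0, 1]
print(decode_gaps(fast))  # -> [0, 1, 0, 1]
```

This speed-invariance is what makes an uneven or variable-speed strum tolerable up to a point, and why a very uneven strum (which distorts the gap ratios) breaks decoding.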

Acoustic barcodes are very simple to manufacture. They can be integrated into any molded object at fabrication time or easily etched into most any solid with over 1 billion unique barcodes possible, and the mics used to pick up the signals are very cheap. The mics also need not be directly connected to the barcode, which further eases installation and ubiquity. As a result, these identifiers can easily be distributed.

Because the sampling is acoustic, common off-the-shelf components can be used to implement the necessary pickup. The processing needed to accurately identify a given barcode is also very low-overhead, which makes this identification method suitable for very low-power embedded applications. With ease of manufacturing and simple, accurate reading, this technology seems destined for rapid adoption.

Acoustic barcodes have several significant technological limitations, however. The 96kHz sampling frequency falls well outside the range of standard 44 or 48kHz audio hardware, which makes very cheap (or free, in the case of mobile phone mics) sampling impossible. The author claims the barcodes to be durable, but most materials described will show significant signs of wear in less than a year given the presented use cases. Correctly reading the barcode is dependent on an even strum, requiring some amount of user finesse to activate it properly.

Second, realistic use cases are unclear. For contact-based situations, where the mic is already attached to the surface on which the barcode rests, there is no compelling advantage over a simple button. For the remotely-interrogated mode, where a microphone-instrumented wand is strummed across the barcode, established RF and optical methods (e.g., RFID, 2D barcodes) offer more bits and better error recovery, and are already established technologies. More bits also mean direct, low-latency communication; the comparatively few bits in an acoustic barcode mean a lookup table (likely web-hosted and therefore high-latency) must be consulted for each unique identifier. For existing interrogated methods, the data payload is included with the scan.

Scratch Input

On the subject of contact microphones, previous work by the author involved processing and uniquely identifying more natural, non-coded sounds created when humans interact with surfaces, namely running a fingernail along them. This work sought to acoustically sample and classify several types of scratching input on walls, desks, and other rigid flat surfaces that conducted sound well.

A huge advantage to this type of input is how simple it is to set up. One contact microphone on any interior or exterior face of the surface to be sensed is all that's necessary. Like the previous paper, sampling and processing the signals can be done with readily available hardware. Therefore, the technical barrier to adoption of this type of input is very low.

In user studies, training required to use the scratch input system was very short. Users are very often hesitant to learn a new input method, likely as a result of such long history with well-established computer interactivity paradigms. With five-minute training to learn several recognizable control gestures, the human barrier to adoption is also very low.

After being easily trained, though, the system's success rate appears too low for actual use. Approximately one in ten gestures was missed by the gesture classification system, which is far too frequent for users to depend on it. In addition, scratch input is likely too slow for users. Sampling and recognition speed is faster than the threshold of human perception, but the gestures demonstrated in the paper require engagement of the entire arm, greatly increasing the effort and time to perform them.

Suitable surfaces for scratch input are purported to be ubiquitous, but my anecdotal survey finds this not to be the case. Most indoor painted surfaces will suffer damage with even a few weeks of scratch-based input, and most horizontal surfaces are normally crowded with objects during active use. Many logically separated surfaces are acoustically coupled, meaning only one user per room, desk cluster, or work table can use scratch input at a time. Users are also likely hesitant to run their fingernail across many types of surfaces for fear of nail damage or, more commonly, out of an aversion to the mid-frequency resonant mode most commonly associated with scratching a chalkboard.


Skinput

Finishing still within the realm of transconducted sound, the third paper concerns acoustic signals traveling through the human body. Scratches or patterned ridges are less suitable for this signal medium, so taps were detected instead. The Skinput system can classify which area of the user's arm received a tap, using a cuff outfitted with sampling electronics.

The electronic cuff is fairly small and compares with armbands often used to holster entertainment electronics during exercise. This is a great advantage over more exotic user instrumentation. Chance of widespread user adoption of this technology is greatly improved as a result of being easily integrated with existing user habits.

The method of using many narrowband acoustic samplers in parallel contributes to very good signal frequency response and noise rejection. Each channel, tuned via proof mass, has frequency response isolated from neighboring signals instead of using a single broadband acoustic sensor that could be overwhelmed by a noise source. This high-SNR method is also suitable for miniaturization, further increasing the likelihood of user adoption.
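A software analogy for this filter-bank idea (my sketch; none of the hardware details are taken from the paper) is a bank of Goertzel detectors, one per hypothetical resonant frequency, which gives a comparable per-channel narrowband energy readout from a single broadband signal. The sample rate, frequencies, and test tone below are all invented.

```python
# Software analogy for a bank of narrowband mechanical channels:
# one Goertzel detector per hypothetical resonant frequency reports
# signal power in its band. Frequencies and signal are invented.
import math

def goertzel_power(samples, freq, rate):
    """Signal power near `freq` (Hz) via the Goertzel algorithm."""
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

rate = 1000  # Hz
tone = [math.sin(2 * math.pi * 60 * i / rate) for i in range(1000)]
bank = {f: goertzel_power(tone, f, rate) for f in (30, 60, 120)}
print(max(bank, key=bank.get))  # the 60 Hz channel dominates
```

The mechanical version has the advantage the response notes: each channel is narrowband at the transducer itself, so a loud out-of-band noise source never saturates the electronics the way it could with a single broadband sensor followed by digital filtering.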

While users may be quick to wear the sensor, I suspect actual use of the input system may fall off rapidly after initial tests. Most input points were on the anterior forearm which, while capable of supporting a stronger transverse wave, is generally more sensitive. Demonstration videos gave an idea of the force of each tap; after my own tests, I would hesitate before using such an input method regularly.

The armband style of construction and mounting gives rise to the idea that such an input method could be used during athletic activity, but tests revealed this probably isn't the case. Not only was the success rate of intentional input rather low (~90%), but false inputs were detected while jogging approximately every 90 seconds. More reliable input methods can be used during stationary activities, so the optimal use case of this input method is still to be determined.

2/4/2013 22:16:21 arie


Scratch Input: An interesting idea of using relatively crude acoustic information to turn any surface into an input device by recognizing specific "scratching" patterns. Ad-hoc instrumentation is very appealing for rapid installation and prototyping. Using features of the signal beyond amplitude seems necessary for producing reliable results. I think noise might be an issue, especially for use cases that involve motion. I found the work lacking appropriate context in terms of its relationship to related papers.

Skinput:

A somewhat natural extension of the previous paper, which extends the scheme from "Scratch Input" to multiple sensor channels, each tuned to a different frequency. The idea of extending acoustic sensing into the haptic realm crosses a sensory boundary and potentially inspires new sensor-fusion modalities. I found the experimental data supporting the system's reliability insufficient, particularly for typical daily settings, such as during activities (walking, running).

Acoustic barcodes: Tagging everyday objects with Braille-like digital tags is another example of extending the idea of sensory fusion: barcodes, which are typically visual, are implemented in the acoustic/haptic domain. A comment I found interesting was the attention the authors give to the manufacturability of the novel digital tags using rapid fabrication tools such as 3D printers.

The unifying theme of the three papers is cross-pollinating ideas between the different sensory modalities, and the tradeoff between instrumentation aggressiveness and the bandwidth of sensory information. The advantage of the acoustic modality lies in the ability to separate the sensor from the signal source, leading to minimally-invasive sensing.

2/4/2013 23:09:46 Hallvard Traetteberg

Harrison, Xiao, Hudson. "Acoustic Barcodes: Passive, Durable and Inexpensive Notched Identification Tags"

The paper describes how a barcode etched into a material can be "scanned" by striking across it with a hard object and capturing (and analyzing) the resulting sound. The barcode can be used for identification and to trigger a specific function in some system. They report a recognition rate just below 90%.

The main advantages are 1) that the infrastructure can take care of the sensing, i.e. it responds to your striking gesture, rather than you reading the barcode (using an app) and providing it to some infrastructure or service, 2) the sensor is cheap, 3) the sensor can handle several barcodes, and 4) it can be attached out of the way. The main disadvantages are 1) the barcodes don't afford striking, since it's not obvious that the gesture will be sensed, 2) you don't get feedback during the gesture, as you do with a QR-reading app, 3) striking is not a natural modality for communication, 4) ...

The paper does little to motivate the work, e.g. by identifying weaknesses of existing barcode apparatus or a scenario where existing barcodes clearly do not work. Optical barcodes, for example, are easier and cheaper to make, can be easily spotted (or made invisible), and can be read by a device everyone has nowadays, so when are you better off using acoustic barcodes? Although the paper presents related work, the authors don't show how they build upon or were inspired by this work. The design seems well done, and the evaluation shows that a relevant number of bits of information can be encoded and recognized. However, the tests were in a lab under conditions pretty far from real use, and without a specific application it is difficult to judge whether the recognition rate is high enough.

Harrison and Hudson. "Scratch Input"

The paper describes how gestures (taps and scratches) on a surface can be recognized by capturing the sound from the surface with an attached microphone. Recognition of gestures of varying complexity was tested and achieved an accuracy around 90% (in lab conditions, after five minutes of training). They did not test whether naturally occurring gestures could be mistakenly recognized.

Many of the same advantages and disadvantages apply as for "Acoustic Barcodes". The paper provides better background and motivation, but the recognition algorithm is not detailed and the experimental results are only briefly summarized.

Harrison, Tan, Morris. "Skinput: Appropriating the Body as an Input Surface"

The paper describes how gestures/taps on (the skin of) your arm or fingers can be recognized by capturing waves in the tissue and used as input to applications. The argument is that your arm and fingers are usually available, so using them as an input medium allows interaction without extra devices.

The paper is mostly devoted to the experiments performed to explore which gestures/taps (place and kind) can be recognized and disambiguated. There is little discussion of whether this kind of interaction is natural, or of the applications for which it would be particularly useful. I.e., a what-is-possible paper, not a what-makes-sense paper.

2/5/2013 2:36:38 YU-HSIANG CHEN

Acoustic Barcodes

Acoustic Barcodes captures the sounds of scratching physical objects that have bars etched on them and translates them into input.

I think this research, along with the other two, opens a door to more possibilities in using simple sounds as input. Another interesting aspect of this approach is that they use common materials as their input triggers. That means that to create something that accepts input, all you need is some carving and perhaps 3D printing.

It also separates the input trigger from the receiver, making the physical part even easier to build. Compared to a button-triggered remote controller, you need no wires, electronics, or batteries. And we could use smartphones, which are so common nowadays, as the input receiver. Thus various toys could implement the physical models at low cost and all use the same app to facilitate the interaction. I think that is a brilliant invention.

Having said that, it also has some limitations. First and most obvious is accuracy: if you were giving a presentation, would you risk the 10-15% error rate of the acoustic approach, or just buy a remote presentation clicker? Second is size: this approach requires a certain length, and bars can't be placed too close to one another, since the system needs space to pick up the start and end signals. Therefore, if we wanted a couple of controls grouped in a small device, buttons would still be more ideal. The third limitation is the total number of such mechanisms in use. When there's only one device using this approach, it's fun and effective. But when you have five or more acoustic-barcode objects, whether you use a single receiver or one receiver for each, it'll be confusing. The codes might need extra bits for namespaces.

Scratch Input

This approach uses the acoustic sound triggered by drawing fingernails across surfaces such as tables and walls. By analyzing the sound that travels through the surface to a stethoscope, the system is able to identify lines, circles, and other shapes.

As mentioned earlier, it uses common materials and separates the input triggers from the receivers, which allows the designer to make low-cost triggers (in this case, no cost, since the tables and walls usually already exist) independently, and then build a receiver that works with various input-trigger objects.

A smartphone or laptop can easily be turned into such an input receiver, making the system very easy to build. As I was thinking about other possible applications, it occurred to me that it could replace current light switches. People would no longer need to find the switch in the dark, nor ponder where it would be most convenient to install. Each room could have a receiver, and users could turn the light on and off right at the wall beside their bed. However, as mentioned in the paper, there needs to be a good way to separate sounds made in adjacent rooms.

Another application that came into my head right after reading the introduction was a virtual trackpad. Currently, if we use desktops with regular keyboards, we can't put a trackpad right in front of the keyboard, as it would block where our wrists rest. With this approach, we could scroll with our fingers anywhere on the table. There's just one drawback: it is only uni-directional instead of bi-directional. As with the audio-player example in the paper, if the user needs an additional interaction to toggle between a positive and a negative input, such as volume up/down or scroll up/down, it's pretty unintuitive. But I think there should be ways to overcome this limitation.


Skinput

This approach uses tapping on different locations of the human body, e.g. the forearm, to trigger different sound waves. These are then captured by a bio-acoustic wearable sensor, which interprets which part was tapped.

This is a very novel approach. It uses the human body as the source of input and uses the sensor in a non-intrusive way. I am also surprised that it can distinguish taps on the forearm, palm, and fingers pretty well.

However, in terms of how useful it can be, I'm a bit doubtful. Wearing a huge armband to sense the input (not to mention the additional projector required if we want to see the options in graphical form) makes it appear to me that just using a smartphone would be easier. In the scenario where the user is walking or jogging, I do agree that taking out one's phone and using both hands is a bit troublesome. But methods such as earphones, adding functions to devices such as a Fitbit to interact with the phone, or even sensors on clothes should work better than wearing the heavy armband while jogging.

Overall, it is still an innovative approach and opens up more possibility space for us to explore.


2/5/2013 8:54:16 Joey Greenspun

Acoustic Barcodes

This article presented a system composed of a patterned hard surface and a contact microphone that can uniquely identify the acoustic signal created by running a fingernail or other hard object across the grooved surface. This idea pushes the boundaries of how we can identify objects using sensors. When a user thinks about scanning an object with a smartphone, the things that come to mind are 2D barcodes and maybe an RFID tag. This new modality opens the door to new and exciting acoustic means of tagging objects.

This paper did a great job of introducing a new means of tagging and identifying objects. The method is exceedingly cheap and could be used on a number of surfaces, and the limitations are very few. One application that was very appealing was the enhanced storefront idea. Having shoppers able to scan objects and get information on them is definitely attractive. However, is this really a big step ahead of optical barcodes?

Additionally, there are quite a few issues with the technology. Security is never mentioned, probably because it would be pretty much impossible to implement. This technology will never match an RFID tag in this respect, which will severely limit its usability. The components were also never tested for wear and tear: how do these acoustic signals change when the pattern starts to smooth out?

Scratch Input

This article presents a unique gesture-based input system for large passive surfaces, such as walls and tabletops. Using an inexpensive and small sensor, it can detect a variety of gestures performed on such surfaces. When we think about inputting information via our fingers on a surface, we typically think about pointing and clicking somewhere; spatial information is what we are trying to convey. This technique blows that predetermined notion out of the water: the technology cannot determine where your finger is on the table or wall it's composing gestures on, but it can determine how it's moving about said surface.

This paper does a fantastic job of identifying its weaknesses and defining its niche. It comes out immediately and states that it has no ability to determine where your finger is on the surface, and goes on to list the applications that it therefore cannot support. The article then explains all the functionality this technology could implement. One of the most compelling examples is the wall-mounted music player controller. Being able to functionalize an entire wall with a small and inexpensive sensor is very appealing: the user can pause/play songs and cycle through to the next song using a very simple set of gestures. Additionally, the possibility of putting the sensor into a smartphone was very attractive. Being able to complete tasks on your phone without having to pick it up or even look at it is very appealing; the article mentions the possibility of silencing your phone from across the table or answering in speakerphone mode.

The accuracy, however, still leaves a lot to be desired. It is incredibly impressive how much can be resolved from this signal, but 90% is not going to be good enough for any sort of commercial use.

Skinput

Here is presented an input system that uses the human body, namely the arm, as an appropriated surface. The acoustic signals created by tapping on different parts of the arm are recorded by a novel sensing array. This research is really pushing the barriers of what we can do with our bodies in terms of HCI. Typically we use our fingers, voice, and other various body parts to interact with a touch screen, microphone, or camera to send a desired signal. However, this technology has humans interacting with their own bodies: we're using our fingers to touch various places on our bodies to perform a task. It is truly unlike anything else out there.

This paper focuses on a very interesting problem: there is a drive for smartphones to be smaller and smaller, but we as humans cannot interact with devices that are arbitrarily small. So the authors focus on finding surfaces that can be appropriated for interactive tasks, namely the human body. One very interesting point is the ability for a person to interact with his/her body in an eyes-free manner. We as humans have been trained with this tool (our body) for our entire lives, and it's easy for us to complete tasks on it without looking (e.g. touching our nose with a finger).

The sensor itself was incredibly well engineered. The fine-tuned cantilever beams were completely custom-made for this task; the off-the-shelf bone vibration sensors could not match their finely tuned response. One very useful piece of functionality would be music player control while running. However, the issues were glossed over without any real steps proposed for moving forward. The number of false positives was very high for such a short experiment. I'm wondering if knowing a step was taken (i.e. coupling the device with a pedometer) might help reduce this high false-positive rate.