Cameras II: Hands

From CS294-84 Spring 2013

Valkyrie Arline Savage, PhD - 2/15/2013 10:18:07

In general, the topics covered by this week's readings seemed to focus around hands and how awesome and useful they are. I am still a bit annoyed by “mid-air” interaction techniques, as the lack of tactile feedback to me seems disorienting, and the interaction can be fatigue-inducing after long periods.

The 6D hands paper dealt with building an interface for a 3D modeling program which relied on mapping a user’s hands in 6D space (X,Y,Z,yaw,tilt,roll) using a pair of consumer-grade web cameras. The calibration for the system seemed to be... really extensive. It required 30 minutes after the cameras were moved, plus 10 minutes more per user. They also extolled the virtues of their system’s requiring just 22,000 samples rather than the usual 30,000. I’m not an expert in this area, but that seems rather incremental. Anyway, they built an interface for moving around CAD parts using this webcam setup, and it sounded like much of what they did was build an interface which automatically snaps parts together, via either “coincident” or “concentric” mates. I didn’t come out of the paper with a good idea of how precise their system was in real use, because they seemed to assume a lot about how the users were assembling parts. I also was confused why they brought in the canonical “hooray, we can organize pictures” example at the end of the paper... I did appreciate their quest for “precise and memorable” hand gestures for interactions, and their assertion that data gloves are prohibitive for switching from gesture input to keyboard/mouse input.

The iRing paper was a little hard to follow at times, but the basic premise was to build a ring-form input device which can sense its rotational position around the finger as well as the direction and amount of pressure being put on it by the opposite hand. The argument was based on the fact that many portable input devices currently being researched are of alien shape and not conducive to the fashionable tendencies of real people, and that a ring-shaped device would be more socially acceptable. To be honest, I play with my wedding ring all the time, and playing with a ring that had a function seems reasonable to me. On the other hand, I play with my ring all the time, and it would be annoying to be accidentally hitting “reply all” and sending nonsensical text messages via my phone while I do so. The basic idea of observing skin’s change in reflectance as it stretches and bunches (and its differences around the finger based on body composition and hairiness) is a good one. It might make a cool necklace, too.

Digits, a system based on IR light emitted from a wrist-worn box, is interesting. I am uncertain that the actual system that they built is an interesting contribution. I consider their new IK models to be very interesting, and the idea of using IK rather than a nearest-neighbors lookup table with a giant database is certainly a good one. However, the fact that they do modeling on how the hand works means that this device could never be used with tangible props of any kind (a pen or something), which leaves it in the strange valley of mid-air gestures. I just don’t like ‘em! Also, using IR LED falloff as a depth cue is pretty cool, and I’m not sure it had been done before.

arie meir - 2/18/2013 11:43:48

The 6D Hands paper presents a true 6-DOF input mechanism based on two cameras, which the authors target at CAD-specific applications for rotating and displacing parts in the assembly stage. The main advantage of this work is its markerless approach: "minimal invasion" into the user's space. The questions that arose as I was reviewing this work relate to the attention question from the 5A model: how precisely can the system detect mode-switching? What burden does the attention aspect place on the user? I think the compelling application the authors chose to focus on is well suited to this type of input device.

Another, more general theme that can be seen across the three papers is that different devices address different needs; perhaps we are witnessing a specialization of input devices. While the mouse and the keyboard will remain the "bread-and-butter" of the typical user, specific application needs will benefit from custom form/function input devices.

The iRing paper presents an elegant input device that is surprisingly rich despite its underlying simplicity. What surprised me in this work was the lack of any machine learning: it seems that some kind of linear classification scheme would do better than the ad-hoc manipulation of equations. Computational burden is not an issue, as the training can be done offline and all that the MCU needs to do is compute dot products of two 4-element vectors. Although one can extrapolate about the possible miniaturization of this technology, it would be interesting to hear the authors' own ideas about this. For example, I believe the whole device, including the wireless transmitter (IR) and the MCU, could be integrated in a single ring.
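As a rough illustration of the linear classification idea, here is a minimal sketch that scores each gesture class with one dot product over the four sensor readings. The class names, weights, and biases are hypothetical placeholders (real values would come from offline training on labeled data), and this is not the method used in the iRing paper.

```python
# Hypothetical sketch of a linear classifier over four IR reflectance readings.
# All labels, weights, and biases below are made up for illustration; in
# practice they would be learned offline from labeled sensor data.
CLASS_WEIGHTS = {
    "press_top": ([1.0, -0.2, -0.5, -0.2], -0.3),
    "press_bottom": ([-0.5, -0.2, 1.0, -0.2], -0.3),
    "idle": ([0.0, 0.0, 0.0, 0.0], 0.0),
}


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def classify(readings):
    """Return the label whose linear score (w . x + b) is highest."""
    if len(readings) != 4:
        raise ValueError("expected exactly four sensor readings")
    scores = {
        label: dot(weights, readings) + bias
        for label, (weights, bias) in CLASS_WEIGHTS.items()
    }
    return max(scores, key=scores.get)
```

On the microcontroller, the per-sample work is only the dot products and the comparison; the training that produces the weights can happen elsewhere.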

The Digits paper presents a standalone IR TX/RX based gesture recognizer, which takes the depth-from-shading approach to depth estimation. While the technology is interesting, I am not sure that 3D interaction with mobile devices is the killer application, as this tends to make the whole interaction setting somewhat bulky. The authors compare the weight of the device with that of a metal wrist watch, but they exclude the battery, which given their power budget might easily double the weight. I liked the two different approaches to estimating the gesture, each with its own advantages and drawbacks. Once again the question of attention becomes important: how does the hand know we are "talking" to it? Any mechanism for attention will likely have to trade off accuracy against user experience.

Overall, the take-away message for me from this week's readings was the realization that there are many different needs and there is probably no single device to fit them all (although the good old computer mouse was definitely trying). As new interaction modalities develop, new user input approaches will evolve to address new needs, and this mutually fertile development is likely to be one of the driving forces of HCI research.

elliot nahman - 2/18/2013 15:03:02

6D Hands: 6D Hands is a hand tracking system allowing a user to exert 6 degrees of freedom through hand gestures. It is similar to other table-top systems leveraging cameras above a surface and, like them, requires a large dedicated workspace for the cameras to be mounted and suffers from non-portability. They use a two-camera method to prevent occlusion.

It is able to track hands without markers or gloves. The authors claim there is a seamless transition between keyboard, mouse, and the 6D system, but it is hard to believe that it would not pick up false positives. It seems like you would have to lift your hands some distance up off the keyboard; how would that then affect the cursor? Does it jump down to the bottom as the hands rise into the view of the cameras? Still, it is nice that they considered this integration with existing input systems.

To overcome some of the limitations of having neither tactile feedback nor the ability to change view while moving objects, they automate some tasks, such as mating by using snaps. This same method can be used in traditional interfaces, so how is this any better? Automating these tasks suffers from not being rigorous enough and taking control away from the user. In my past life, I found this a very annoying feature of SketchUp and actually prefer the SolidWorks method of mating, which the authors bash.

Also, they use a limited number of gestures and only allow one type of motion at a time. For instance, they separate translation from rotation; isn’t that one of the limitations of traditional mice they cited as being a problem? It feels like the authors created a novel system, tested it, found out they had too much freedom, and scaled it back to the point that it is equivalent to using a mouse.

iRing: iRing is an interesting alternate method of capturing finger position, motions, and presses. It seems like it can infer hand motions by relying on the fact that all your digits are connected and changes to the hand influence the finger position. It contrasts with 6D Hands in that it is small and portable, potentially pair-able with any device (desktop, laptop, or mobile) instead of requiring a specific workstation.

They clearly put a lot of effort and research into making it work technically, but didn't investigate the practical applications. They mention some ideas, but how well these may work is completely missing. I can't help but wonder just how much of a motion you have to make for it to really capture your movement and, say, control a cursor. How fine a resolution do you really get? Also, switching between using it and typing seems like it might be a problem. I might also worry about false positives. It seems more like a good companion device for something like mobile. I am also guessing that this system works outside in sunlight, since the IR detectors sit next to the skin and probably don't pick up ambient IR. In this way, it contrasts with other IR systems we have seen.

Digits: Digits is a wearable hand tracking system that leverages IR and depth cameras. Basically, it is a portable version of 6D Hands. I find it more compelling than 6D Hands because it is portable, but it probably suffers more limitations overall. Obviously, the prototype is rather large, but with time it can be miniaturized. However, it seems like some height off of the user’s wrist is necessary to get a good view of the user’s hand. I would posit that the greater the height, the better the view and therefore the accuracy. So it necessitates some thickness to the device, which could potentially interfere with other activities such as typing. I also wonder about things such as sleeves interfering with readings.

As with iRing, it might be difficult to switch between different input types and to control when you are giving input versus not.

Ben Zhang, PhD - 2/19/2013 0:27:42

  • 6D Hands (Wang)

6D Hands targets fine-grained manipulation in CAD software, aiming to reduce the amount of time people spend on cumbersome operations when 3D modeling. They design the system to be glove- and marker-less so that the context switch overhead is minimized. Such design principles are based on their observation of different modes of operation, including gestural manipulation and traditional menus/settings. By constraining the possible gestures, the system is able to search effectively in the pre-populated database to find the matching command. The paper also comes with some hand operation modeling to further increase detection correctness (especially for the pinch/click operation).
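The database search mentioned above can be pictured as a nearest-neighbor lookup. The sketch below is only a toy version under stated assumptions: the three-number feature vectors, the pose labels, and the Euclidean distance are all hypothetical stand-ins, and the actual system's features and search structure are not described here.

```python
import math

# Hypothetical pose database: (feature_vector, pose_label) pairs.
# Real systems would store many thousands of image-derived entries.
POSE_DATABASE = [
    ((0.0, 0.0, 0.0), "open_hand"),
    ((1.0, 1.0, 0.0), "pinch"),
    ((1.0, 1.0, 1.0), "fist"),
]


def nearest_pose(query, database=POSE_DATABASE):
    """Linear-scan nearest neighbor: return (label, distance) of the closest entry."""
    best_label, best_dist = None, math.inf
    for features, label in database:
        d = math.dist(query, features)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist
```

Restricting the set of gestures shrinks the database, which is what keeps a lookup like this cheap.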

Having dealt with 3D modeling before, I personally understand the difficulty of using a traditional mouse to interact with a 3D scene. As mentioned in the paper, some commercial software provides visual cues when users are moving an object along an axis. Also, due to the nature of 3D construction, mating is frequently used to compensate for humans' coarse operation. I do agree with the strong motivation behind this paper that some optimization needs to be done to increase productivity.

However, this paper's premise is that letting users operate directly with their hands makes things easier. I seldom feel confident when I can move an object freely. Freedom usually just decreases accuracy. Even when mating is used, some specifically required positions have to be set numerically. Another uncomfortable part of this paper is the shadow approach to locating users' hands. This is a rather indirect way of interacting (the paper also mentions this point in the evaluation). From my experience, I would prefer a three-view drawing based operation, complemented by a general overview, so that I can precisely pinpoint an object. Besides, as is the problem with most vision-based approaches, this requires training and is subject to environment changes.

Overall, I like the paper's separation of different tasks. Such a system could potentially increase productivity in professional use, since it could work as a complementary interaction scheme.

  • iRing (Ogata)

iRing uses the form factor of a ring and detects the user's operations by monitoring the IR reflection on the finger. They report their approach as being highly responsive, low power, and noninvasive. I was initially thinking of a different approach that relies either on an IMU or merely on the rotational energy generated by the fingers as the signal of the user's operation. However, this paper doesn't give an exact evaluation of power consumption; it is therefore impossible to compare it with a low-power, IMU-interrupt-based approach.

Anyway, on the use of IR reflection, this paper first studies the characteristics of the reflection signals. Comparing results across different people reveals some common characteristics of the reflection, and the paper designs its detection algorithm primarily based on these pilot studies. However, in order to increase accuracy, calibration is still required, and the paper doesn't provide many evaluation results to tell readers about its performance.

Additional critiques concern the insufficient description of system addressing (or feedback), such as how to deal with false positives. The entire paper focuses on validating IR reflection, but for an HCI paper, several critical interaction questions are not negligible.

  • Digits (Kim)

The Digits system uses an IR camera to detect the reflected signals from the user's fingers, aiming to provide an always-available input device. The sensing hardware is relatively simple and can be implemented using commodity hardware, while sophisticated signal processing addresses the problem of gesture detection. The process includes background subtraction, image rectification, finger separation, laser line sensing, forward kinematics, diffuse IR fingertip detection, and inverse kinematics. This paper also provides a relatively comprehensive study and evaluation of the system.

This paper is similar to the SideMouse prototype from MSR, but Digits provides mobility through its wrist-worn approach. The hand model in such a context improves the accuracy of detection given the complex poses that our hands can produce. Though the limitations of most vision-based techniques still exist, this approach stands out by limiting the detection scope (around the hands) and using a combination of light sources (laser line and diffuse IR).

Though the wrist-worn device frees the user's hands (no need for a data glove), such an additional sensing device still needs to be carried by users. This ensures mobility, which is lacking in most environment-installed sensing devices, but I still wonder how people would trade off between always-available sensing and the additional object they need to carry.

Sean Chen - 2/19/2013 1:14:26


Digits is a wrist-worn sensor. It uses cameras to sample the key parts of the hand, and then uses kinematic models of the hand to recover the 3D pose of the hand.

This method works without the need for full instrumentation of the hand (e.g. a glove) or environmental sensors. It is said to fully capture the fingers' movements and to use much less power than a depth camera. Digits reminds me of the side mouse we read about the other day. They both use IR, but Digits is able to use an algorithm to recover the full 3D hand pose.

However, I don't really see how a data glove is not mobile enough. Some of the gloves have openings at the fingertips which allow the user to feel texture as well. The gloves use flex sensors (similar to the lab 2 that Elliot and I did) and shouldn't cost too much either. One advantage that Digits might have over data gloves is the form factor. It seems that this device could be one-size-fits-all after a small refinement.

6D Hands

This research uses two consumer-grade webcams to observe the user's hands. The system then captures a number of hand gestures to help the user better control 3D CAD software.

Having used SolidWorks last semester, I understand how difficult and unintuitive it is to map 3D space onto 2D control. The way 6D Hands handles camera translation is intuitive and effective. I think it's a good match of interaction method and application. The paper discusses markerless tracking. Although typing and using a mouse while wearing a glove wouldn't be a problem for me (compared to the method in "Real-Time Hand-Tracking with a Color Glove"), if my tasks were frequently interrupted and I had to take off and put on the gloves constantly, this approach would be much preferable.

Compared to Digits, although this approach requires sensors in the environment, it's quite all right since most of us wouldn't do CAD work on the go. One question is whether the color of the desk would affect the accuracy of the result.


iRing uses 4 IR reflection sensors to determine the position of one's finger. It can then capture the rotation of the ring and the pressing of each side to perform various interactions.

It is quite novel to simply use IR reflection sensors to determine the positions. The nature of flesh lets the system detect a change when a side is being pressed. I think this approach, if it can be made wireless, could really fulfill the always-available input concept. It could be connected to a smartphone, and when the user has the phone in a pocket or at a distance, a tap could mean picking up or hanging up a call, and rotation could control volume or rewind/fast-forward the music.

Some questions that come to mind are:
- Since it's wired, how many degrees can the ring be rotated in real use?
- Is it possible to make it wireless while still small enough to perform the same tasks?
- How can it tell an accidental squeeze from an intentional one?

David Burnett - 2/19/2013 1:41:53

"6D Hands: Markerless Hand-Tracking for Computer Aided Design"

This project attempts to solve the basic issue of most people who design 3D objects in computers: 3D manipulation with fundamentally 2D tools (mouse, stylus). In addition to the familiar tools already available, this project augments the CAD workstation with gesture-sensing cameras to add the remaining 4 dimensions to interaction.

The project effectively addresses the age-old issue in an unobtrusive way. No special viewer is required as with virtual reality of a decade or two ago (when interaction was still an issue too), and the equipment is straightforward enough to add to any drafting desk in a way that VR tagged gloves and IR sensors never were. So ubiquitously was the interaction solved that I stumbled on a frame kit for this kind of configuration available from a reseller:

It also addresses the problem intuitively: the solution's lookup table approach is very fast, accurate enough to be within human positioning tolerance, and utilizes natural gestures that are simple to learn: a kind of multi-touch for an invisible surface. Most importantly, it augments, rather than replaces mouse and keyboard. Many interaction designers try to supplant familiar input devices which significantly hinders user adoption.

The authors have made great attempts to reduce strain and increase comfort, ensuring wrists and arms don't fatigue excessively. It's a good start, but in my experience input devices like these are still exhausting to use for an extended period in a way that mouse and keyboard seldom are. In addition, the most rapid system users tend to move zero or one hands off their usual control surfaces, not two. A way to provide this input without disengaging would be ideal, and possibly the only way for this to compete with established 3D workflows in use today.

Also remaining is the issue of attention: when the system is paying attention to you. The damage that can be done by accidental input here is within the range of an undo function, but still potentially frustrating. More frustrating could be the small change in centroid when un-pinching, shifting your design slightly and necessitating repeated fine-tuned adjustments. Most of the paper was spent detecting the fingertips in the first place; it may be worth looking into alternative methods of alerting the system when your hands are bound to its axes.

"iRing: Intelligent Ring Using Infrared Reflection"

With only four inward-oriented IR emitter/detector pairs evenly spaced around a ring measuring backscatter off the finger inside, this project detects eight ring rotation positions, hand clench state (yes/no) and whether one of the four emitter/detector pairs is being pressed.

The ring's usability is elegantly simple, in that it encodes the ways we naturally interact with common rings already. Instead of trying to put foreign input devices on the ring like a small finger-mounted keyboard, this project uses pre-trained twisting and pressing actions as input.

The sensing is performed in a creative way and detects several input types. As mentioned before, the encoded motions are real ring interactivity and the IR sensing enables that skillfully without friction of wheels or similar mechanical coupling. IR reflectance also requires very little software implementation and commonly-available microcontroller hardware to detect and process the signal to the point of being useful.

Four IR LEDs need a lot of power, so the potential of this ring ever cutting the tether and becoming a fully standalone ring is low. Easy power savings are possible by pulsing the LEDs and pulsing only one at a time, but the LEDs and the radio transmission likely needed to make this a complete system might exhaust available power in much less than a typical day-long wear period. Not to mention, fitting current battery technology in that form factor is unlikely.
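To put rough numbers on the pulsing argument, here is a back-of-the-envelope sketch. Every figure in it (LED drive current, pulse width, sample rate, battery capacity) is a hypothetical assumption rather than a value from the paper, and it ignores the microcontroller and radio entirely.

```python
def led_budget(led_current_ma, n_leds, pulse_s, sample_hz, one_at_a_time=True):
    """Return (average_ma, peak_ma) for LEDs each pulsed once per sample period.

    Simplified model: an LED draws led_current_ma while on and nothing while off.
    """
    duty = pulse_s * sample_hz  # fraction of time each individual LED is on
    slots = n_leds if one_at_a_time else 1  # sequential pulses need n_leds time slots
    if not 0 < duty * slots <= 1:
        raise ValueError("pulses do not fit in the sample period")
    average_ma = n_leds * led_current_ma * duty
    peak_ma = led_current_ma if one_at_a_time else n_leds * led_current_ma
    return average_ma, peak_ma


def battery_hours(capacity_mah, average_ma):
    """Idealized runtime, ignoring every other load and battery derating."""
    return capacity_mah / average_ma


# Hypothetical example: four 20 mA LEDs, 100 microsecond pulses, 100 samples/s.
average_ma, peak_ma = led_budget(20.0, 4, 100e-6, 100.0)
```

In this simple model the big saving comes from the low duty cycle (about 0.8 mA average instead of 80 mA for four always-on LEDs), while sequencing the LEDs mainly lowers the peak current the battery must supply. The radio and processor would still have to be added on top.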

On the subject of detection, the strong IR reflectance signals need to be carefully mapped to actions. Grasping, holding, and using real world objects stand to accidentally trigger the inputs in some way, and those mapped actions must either be abortable or fused with another input source for confirmation. It also remains to be seen if clear detections are possible without per-user calibration, and if recalibration is needed after skin tissue changes like vasodilation in hot weather, rapid pulse, or perspiration.

"Digits: Freehand 3D Interaction"

Digits is an IR camera and laser line mounted on the underside of the user's wrist. The camera scans along the palm to image the fingers and combines that with a model of how the hand operates in reality to determine how the fingers are oriented.

This project is a new way of effectively representing hand pose inside the computer, which can be useful for 3D interactivity tasks, medical measurement of tremors, or pretending to be a Jedi. Despite their known limitations with crossed or otherwise occluded fingers, the hand poses pictured in 3D match reality with high apparent accuracy. Though grasping would remain difficult without haptic feedback, humans already use a variety of hand signals that could be encoded as input with this system (especially the range of sign language symbols).

The data necessary to calculate the authors' hand poses could be obtained with much more complicated 2D linescan systems, but the method employed here is simple, inexpensive, and uses commercially available parts. In particular, estimating distance from infrared brightness is an impressive way to do what more expensive and high-powered solutions do with a fraction of the resources.
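The brightness-to-distance idea can be illustrated with the inverse-square law. This is a deliberately simplified sketch, assuming a point-like IR source, a surface of constant reflectance facing the sensor, and a single calibration reading; the function and parameter names are hypothetical, and the paper's actual reflectance model is more involved.

```python
import math


def depth_from_intensity(intensity, ref_intensity, ref_distance):
    """Estimate distance assuming brightness falls off as 1 / distance**2.

    ref_intensity is the brightness measured at the known ref_distance
    (a hypothetical one-point calibration).
    """
    if intensity <= 0 or ref_intensity <= 0 or ref_distance <= 0:
        raise ValueError("intensities and reference distance must be positive")
    # I / I_ref = (d_ref / d)**2  =>  d = d_ref * sqrt(I_ref / I)
    return ref_distance * math.sqrt(ref_intensity / intensity)
```

Under these assumptions, a fingertip that appears a quarter as bright as the calibration reading is estimated to be twice as far away.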

I found the hardware to be different than expected based on promises earlier in the paper. A set of wrist-mounted equipment is only marginally better than an instrumented glove, and worse in cold weather. I personally would find a glove less obtrusive. The equipment also is claimed to be a huge improvement over expensive customized systems, but with an uncommon IR setup and line emitter, it's barely the repurposed commercial sensor the early sections make it out to be.

Through the paper's descriptions of complex irradiation and hand kinematics models, much time is spent on accuracy, but I saw little to no mention of how quickly the system performs, which leads me to believe it isn't fast enough for real time (even connected to a PC). In my own brief gesture experiments, running through those of the paper's Figure 9, the system has at most a quarter-second to detect the pose and provide feedback to the user that it was read, and this time margin includes the time required to form the pose before it becomes too slow to be intuitively usable. An interesting path of research might be to detect which pose the hand is likely heading toward, akin to branch prediction but based on inertia (and maybe muscular state data as given by the iRing project) instead of program execution.

Joey Greenspun - 2/19/2013 8:18:13

6D Hands: This paper created a unique camera-based input device for 3D CAD tools. The system used two off-angle cameras to map the hands into a virtual space and determine the pose of each hand. They did a great job of identifying exactly what they wanted to accomplish and not trying to make a product that could do everything at the expense of being difficult to use. They mentioned tasks such as entering numerical values, annotations, and menu navigation; these tasks would be very difficult to accomplish using their hand-based gesture recognition modality, and the researchers identify this and state that using a mouse and keyboard for them is the way to go. Additionally, a good amount of research was put into determining what people actually do most often in CAD programs. Design engineers were monitored while using programs like SolidWorks to determine which actions were most important. They determined that concentric mates between circular boundaries and coincident mates between faces were the most common, so they developed around this. I think this is a very important and worthwhile way to go about developing new input devices. They also made sure it was as intuitive as possible by allowing rotations to occur using an imaginary-piece-of-paper model. I think these researchers did a great job in developing this product.

iRing: This paper discusses a smart ring that can be used as an input device by measuring varying reflectance values at different parts of the finger. A ring-based sensor has its advantages over other types of sensors that can measure hand pose. The standard here is a data glove, but it requires a lot of expensive sensors and cannot be used when tactile feedback on the skin is important (because it covers the entire hand). This ring can identify the state of the finger amongst the following positions: straight, bent, bent backwards, and clenched. It does this by measuring the reflectivity value of an emitted IR light source. Although they claim they can determine these various positions, I wish they had done more quantitative analysis on differentiating them. The only real quantitative analysis we see is on the rotational functionality of the device, and I don’t see this as a very compelling input scheme. To rotate a ring on your finger, you’ll need to use your other hand. Additionally, we like our rings to fit snugly so that they do not fall off. Typically, people struggle to take rings off once they’re on due to this fact. So although rotating a ring on your finger to change the volume of your music player sounds cool, I don’t see it as a very practical avenue for device input. Also, they briefly mentioned a palm sensor in their introduction, but I don’t see it mentioned anywhere else. Is there an additional sensor on the palm?

Digits: This paper discussed a wrist-worn device that is capable of very accurately determining the pose of the hand using a camera and two IR emitters.

The main niche they are attempting to fill is the mobile space. They are attempting to make their device as small and efficient as possible. Obviously, this is just a prototype and the actual device would be much smaller and less cumbersome. However, something the researchers fail to mention until the very end of the paper is that the processing is all done on a desktop computer. None of the processing is done onboard. This is a huge issue if they are actually looking to make this device function with your smartphone.

They also claim that their device is robust to ambient visible light. However, it is using an IR camera, and as we’ve learned in the past few camera papers, the sun is an excellent IR emitter. They gloss over this fact, and chances are this device would perform much more poorly if used in bright daylight, yet this is never brought up as an issue. The researchers also mention having the user interact with invisible UIs, i.e. reaching out in front of themselves and turning a knob to turn the volume up or down. This sounds interesting, but I’m not sure how practical it would be. The lack of tactile feedback is definitely an issue here. I thought it was very clever of them to incorporate biomechanical constraints into their algorithms. It makes so much sense to limit how the program can think about the hand pose, based on logic, but actually going through and implementing that is amazing.
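One simple way to picture constraining the hand pose is to clamp each estimated joint angle to a plausible range. The sketch below assumes hypothetical joint names and limits chosen purely for illustration; it is not how the Digits paper encodes its constraints.

```python
# Hypothetical flexion limits in degrees for one finger's joints.
# These ranges are illustrative assumptions, not values from the paper.
JOINT_LIMITS_DEG = {
    "mcp": (-20.0, 90.0),
    "pip": (0.0, 110.0),
    "dip": (0.0, 80.0),
}


def clamp_pose(pose):
    """Return a copy of the pose with each joint angle clamped to its limits."""
    clamped = {}
    for joint, angle in pose.items():
        low, high = JOINT_LIMITS_DEG[joint]
        clamped[joint] = min(max(angle, low), high)
    return clamped
```

A noisy estimate such as a knuckle bent backwards past its limit gets pulled back to the nearest plausible pose instead of being shown to the user.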