Interacting with Depth Cameras

From CS260 Fall 2011

Bjoern's Slides

Media:cs260-03-kinect.pdf

Extra Materials

Discussant's Materials

File:DepthCamera.pdf

Reading Responses

Amanda Ren - 9/5/2011 13:14:17

The "Combining Multiple Depth Cameras and Projectors" paper introdues the LightSpace, a small room consiting of depth camers and projectors that allows user to interact with virtual objects through more than just the interactive displays, placing them in a 3D environment.

This paper is important because it makes us realize that rather than manipulating virtual objects through the use of a mouse, we can make use of depth cameras and projectors to manipulate those objects as if they were physical objects. Unlike other related work on interactive surfaces and spaces, LightSpace makes use of multiple depth cameras to successfully simulate a continuous interactive space. In terms of today's technologies, it is mentioned that the Microsoft Kinect may start making the use of depth cameras more commonplace, and it has shown that it is possible to track users through skeletal tracking. One notable blind spot is the use of a physics engine for interactions. We can consider a user holding multiple objects or even exchanging objects between users.

The "3D tanglible Tabletop" paper introduces the Micromotorcross system, which is used to help explore future ideas for using depth cameras to build tangible tabletops.

This paper is important because it shows the possibilities of not only being able to interact with virtual objects as in the first paper, but also having those virtual objects interact with physical objects on a desktop. Micromotocross takes advantage of the depth-sensing camera to recover a height map and then builds a projection-vision system, projecting a display both onto the table and onto a secondary monitor. Although laser scanners have been used to calculate depth images, today's new camera technologies offer depth information in a more direct fashion, allowing more applications like Micromotocross. One drawback of the application is that, once again, the physics modeled is not completely accurate. Also, the pinching gesture they implemented doesn't seem to be very intuitive.

The "Touch Sensor" paper introduces the use of a depth camera to detect touch on a tabletop.

This paper is important because by using depth cameras for detecting touch, we no longer have to limit ourselves to a flat surface, and we can also detect interactions above the surface while tying each touch movement to specific users. The paper discussed the techniques used in the experiments, such as mounting two Kinect cameras at different heights, and concluded that precision was halved when the distance was doubled. The test results fit with the assumption that this kind of touch sensing is less reliable than more direct ways of sensing touch. Still, despite the popularity of capacitive touch screens, the fact that we can distinguish touch interactions between users by using depth cameras provides many possibilities for the future, even if the one drawback mentioned was performance in comparison.


Devdatta - 9/5/2011 18:17:54

Depth cameras allow a completely new form of interaction with computers. In particular, they offer a cheap method, with acceptable granularity, for creating a full 3D model of the current environment and user. This week's readings focus exclusively on the new modes of HCI made possible by depth cameras.

The first paper presents how a whole 'space' can be created using multiple depth cameras and projectors: the use of projectors and depth cameras means that no special equipment other than these is required. The 'space' can just be the furniture of a preexisting room. The second presents using a depth camera to create an immersive environment in which real-life 3D models (e.g., an obstacle) can be included in the virtual environment. The author is one of the originators of the Microsoft Surface project, and he demonstrates that interactions just as immersive as Surface (if not more so) can be created with depth cameras and projectors: possibly bringing tabletop computing to many more people. The last paper details the authors' experience of recreating touch sensors using depth cameras and the tricks they had to use to do so. Again, this can be placed in the general thrust of democratizing tabletop computing via depth cameras.

All these papers are clearly relevant to today's technologies because of the cheap and ubiquitous availability of the Kinect depth cameras and SDK. The experiences and the algorithms used by the researchers could very well be used by others.

My main concern with the papers is that they mostly discuss what they did, and not why they did it. They also do not go into much detail about the mistakes they made, or the trade-offs they had to make for (say) performance and interactivity. I would have preferred more details on these: I feel they are important for any future work in the area. For example, why were multiple depth cameras needed? Was it only for increasing the field of view, or do multiple cameras also reduce random noise? If so, by how much? What's the tradeoff? I would have loved a lot more detail on the user experiences: I find it quite unbelievable that most users did not find any particular interaction unexpected or counter-intuitive. I remember my first interaction with Microsoft Kinect, in which a few interactions were quite counter-intuitive to me.

On the whole, I also feel like the order of papers on the site should have been reversed: the combining depth cameras work seems to build on the other two papers and it would have been best to read it last instead of first.

Steve Rubin - 9/6/2011 13:39:15

The papers all revolve around depth cameras and their use in creating interactive rooms, tangible tabletop displays, and multitouch interfaces.

Each project documented by the papers gives an application for depth cameras. While the general public associates such cameras with gaming, the papers assert that there are many other applications of the technology. LightSpace suggests a new approach to creating an interactive space. By mapping out the entire room with depth cameras, LightSpace is able to implement new techniques like picking up data and navigating menus in 3D space. In implementing these new ideas, LightSpace expands the toolbox of UI designers. The paper on creating touch-based interfaces suggests a method that, while inferior to capacitive touch screens in some ways, can incorporate new states, like hovering and grabbing. It's important that this paper mentioned these benefits, because otherwise it would seem like reinventing the wheel with inferior technology.

The LightSpace paper demonstrates a novel proof of concept, but only in the "Future Work" section of the paper does it really explain why we might want an interactive room. Technology for technology's sake is interesting (and, I gather, acceptable in HCI research), but potential uses should have been mentioned earlier in the paper.

The paper on using a depth camera to create a touch sensor mentioned that the technology could be used to identify a hovering finger. I would have liked to see the paper implement this in addition to touch sensing. Without it, the paper gives a useful algorithm for creating a multi-touch surface, but little incentive to use it over standard touch screens.

The evaluation in this trio of papers essentially boils down to, "we built it and then tried it out." The user studies are not data-centric, but instead are observational in nature. Because these are tech demos and not realistic applications, there is no way to test whether a certain interface is as good or better than some previous interface. Instead we have to trust the authors when they say that the users were "amused."


Valkyrie Savage - 9/6/2011 14:18:06

Main idea:

Depth-sensing cameras can be used for a variety of novel user interactions. In combination with projectors, they can provide touch interfaces (though they are not yet fully comparable to touch surfaces in precision) as well as interfaces that can understand real-world objects as part of the digital environment.

Reactions:

Depth cameras have a lot of potential! As they get better and cheaper (that many middle class Americans now have them sitting atop their televisions is a good indication that they are), they are able to drive some projects that were pipe dreams not so long ago. I have read a bit about the Luminous Room, and depth cameras are taking steps towards that for certain. Reading this set of papers was quite exciting, and I’m happy that I’ve got the Kinect drivers up and running on my laptop now.

The facet of LightSpace that I found most interesting was actually the facet which most deviated from the dream of a Luminous Room: the interactions in space. Recognition that computing need not be contained in discrete places, and, in fact, that it need not be contained at all, provides myriad opportunities. With our bodies available as “displays”, why should we any longer be tied to carpal-tunnel-causing laptops? Why, especially in the indoor gyms which are so ubiquitous and suited to repurposing, should we have to attach sweat-susceptible devices to ourselves just to watch a video, or grub up magazines with our damp hands in order to read? It would be simple to apply the techniques in the LightSpace paper towards a gym without those nuisances. The possibility of such a thing would involve possibly extensive additional research: although the procedure for configuring the co-located set of projectors and depth cameras was described, it might be more difficult to achieve nice cooperation between several units scattered throughout a larger area.

As to MicroMotocross and the touch sensing papers, I found the MM paper to be more engaging, although I can see that the ability to have non-flat touch input surfaces is clearly more important. My interests are more immediately engaged by novel (to me) interaction techniques than by reapplication of old techniques. The MM paper makes me imagine playing Katamari Damacy in the scene of my own office/home, which is super exciting. Play is an excellent motivator for research, and putting a fun, real-world interaction spin on the seemingly quite serious and useful depth camera is likely to get more people involved in it (Microsoft, for instance :) ).

As for criticisms, I am curious to know about the sort of demo day during which the LightSpace folks “showcased” their project, mainly whether the people invited/attending were largely technical or largely nontechnical in nature. Although all of us in the field of CS are thrilled by the idea of interacting with arbitrary objects in our environments, we are also familiar with the problem of information overload, and perhaps having an environment which is “on” or “always on” might be disorienting, at least during the initial adoption phase (as with the cell phone, second-generation users experience much less of this). It could potentially affect users who are not familiar with computing, but it could also affect users who prefer having the option to shut off their laptop, put it in a corner, and be sociable. Ultimately, I suppose, the goal is to introduce computing that is sociable, and/or invisible (see next week’s papers), and based on some of the interactions supported by LightSpace (e.g. a handshake transferring objects from one surface to another through multiple users), it might be headed that way.

I wonder also what the goal of LightSpace is; the application that they showcased in the videos and paper seemed to be a photos and videos application that simply presented what might be called “social” data, and that seems like a very reasonable function of such a setup. I imagine that much, much more thought would have to be put into interfaces to make such a space viable for interacting with “business” data like spreadsheets. That, actually, may be a good research direction.

My criticism of the touch sensing paper is that they did not test with users wearing gloves. Now, I know it seems silly, but one of the properties of capacitive touch screens that I find to be absolutely irritating is that I have to, in the dead of winter, take off my gloves to use them. I know this is not such a big deal in temperate California, but in Indiana I can assure you that it is! If that annoyance could be removed from the interaction, I (and many others; for instance there was a story in the news a year or so ago about how South Koreans hit upon the idea of using sausages on their iPhones instead of taking their hands out of their gloves : http://www.tuaw.com/2010/02/12/frozen-sausage-as-iphone-stylus/) would be quite pleased. The usefulness of that interaction depends quite heavily upon the cold-sensitivity of the depth camera itself.


Derrick Coetzee - 9/6/2011 14:36:26

This week's readings focused on novel applications of depth cameras, devices which can determine the approximate depth of each pixel visible from the camera's point of view. There are at least two technologies for this: one which projects an infrared grid and computes depth based on the grid's distortion, and one which emits pulses of infrared light and measures return time. Both have a resolution of about 1 cm of depth for a camera at a distance of 2 meters, and exhibit significant noise. These devices have recently become inexpensive due to their production at volume for Xbox Kinect.
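To make the data concrete: each frame from such a camera is just a grid of per-pixel depth values, which can be back-projected into a 3D point cloud given the camera's intrinsics. The sketch below (Python with NumPy) assumes a simple pinhole model; the intrinsic parameters fx, fy, cx, cy are hypothetical placeholders, not values from the papers.

```python
import numpy as np

def depth_to_points(depth_mm, fx, fy, cx, cy):
    """Back-project a depth image (millimetres per pixel) into camera-space 3D points.

    Assumes a pinhole camera model; fx, fy, cx, cy are placeholder intrinsics
    that would come from calibrating the actual depth camera.
    """
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(np.float32)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.dstack((x, y, z))  # h x w x 3 array; one 3D point per pixel
```

The roughly 1 cm depth resolution and frame-to-frame noise mentioned above show up directly as jitter in these points, which is why the papers lean on smoothing and coarse, robust image-processing steps.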

Both LightSpace and Micromotocross rely on a combination of depth sensors and projectors to create interactive surfaces. Micromotocross, the earlier effort, had only a single surface, but was already a compelling application on its own and naturally encouraged players to engage in gestures to manipulate the cars. It introduced the use of physics simulation to model interaction between the human participant's body and virtual objects, a promising technique also adopted in a simple form by LightSpace. On the other hand, in both these applications, the inability of the system to project an object into mid-air limited its ability to faithfully reproduce the internal 3D scene - for example, jumping cars would only look realistic if viewed from above.

The limited resolution and high noise of the depth camera mean that it cannot reproduce the kind of high-frequency, high-precision interactions that are possible with e.g. capacitive screens, pressure-sensitive tablets, or wearable glove controllers. A useful area for future work may be hybrid systems that combine depth cameras with other technologies. Similarly, the use of a projector leads to poor visibility in well-lit environments, discoloration when projected onto a colored surface, or images obscured by shadows, problems that emissive displays do not exhibit. It would be interesting to explore the potential of "above the surface" interactions on emissive display surfaces (or surfaces projected from below).

Another limitation of the three works is that they are unable to sense depth for occluded regions, such as under the desk or when the user leans over the table, a problem that might be mitigated by not placing all depth cameras directly above the scene. In particular, in the case of using a depth camera as a touch sensor, knowledge of the thickness of a finger was needed due to the position of the camera; if the camera were mounted below a transparent surface, or if an additional camera were mounted at the side, this would not be necessary.

Although the physical simulation and connected component analysis in LightSpace were a simple and novel approach, these approaches can lead to counterintuitive results. For example, pressing a surface with an elbow or hip should not be interpreted as a specific command in the same way that touching it with a hand is, because the hand has a symbolic "pointer/manipulator" quality in human body language. Similarly, the multi-person interaction, where an object was transferred "through" two people shaking hands, may easily be invoked accidentally without either of them intending to do so.


Viraj Kulkarni - 9/6/2011 20:22:27

The three research papers talk about involving depth cameras in human computer interaction. The paper titled 'Combining Multiple Depth Cameras and Projectors for Interactions On, Above, and Between Surfaces', which uses elements from the other two papers as well, discusses the development of a room called 'Lightspace' which has been designed to explore a variety of interactions that are made possible by using depth cameras along with projectors. The other two papers are more specific in their scope and discuss how depth cameras sense depth and their use as touch sensors.

As I see it, LightSpace is a product of multiple distinct research efforts in the fields of depth sensing, computer vision, and HCI. The beauty of this is that it enables users to interact with machines using gestures which are natural and intuitive. In this case, the system adapts to the user's natural sense of gestures rather than the other way round, wherein the user adapts to a standard input device and uses it as the method of interaction. LightSpace is not the first research effort in this field. However, what separates LightSpace from its precursors is the richness of the interactions it seeks to enable.

One thing that's worth mentioning about such systems is the physical effort required from the user. For my undergraduate project, I developed a gesture-based HCI system which used pattern recognition to detect hand gestures. As a demonstration, we used our system to drive a picture slide show. The user could change pictures or zoom in on a particular picture using hand gestures. It was fun for a while and, during our trials, it generated a lot of excitement among the users. However, the more we used it, the more we realized that although it's very exciting to flip images on the screen by sweeping our hands, it became tiring after a while. Moving one finger on a laptop trackpad requires far less physical effort than sweeping your entire arm.

Gesture-based HCI systems such as LightSpace are no doubt exciting and revolutionary in their own way. But they are not necessarily 'better' than a standard mouse or equivalent pointing devices. I learned that in my undergrad project, and I find it very applicable to the research presented in the above papers as well.


Alex Chung - 9/6/2011 20:44:36

LightSpace:

Using three depth cameras for detection (input) and three projectors for display (output), LightSpace can turn a typical small room into an interactive environment by allowing users to interact with the projected images as they would with multi-touch surfaces or real-life 3-dimensional objects. This research explores the possibilities of computer vision and new human-computer interactive experiences.

The goal of LightSpace is to make computers adapt to our world rather than changing our behavior to use the computer. While a touch-sensitive interactive display is limited to a flat 2D surface, control through vision allows user interfaces on surfaces that are not flat or are flexible. Basically, LightSpace creates a virtual 3D copy of a small room while tracking and emulating all live movement within the computer program. It presents a new direction for human-computer interaction by focusing on our surrounding environment rather than the shapes and sizes of the input devices.

Instead of telling the computer what to do, computer vision allows the user to show the computer, much as one would interact with another person. While this paper focuses on projected interactive surfaces as visual cues to the user, computer output can be expanded beyond image projection. For example, Honda’s humanoid robot ASIMO uses depth cameras for spatial recognition, movement tracking, and object recognition.

While watching instructional videos on YouTube is better than reading instructions, it does not compare to the effectiveness of learning the technique in person. The added spatial dimension gives a better understanding of interactions between objects. Similarly, the mechanism of using standard image processing to analyze users’ 3D actions may not produce a system with high precision. Furthermore, the standard image processing techniques used to track users and their interactions could become the bottleneck of future development.

While the researchers have done a herculean task of devising algorithms for 3-D to 2-D conversion and imaging processing analysis, the research does not create a model for reaching the full potential of captured 3-D data. The latency and sensitivity issues could be resolved with better computers and cameras. However, I’m afraid the range of possible gestures for 3-D interactions would be limited by the analytical techniques.

Depth-Sensing Video Camera for 3D Tangible Tabletop Interaction:

This research uses depth-camera technology to extrapolate the 3D configuration of objects on a table surface into a virtual model that allows interactions. The system is interesting because objects can be introduced or removed and the system will dynamically reflect the changes in the virtual model.

How well does the system adapt to changing light conditions? Since the camera uses IR pulses to detect the distance between the calibrated reference point and the physical model on the table, it should adapt to variable lighting conditions. It has a much faster acquisition time than the previous design (Piper, 2002) using laser scanners. The added speed allows the current design to provide a greater range of interactions with users.

This system reminds me of a technology in the molecular and cellular biology (MCB) lab called Nanodrop. It combs through the protein’s surface with a nano-fiber and uses the displacement of the connecting end to draw a topographic map of a small compound. The resolution is defined by the size of the nano-fiber. Depth sensing technology could replace Nanodrop if it could use a narrower wavelength and be applied to microscopic objects, perhaps by using a fixed intermediate lens to refract the returning light along with an algorithm to interpolate the data.

Using a Depth Camera as a Touch Sensor:

Employing a depth camera to emulate a touch-sensitive display has the following advantages: 1) it can be implemented on any surface; 2) the surface need not be flat; 3) the camera can sense the user’s movement before they touch the surface (hover state).

With a depth-sensing camera mounted overhead, the challenge is determining whether the user is touching the surface or not. The researcher overcomes this issue by using an empty surface for calibration and setting thresholds to define the hover state and touching state. Similar to the paper on the 3D tangible interactive table, this system has proven to be dynamic and sensitive enough for real-life implementation. However, both studies require the table surfaces to be fixed in place after the calibration step. For example, the current system has to re-run the surface calibration after placing a book on the table.
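For concreteness, here is a minimal sketch (Python with NumPy) of the kind of per-pixel calibration-and-threshold scheme described above. The specific threshold bands are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def calibrate_surface(empty_depth_frames):
    # Per-pixel estimate of the empty surface depth, e.g. the median over a
    # short capture taken with nothing on the table.
    return np.median(np.stack(empty_depth_frames), axis=0)

def classify_height(depth_mm, surface_mm, touch_band=(10, 30), hover_band=(30, 150)):
    # Height of whatever the camera currently sees above the calibrated surface
    # (the camera looks down, so larger depth means farther away / lower).
    height = surface_mm - depth_mm.astype(np.float32)
    touching = (height > touch_band[0]) & (height <= touch_band[1])
    hovering = (height > hover_band[0]) & (height <= hover_band[1])
    return touching, hovering
```

A pipeline like this is exactly why the surface must stay fixed after calibration: placing a book on the table changes the calibrated surface depths under and around it, so the thresholds stop meaning anything until calibration is re-run.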

The most interesting idea from this paper is identifying the owner of each touch by using the depth-sensitive camera to observe the connection between the fingers and individual users. It opens up the possibility of interactive games with multiple players. Another exciting idea is the experiment using a sphere as the touch surface. Unlike current computer input devices such as the mouse and touchscreen, it allows users to interact with the virtual model through a 3D input device.



Hanzhong (Ayden) Ye - 9/6/2011 21:38:16

Reading response for pre-class reading on Topics on Interacting with Depth Cameras

These three papers individually discuss the design and implementation of interactive spaces (LightSpace), interaction with a tangible tabletop, and a new way of touch sensing, all using depth camera technology as a novel implementation approach, which triggers my strong interest in depth camera technology and its potential applications.

The LightSpace experiment proposes several novel interaction approaches such as simulated interactive surfaces, through-body transition between surfaces, picking up objects, spatial menus, etc. Some of these ideas are based on earlier related work, while others are very original. Many practical aspects of the implementation are discussed, such as virtual cameras, calibration of cameras and projectors, and reasoning strategies for spatial interaction and connectivity. The article gives me good insight into the implementation of immersive environments and augmented reality.

The second paper and the video showcase a very interesting application of the depth camera for tangible tabletop manipulation. The experiment is fascinating in that it clearly shows the impact of real-world objects upon virtual-world objects. The user experience described also shows the great potential this technology holds for electronic entertainment and virtual/augmented reality.

The last paper discusses a very practical application of the depth camera: implementing a touch sensor. Although not as sensitive and accurate as a traditional capacitive touch screen, a touch sensor implemented with a depth camera has the obvious advantage that it does not need an instrumented screen, it enables touch sensing on non-flat surfaces, and it can work in concert with “above the surface” interaction techniques. This gives us endless possibilities for creating new ways to interact with interactive surfaces.

All three of these innovative research projects give me a wonderful glimpse of the unique technological advantages and the endless potential applications that can be derived from current depth camera technology. I am very eager to learn more about its features and future applications in class, as well as to use the available Kinect APIs to develop exciting interaction applications.

Ayden (Sep 6, 2011)


Laura Devendorf - 9/6/2011 22:09:49

Each paper discussed an innovative interaction method implemented through the use of one or more depth-sensing cameras.

An interesting aspect shared by all of these projects was the ability to transform an everyday object into a "digital" object through the combined use of depth-sensing cameras and projectors. In doing so, this research makes creative use of the cameras and suggests interaction styles that are conducive to a 3D interaction space, as well as a possible method for implementing touch sensing through a depth-sensing camera rather than touch sensors. This research provides a substantial contribution to human-computer interaction as it provides a range of applications for depth-sensing cameras. This is especially relevant since depth-sensing cameras could be a relatively low-cost alternative for creating robust digital spaces.

I'm tempted to criticize the implementation choices within the papers, such as the gestures they chose to map to actions or the fact that the testing spaces were highly constructed (one table, low lights, no disturbances). However, I feel as though these criticisms are not well placed. As I read them, the papers are showing what can be done more than what should be done. In this way, they provide a base for future research into gestural interaction as well as continued research into the capabilities of depth-sensing cameras.



Yun Jin - 9/6/2011 22:37:52

Combining multiple depth cameras and projectors for interactions on, above, and between surfaces

The paper presents the LightSpace prototype, which combines depth cameras and projectors to provide interactivity on, above, and between surfaces in our environments. It discusses the details of interactions and algorithms unique to LightSpace and presents some initial observations of use and suggestions for future directions. LightSpace makes several contributions to human-computer interaction. First, it contributes the novel combination of multiple depth cameras and projectors to imbue standard non-augmented walls and tables with interactivity. Also, using this technology, we can track users and their interactions by reducing 3D space to 2D projections. Finally, it can facilitate transitioning of content between interactive surfaces either by simultaneous touch or by picking up and interacting with a virtual object in hand and in mid-air. However, the LightSpace technology has some blind spots. For instance, LightSpace is limited to emulating interactive display features on flat, static shapes that are designated beforehand. And an interaction can fail to be detected when another user or the user’s head occludes the cameras’ view of the hands or body. Despite these disadvantages, we can still envision its development in the area of human-computer interaction. For example, we could envision LightSpace as a useful platform for exploring the physical relationship between the user and the environment. And I think depth cameras have the potential to move interactions from our computer screens into the space around us.


Depth-Sensing Video Cameras for 3D Tangible Tabletop Interaction

This paper presents an interactive tabletop system which uses a depth-sensing video camera to build a virtual simulation game on the table surface. It mainly discusses the technology behind Micromotocross and predicts its further development for new tabletop interactions. In this investigation, the author uses the ZSense camera. Compared with laser scanners and correlation-based stereo, the ZSense camera is better suited to 3D tangible tabletop interaction because it times the return of pulsed infrared light and also provides a separate depth image. However, the ZSense depth image is somewhat noisy. For some applications it will be necessary to smooth the image to mitigate the effects of shot noise, and this smoothing causes a delay from when an object on the surface is moved to when the change is reflected in the modeled terrain. Thus, I think some other cameras might eventually replace it for better 3D tabletop interaction applications. To explore the applications of depth-sensing cameras to interactive tabletops, the author uses the camera to recover the height map of the table surface and the objects placed on it. Using this technology, for instance, players can drive a virtual car over real objects placed on the table. I think this technology has a great future. For example, we may pick up a virtual object with a grasping gesture and place it on a physical object sitting on the table.


Using a Depth Camera as a Touch Sensor

This paper discusses a new technique: using a depth camera as a touch sensor. It demonstrates how a depth-sensing camera may be used to detect touch on an un-instrumented surface. Compared with traditional techniques such as capacitive sensors, the use of depth cameras to sense touch has the following advantages: 1) the interactive surface need not be instrumented; 2) the interactive surface need not be flat; 3) information about the shape of the users and users’ arms and hands above the surface may be exploited in useful ways, such as determining hover state, or that multiple touches are from the same hand or user. However, the approach proposed in the paper has some limitations. First, the surface calibration cannot be updated appropriately to detect touching of newly placed objects. Second, the calculation of contact position is not very accurate unless shape or posture information available in the depth camera is exploited. Finally, a simple histogram may not be a precise way to estimate the depth at each pixel location.
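The per-pixel histogram mentioned in that last point can be sketched roughly as follows (Python/NumPy); the bin width and the idea of taking the per-pixel mode are assumptions about how such a statistic might be computed, not the paper's exact procedure.

```python
import numpy as np

def surface_from_histograms(empty_depth_frames, bin_mm=2):
    """Estimate the surface depth at each pixel as the mode of a per-pixel histogram.

    empty_depth_frames: list of H x W depth images (in mm) of the empty surface.
    """
    stack = np.stack(empty_depth_frames)                         # T x H x W
    t, h, w = stack.shape
    binned = (stack // bin_mm).astype(np.int32).reshape(t, -1)   # T x (H*W) bin indices
    base = binned.min(axis=0)                                    # per-pixel smallest bin seen
    rel = binned - base                                          # offsets stay small (sensor noise)
    counts = np.zeros((int(rel.max()) + 1, rel.shape[1]), dtype=np.int32)
    cols = np.arange(rel.shape[1])
    for frame in rel:                                            # accumulate one histogram per pixel
        np.add.at(counts, (frame, cols), 1)
    mode_bin = base + counts.argmax(axis=0)                      # most frequent depth bin per pixel
    return (mode_bin.reshape(h, w) * bin_mm + bin_mm / 2.0).astype(np.float32)
```

The coarse bin width is one source of the imprecision the response points out: the estimated surface depth can only ever be as accurate as the bin size and the sensor noise allow.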



Hong Wu - 9/6/2011 23:56:04

Main point:

All three papers described the principles and implementation of systems using depth cameras and projectors.

Interpretation:

The three papers researched a new method of interacting with programs. The method doesn’t require people to wear equipment, which makes interaction more comfortable and natural. “Using a Depth Camera as a Touch Sensor” proposed a way to simulate a touch screen without instrumenting the surface. “Depth-Sensing Video Cameras for 3D Tangible Tabletop Interaction” and “Combining Multiple Depth Cameras” applied depth cameras to new interactive environments.

The technique still has several drawbacks. The resolution of current depth cameras is low. The frame rate of the depth camera cannot keep up with fast movement. The depth camera needs a still background such as a desk or wall. The interactive space is limited and takes a lot of time to build. Interactive objects can be occluded when more people are involved. The mesh processing is too computationally expensive to support more than three people.

When the technique becomes mature, it may change the way we communicate with computers. We may get rid of the keyboard or mouse and just wave our hands. It also widens my vision: in my project, I may use a depth camera for a fancy demo.



Ali Sinan Koksal - 9/6/2011 23:58:46

In these papers, interaction techniques that make use of depth cameras and projectors are presented. These include i) combining multiple depth cameras in a room with the ultimate goal of transforming each surface into an interactive display and also making the space between them active; ii) allowing for three dimensional interactions with physical objects placed on tabletops; iii) achieving accurate touch sensing on un-instrumented surfaces.

Depth cameras, in addition to providing color information, give information about the distance to the displayed object at each pixel location in an image. This technology is used for conveniently tracking objects in space and therefore exploring new ways of interaction. It can be used as a touch sensor on un-instrumented surfaces, by tracking objects at a particular distance from the surface, such that the system is not affected by noise in depth data, while preserving temporal and positional accuracy. Another compelling use of depth cameras is in combination with projectors, and consists in projecting virtual objects on tabletops while taking into account actual physical objects on the surface. The concept is illustrated with a racing game where virtual cars projected on the surface go over physical objects on the table.

Making a leap, LightSpace is an installation that is not limited to one interactive display, but combines multiple depth cameras and projectors to push forward the technique to ultimately be able to use all physical surfaces in a room as interactive displays. The 3D data collected from all cameras is transformed into a number of 2D images from the point of view of "virtual cameras", which allows the use of existing 2D image processing techniques for implementing interactions. These interactions include use of multiple interactive surfaces, and the use of space between them to transfer objects from one surface to another. The ability to use un-instrumented surfaces as interactive displays in coordination seems very promising in conceiving interaction systems of the future.
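A rough sketch of the "virtual camera" idea described above (Python/NumPy): world-space points gathered from all depth cameras are re-rendered as a 2D top-down image that ordinary image-processing code can consume. The coordinate conventions, bounds, and resolution here are assumptions for illustration, not details from the paper.

```python
import numpy as np

def orthographic_virtual_camera(points, bounds, resolution=(320, 240)):
    """Project world-space 3D points into a top-down 2D height image.

    points:     N x 3 array of (x, y, z) in room coordinates, with z as height.
    bounds:     (xmin, xmax, ymin, ymax) region the virtual camera covers.
    resolution: (width, height) of the output image.
    """
    xmin, xmax, ymin, ymax = bounds
    w, h = resolution
    img = np.zeros((h, w), dtype=np.float32)          # empty cells stay at height 0
    xs = ((points[:, 0] - xmin) / (xmax - xmin) * (w - 1)).astype(int)
    ys = ((points[:, 1] - ymin) / (ymax - ymin) * (h - 1)).astype(int)
    keep = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    # Keep the highest point seen in each cell, like a depth buffer viewed from above.
    np.maximum.at(img, (ys[keep], xs[keep]), points[keep, 2])
    return img
```

Thresholding and connected-component analysis on images like this (and on similar virtual views of the table and wall) is what lets the system reuse well-understood 2D techniques instead of reasoning directly over the noisy 3D data.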

It would have been interesting to see more usage scenarios of these techniques and an investigation on their practicality. Descriptive models for these new modes of interaction could help on structuring future directions to explore more precisely. A more in-depth evaluation of LightSpace with concrete tasks could help us move forward in this goal.


Apoorva Sachdev - 9/7/2011 0:06:27

Reading Response 2: This week’s readings were about depth cameras and how this technology can be used in various ways. In “Combining Multiple Depth Cameras and Projectors for Interactions On, Above, and Between Surfaces,” authors Andrew Wilson and Hrvoje Benko create a novel installation called LightSpace, which uses depth cameras and projectors to allow multiple kinds of interaction with surfaces. In “Using a Depth Camera as a Touch Sensor,” Wilson describes a technique for identifying touch using a depth camera, thereby allowing non-instrumented surfaces to be used as touch surfaces. In the third paper, “Depth-Sensing Video Cameras for 3D Tangible Tabletop Interaction,” Wilson creates a virtual 3D game called “Micromotocross” which allows users to modify things in the virtual world using physical objects, i.e. use paper or their own hand to create an obstacle course for the virtual car.

LightSpace is a room installation that allows users to interact across various surfaces and use the whole 3D space with intuitive gestures like sweeping and stretching, rather than being limited to a surface like a tabletop. With the advent of the Xbox Kinect, it’s much easier to get detailed 3D depth information about a particular space, and hence we can interact with surfaces using our hands rather than input devices like keyboards or touchscreens. However, our computational power and algorithms are still limiting and require us to convert 3D images into 2D to enable fast image processing. This might eventually limit the complexity of the gestures we can track. The paper aims to make “the room a computer,” but I feel that this particular implementation would be difficult to scale up, as it would require many such scattered set-ups to work together and a lot more computational power (if one were moving a “digital piece” in one's hand from one corner of the room to another). Also, I found the implementation of the spatial menu a little counter-intuitive, because in most cases a menu is used to display all the options together rather than show options one at a time, so this particular implementation may not be useful.

The game application of 3D cameras, i.e. Micromotocross, not only focused on humans interacting with virtual objects using physical things but also described how virtual objects interact with physical objects in the virtual world. It highlighted the advantages of using a depth camera in comparison to some of the older techniques like laser scanners and infrared cameras, which are much slower in their scan rate and don’t give easily accessible information. This paper also utilized a physics model for the cars, which makes the interaction more intuitive to users, except that the cars can’t just be picked up and placed in another location even though that is what would happen in the physical world.

The use of a depth camera for touch sensing opens a whole array of possibilities. I felt that the author could also have described some applications where tying the touch to a person would be useful. Also, an implementation of hover detection could have been described. Although touch sensing is less accurate using a depth camera, I feel the advantages of converting any surface into a touch table outweigh the accuracy concerns. This implementation could be used in larger office-like environments for collaboration purposes, where precision is less important and coarse actions matter more. Overall, I felt that all three papers tied the depth camera to some sort of projector, so we could think about what other interfaces the depth camera could be used with.


Yin-Chia Yeh - 9/7/2011 0:47:08

The main ideas of this week’s papers are all about how to leverage depth camera technology to create new methods of human-computer interaction. The LightSpace paper suggests that with depth cameras, interactions are no longer limited to an interactive surface, but can occur above a surface or even between surfaces. The Micromotocross paper uses a depth camera to capture a 3D model of the real world and presents it in a virtual world; a projector is also used to project the virtual car into the real world so users can perform some interactions with the virtual car there. The touch sensor paper shows how to use a depth camera to simulate the behavior of a touch sensor and compares the properties of a real touch sensor and the depth camera version.

The best thing about the depth camera is that it enables a cheap and reliable way to construct a 3D model of the scene, which is critical for creating more sophisticated interactions such as gesture recognition. While in the LightSpace paper the authors use the clever virtual camera technique to create some interactions, these interactions do not seem to be very intuitive or comfortable. For example, rather than touching the table and wall simultaneously, I would rather sit in a chair, touch one surface, and use my finger to point at the target surface. However, this interaction, though not implemented, could be created with the support of a depth camera. Another possible direction for future work is that these systems all require some calibration. It would be nice if we could improve the calibration procedure to allow it to run in real time, given that the environment of people’s homes or offices is usually not static. For example, in the LightSpace paper, could I use some gesture to create a new interactive surface at runtime? Moreover, could the system automatically change the interactive surface when certain changes in the environment are detected?

While the depth camera can be a powerful tool, it is still subject to some limitations, especially occlusion. Therefore, how to place the depth camera becomes an interesting topic. In these papers, the cameras were all mounted above the environment; in that case, the human body can easily occlude the hands. It would be best if the depth camera were mounted right on or behind the interactive surface, but this setting would pose a limitation on feasible interfaces. Another limitation of the depth camera is its precision, mostly temporal: 30 frames per second might not be good enough for some rapid interactions such as typing on a keyboard.


Suryaveer Singh Lodha - 9/7/2011 1:57:01

Combining multiple depth cameras and projectors for interaction on, above and between surfaces - This paper presents the design process and an overview of the implementation used to create LightSpace, a smart room which lets people interact with the computer in an intuitive way. Towards the end, the authors also discuss observations from the tests performed and suggest further improvements based on the feedback received.

Depth cameras and projectors are integral parts of LightSpace. Depth cameras enable inexpensive real-time 3D modeling of surface geometry, which makes traditional computer vision problems easier to handle. Cameras and projectors are calibrated to a single coordinate system in real-world units, which allows creating interactive experiences without regard to which camera and projector is used for a given interaction. Using this technique it is possible to make any surface interactive. LightSpace also allows for through-body transitions between surfaces (by treating the body itself as an interactive surface), lets the user pick up and drop virtual objects, and provides spatial menus. LightSpace avoids human tracking problems (which are common to other smart room solutions) by using simple and robust 2D image processing techniques to query 3D space and by performing operations on the 3D mesh directly. LightSpace also makes use of multiple virtual cameras to detect user actions. As each virtual camera can incorporate depth data from multiple depth cameras, multiple virtual cameras can be computed, each tailored to support a particular interaction. LightSpace uses three orthographic virtual cameras: one for the entire room, and one each for the simulated table and wall surfaces.

This system was tested with 800 people and it performed well for up to 6 users in the room, though the presence of 2-3 users brought the refresh rate below 30 Hz (the refresh rate of the depth camera). Sometimes a user interaction would fail if it was occluded from the camera (e.g., the user’s head gets in the way and the camera cannot record the hand movement), and there were also issues when users tried very quick hand movements. Apart from these issues the user experience was very good, and users didn’t need any special practice to get used to the system. One thing that stands out is the ease with which LightSpace allows users to interact with computing devices intuitively, as opposed to traditional methods where the user has to wear some marker or tracking gear.

Short papers on depth sensing: Depth-sensing video camera technology provides precise per-pixel range data which can help in robotics and vision-based HCI scenarios. Using depth sensing, a height map of the objects on the table surface can be constructed, which enables simulation of virtual objects on real entities kept on the table. One example is Micromotocross, where a virtual car is driven over a ramp placed on the table. One of the drawbacks is that if the mesh changes rapidly, the collision response will not be accurate. For example, if the user moves the ramp on the table suddenly, the car might even penetrate the 3D mesh of the ramp. The other paper talks about the use of depth-sensing cameras to detect touch on an un-instrumented tabletop. This approach is interesting as it allows touch sensing on non-flat surfaces and also has the potential to support above-the-surface interaction (hovering).


Donghyuk Jung - 9/7/2011 2:36:16

The 2002 science fiction film ‘Minority Report’ featured numerous fictional future technologies, and some of them have already been implemented, such as retina recognition and multi-touch interfaces. As far as I know, Microsoft released the Kinect motion-sensing camera add-on for their Xbox 360 gaming console in 2010, drawing on this inspiration. All three research papers relate closely to the core technologies of the Kinect.

  • Combining Multiple Depth Cameras and Projectors for Interactions On, Above, and Between Surfaces
  • Using a Depth Camera as a Touch Sensor

In this paper, Wilson and Benko introduced the LightSpace system, which “enables interactivity and visualizations throughout our everyday environment, without augmenting users and other objects in the room with sensors or markers.” They installed multiple projectors and depth cameras at the top of an office-sized room and enabled users to perform three major interactions (connectivity, picking up an object, spatial menus) with LightSpace. Selective projection of the depth camera data enables emulation of interactive displays on a standard table or office desk, as well as facilitating mid-air interactions between and around these displays. Unlike previous research on virtual and augmented reality, their work showed a “novel combination of multiple depth cameras and projectors to imbue standard non-augmented walls and tables with interactivity.” They did not use a heavyweight tracking mechanism in order to respond to users’ movements. However, LightSpace has a few limitations. According to feedback from demo events, about six users is the maximum capacity for LightSpace due to the physical limitations of capture. LightSpace also cannot detect some user movements when users move very fast or block the cameras’ view with their hands or bodies. Although LightSpace will likely overcome these minor defects soon, I think a remaining issue is that there is no feedback mechanism other than vision. Users might need another feedback signal from LightSpace to interact with the system more efficiently.

  • Depth-Sensing Video Cameras for 3D Tangible Tabletop Interaction

The Micromotocross game is an interactive surface interface that showcases the capabilities of time-of-flight depth cameras. It supports interactive modification of the terrain in a car driving simulation. With this kind of interactive tabletop system, we might improve collaboration between users in settings such as physics or math classes (education) and city or urban planning. If users can use this height map in working or studying environments, tangible tabletop interaction could make work feel as fun as playing a simulation game.


Galen Panger - 9/7/2011 5:54:48

The kinds of worlds envisioned (and partially realized) by Wilson’s LightSpace and Micromotocross applications are awe-inspiring. The use of depth cameras and projectors to map responsive virtual worlds onto physical terrain creates engaging, immersive experiences that don’t require unwieldy headgear, that take advantage of any number of available surfaces (whether or not they are flat), that don’t require precise positioning of cameras and projectors, and that even allow for interaction in the air (as in holding a ball in LightSpace, or in generating hover-over behaviors as described in the “Using a Depth Camera as a Touch Sensor” article). These techniques are simultaneously freeing, and thrifty.

As a brief side note, I think the idea of detecting touch by noting the change in fingernail color is brilliant. Could this be a cheap and simple way to increase the precision of depth camera-based touch models?

But this camera-and-projector model of virtual work- and play- spaces seems, in my mind, quickly confining. It is interesting to consider that all three of Wilson’s papers focused on projecting a perceived world back upon itself, whereas the Kinect use-case is to project the perceived world onto a television screen. Projectors seem crude. They’re low-resolution, they require dark rooms, and most detrimentally, they require users to stay out of the projected area, lest they occlude the cameras and projectors.

And LightSpace, while relatively light-weight, is still a non-mobile solution. I think the most important vision embodied in LightSpace is the idea that you should be able to richly take advantage of the displays around you or in your possession, and that those displays should be positionally aware of one another and seamlessly interoperable. That vision is a mobile one that shouldn’t require dedicated rooms, or dedicated cameras or projectors.

I’m thinking of the movie Avatar, where the scientists could swipe an object from one display (say, a tablet) to another display (say, wall-mounted) at will, so that work could move to whatever available display was most convenient, or both displays could be used at once, each aware of what the other was doing and where it was located.

Maybe as a cool next step toward this Avatar world, Wilson should develop a LightSpace model that lets users swipe work from their iPhones and iPads onto a projected surface, and from a projected surface back onto their iPads or iPhones.


peggychi - 9/7/2011 5:55:04

This week we examined three papers that applied one or more depth cameras to tabletop environments including a physical simulation about real objects [Wilson 2007], as a touch sensor on non-flat surfaces [Wilson 2010], and in a room context (called LightSpace) with multiple surfaces and users [Wilson and Benko 2010]. The authors demonstrated how a height map of objects, information about the shape of users and their arms/hands, and several gesture interactions manipulating virtual objects could be realized by modeling the depth camera data.

Hands-free interaction deployed in everyday living space has been a longtime dream, especially since Mark Weiser first introduced the concept of ubiquitous computing in 1991. Unlike holding artificial objects such as the mice or keyboards we discussed last week, or attaching additional wearable sensors such as gloves (e.g. the Minority Report-like g-speak spatial system), colored nails, or muscle sensors, various sensing technologies including motion sensors and capacitive touch screens have gradually made this vision possible. In recent years, research on depth cameras has been accelerating quickly and profoundly. Microsoft's Kinect kit has even spread broadly into the world at a reasonable price and with an open API, so that many applications and games can be realized.

I admire the effort put into this technology and love the idea of using cameras to free our human bodies as a multidimensional input. However, the limitations are not clearly stated in these papers. Although the limit of six users was found in an informal experiment at a demo event of LightSpace, for example, it is hard for developers to infer the possible technical constraints of such technology before putting it into practice. To avoid a trial-and-error approach to applying depth cameras, more thorough models or experiments should be presented.


Sally Ahn - 9/7/2011 7:54:36

These three papers describe using depth cameras to enable users to manipulate virtual objects by interacting with the surfaces and space of their environment. The "Depth-Sensing Video Cameras for 3D Tangible Tabletop Interaction" paper precedes the others and introduces a fun car driving game as an example application for tabletop interactions; "Using a Depth Camera as a Touch Sensor" explains the camera-based touch sensing technique in detail; and Wilson and Benko's paper presents a detailed description and analysis of such interactions in LightSpace, a room-like environment.

Wilson and Benko state that one of their primary goals is designing an interactive environment that does not require users to wear special sensors or markers. I think this is a key advantage that depth-sensing cameras offer. Although the idea of an interactive environment is not new, LightSpace presents a possible solution for implementing direct-manipulation interaction techniques that realizes the vision of ubiquitous computing. Moreover, the addition of projectors to depth cameras enables any surface to become interactive--truly, another leap forward for ubiquitous computing. One limitation is that LightSpace only supports flat, rectangular surfaces for interaction, and the authors mark this as future work. The authors envision an extension of this work to be a room in which all surfaces become interactive. While this is an appealing vision, the calibration requirements of using depth cameras would limit such rooms to being simple and minimally furnished.

Additionally, LightSpace also demonstrates new interaction techniques that utilize the space between surfaces as well as the participants' own bodies. Unique interaction techniques like their "through-body" transitions seem intuitive and useful, but I can also see accidental transitions occurring in real-world settings. Their spatial menu is another novel interaction technique. One improvement might be to make it accessible from anywhere, not just a fixed location in the room.

Regarding the technical aspects, the addition of virtual cameras and 2D image processing for detecting user actions seems like a practical solution for recognizing connectivity between separate surfaces. One disadvantage of this approach would be the need to determine the ideal camera position and calibration for each additional surface in the room. This also reveals a more general drawback: the interactive room would be difficult to modify (e.g. adding or moving furniture) after the initial camera installation and calibration have taken place. I am curious about the resolution of these 2D images and whether it is high enough to allow for more fine-grained interactions by incorporating multi-touch gestures (as described in Wilson's "Using a Depth Camera as a Touch Sensor") for surfaces in LightSpace.


Cheng Lu - 9/7/2011 7:54:54

The first paper presents LightSpace, an interactive system that allows users to interact on, above, and between interactive surfaces in a room-sized environment instrumented with multiple depth cameras and projectors. LightSpace cameras and projectors are calibrated to 3D real-world coordinates, allowing graphics to be projected correctly onto any surface visible to both camera and projector. Their work presents the novel combination of multiple depth cameras and projectors to make standard non-augmented walls and tables interactive. The paper also demonstrates a mechanism for reasoning about this 3D space by reducing it to 2D projections, so that conventional image processing techniques can be used to track users and their interactions. Finally, they present several interaction techniques that facilitate transitioning of content between interactive surfaces: users can either touch surfaces simultaneously or pick up and interact with a virtual object in hand or in mid-air. LightSpace offers an exploration of the variety of rich spatial interactions enabled by depth cameras. Their work shows that depth cameras have the potential to move interactions from our computer screens into the space around us.

The second paper presents a great application of the depth-sensing video camera, one that is both innovative and fun. With the help of depth-sensing camera technologies, such interactive table systems become possible to implement. Traditional interactive tabletop systems are mostly based on capacitive sensing, RFID, active infrared-emitting devices, generic computer-vision-based object recognition, or visual barcode recognition. However, the biggest problem with these techniques is that interactions with them on interactive surfaces are typically 2D in nature. Depth-sensing camera systems recover depth information throughout the captured scene, and a fully detailed range image permits a great deal of flexibility. Compared to traditional laser scanners or correlation-based stereo techniques, which have problems such as slow speed and high cost, the depth-sensing camera is an evolutionary product that addresses those issues at the same time. The actual testing environment of this interactive table system is based on Micromotocross, which I personally think is a great way to show off all the fantastic features of the system. They calculate the 3D position indicated at each pixel in the depth image, and it is similarly easy to texture-map the resulting mesh with the color image also returned by the camera. In addition, a mixture of spatial and temporal smoothing techniques has to be applied to reduce the noise. The two-way projection-vision system, with output both on the table and on a monitor, lets users judge the situation as they prefer. The most interesting part of this system is that people can interact with the virtual cars directly with their hands, which suggests that users would be able to use a gesture-based interface to manipulate virtual objects just as they would manipulate real objects on the table. It is the depth image that makes certain gestures potentially easier to recognize.
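The spatial and temporal smoothing mentioned above could look something like the following sketch (Python, using NumPy and SciPy); the choice of a median filter plus an exponential moving average, and the constants, are assumptions rather than the paper's exact recipe.

```python
import numpy as np
from scipy.ndimage import median_filter

class DepthSmoother:
    """Illustrative spatial + temporal smoothing for noisy depth frames."""

    def __init__(self, alpha=0.3, spatial_size=3):
        self.alpha = alpha                # temporal blending factor (higher = more responsive)
        self.spatial_size = spatial_size  # neighbourhood for the spatial median filter
        self.state = None

    def update(self, depth_frame):
        # Spatial: a small median filter suppresses isolated shot-noise pixels.
        frame = median_filter(depth_frame.astype(np.float32), size=self.spatial_size)
        # Temporal: exponential moving average over frames trades lag for stability.
        if self.state is None:
            self.state = frame
        else:
            self.state = self.alpha * frame + (1 - self.alpha) * self.state
        return self.state
```

The trade-off is visible in the application itself: the more aggressively the height map is smoothed over time, the longer it takes for a moved object to show up in the simulated terrain.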

The third paper demonstrates how a depth-sensing camera may be used to detect touch on an un-instrumented surface. While the performance of this approach is less than that of more conventional touch screen technologies, it is good enough to be useful in a variety of applications. Additionally, the approach offers certain interesting advantages, such as working on non-flat surfaces and in concert with "above the surface" interaction techniques. The goal of the project is to deduce a useful touch signal when the camera is mounted well above a surface such as a desk or table. In comparison with more traditional techniques, such as capacitive sensors, the use of depth cameras to sense touch has many advantages. The interactive surface need not be instrumented or flat. Information about the shape of users' arms and hands above the surface may be exploited in useful ways, such as determining hover state or recognizing that multiple touches come from the same hand or user.


Vinson Chuong - 9/7/2011 8:14:52

Wilson's papers offer compelling demonstrations of how the three-dimensional sensing capabilities provided by depth cameras can be used to refine and enrich interactions traditionally facilitated by two-dimensional sensors. He goes on to envision capturing interactions between a user's entire body and the surrounding environment.

The input devices commonly used today (keyboards, mice, tablets, touchscreens, etc.) typically sense input in only up to two dimensions and from only very specific parts of a user's body (the hands or fingers). Depth cameras can address both of these limitations by taking three-dimensional input over a large space.

Wilson's papers focus primarily on interactions facilitated solely by depth cameras. Relative to other methods and technologies for facilitating the same interactions, he demonstrates many advantages of depth cameras. Instrumenting a space requires only that depth cameras be mounted above that space, out of the way of users. There is no equipment to be worn, and individual objects need not be instrumented for their interactions to be detected. Interactions can take place anywhere within the space and need not be restricted to specific objects or surfaces. Wilson concedes that the resolution of today's depth cameras limits their responsiveness and sensitivity, and that the power of today's computers limits the complexity of the inferences that can be made. So, at the moment, for very specific forms of interaction, depth cameras may not be able to compare to input devices designed specifically to handle them.

However, as mentioned in passing several times in Wilson's papers, depth cameras are very useful for providing context. An important point that I think deserves more discussion is that depth cameras can augment other input devices by providing additional information about the user, such as location, orientation, posture, and focus of attention. For example, when a user is interacting with two or more touchscreen surfaces, a depth camera can identify that a user is interacting with multiple surfaces simultaneously or that a user is facing and focusing on (without having to be touching) a specific surface. Although systems which depend solely on depth cameras may not be as sensitive or responsive as more specialized input devices, when combined with those specialized input devices, depth cameras can have a different, arguably more impactful role.

With its clear uses both in solely facilitating interactive systems and augmenting existing input devices, depth cameras seem very promising indeed. I am eager to see what will come out of depth camera technology.


Shiry Ginosar - 9/7/2011 8:23:13

This set of papers presents several steps toward the goal of creating a fully interactive sensed environment that is not confined to physical flat surfaces and that specifically uses the whole space, as well as the user's own body, as sensed surfaces. While this vision has existed for a while, previous implementations of similar systems were forced to use more complex technology to enable the envisioned interactions. The advantage of using depth cameras for this purpose is the ease of deriving a signal from the entire space without the need to interface with other technologies.

The interaction model described in the papers is very compelling, especially when coupled with a complementary display mechanism such as an overhead projector. Although some difficulties are presented, such as the inability to display an object in mid-air as it falls from a table to the ground, these are mostly issues of representation and feedback to the user which will probably be solved in innovative ways in the future. However, what was not clear from all three papers is how well the actual sensing works and whether the interaction experience is natural and flawless for the users. One of the papers claims that "while the performance of this approach is less than that of more conventional touch screen technologies, we believe the performance is good enough to be useful in a variety of applications." But the papers never elaborate on the definition of "good enough" or offer a quantitative comparison with other touch screen and smart room technologies, so it is hard for the reader to envision the ease of use of the presented system.

Fundamentally though, this is a very exciting and simple approach to ubiquitous computing. Hopefully it will prove to be precise enough and simple enough in order to penetrate the market.


Manas Mittal - 9/7/2011 8:43:41

These papers illustrate the techniques and mechanisms that enable applications of depth cameras, and delve into specific applications. They present a system and (simplified) models that enable others to more easily deal with the complexity of the 3D world.

These papers fall into the present-new-approach/present-new-system camp of utilizing depth cameras. The recent proliferation of inexpensive, mass-market depth cameras (Kinect) indicates that it's a good time to develop new, interesting applications. In this work, the authors use PrimeSense depth cameras (1 cm resolution at 2 m distance), which use structured lighting for model construction.

The three cameras are used to cover a 10' x 8' x 9' volume (and they are used primarily for coverage, not for any special form of triangulation). Interactive surfaces have to be manually designated by marking three corners. The virtual camera approach transforms 3D motion into 2D motion in a particular 'virtual' plane, thus enabling the use of 2D software. The authors also develop a mapping between the image space and the physical world (picking up, connectivity, etc.). The evaluation in these papers is rather binary (users had no trouble), with some qualitative observations injected. This is great.
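
The "mark three corners" calibration step is easy to picture in code. Below is a hypothetical sketch (my own, not from the papers): build an orthonormal frame for the designated surface from its three corners, and express sensed 3D points as (u, v, height-above-surface) coordinates so that 2D software can operate on them.

```python
# Hypothetical sketch: turn three manually marked corners of a surface into a
# local coordinate frame, so sensed 3D points can be expressed as
# (u, v, height above surface) and handed to ordinary 2D processing.
import numpy as np

def surface_frame(c0, c1, c2):
    """c0, c1, c2: the three marked corners as 3D numpy vectors."""
    u = c1 - c0
    u = u / np.linalg.norm(u)          # first in-plane axis
    n = np.cross(u, c2 - c0)
    n = n / np.linalg.norm(n)          # surface normal
    v = np.cross(n, u)                 # second in-plane axis (already unit length)
    return c0, u, v, n

def to_surface_coords(p, frame):
    """Express world-space point p in the surface's local (u, v, height) frame."""
    origin, u, v, n = frame
    rel = p - origin
    return np.array([rel @ u, rel @ v, rel @ n])
```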

The Micromotocross (MMC) game is a fun application that demonstrates what's possible using depth cameras plus RGB images (30 Hz RGBZ images). Also, it is interesting to think of MMC as a game design platform rather than a game playing platform (synthetic view vs. table projection).

The touch sensor paper uses the inexpensive Kinect. The paper presents algorithms to discriminate fine-grained touching. The authors model touch using a 'shell' above the surface defined by two thresholds, d_max and d_min, where d_min is closer to the surface; pixels observed within this shell are classified as touching. The authors describe techniques for determining these two thresholds.
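
The shell test is simple to express. Here is a rough sketch of one reasonable reading of it (my own code with illustrative threshold values, not the paper's implementation): given a per-pixel depth map of the bare surface captured during calibration, a pixel is a touch candidate when the observed depth places it within a thin band above the surface.

```python
# Rough sketch (not Wilson's code) of a d_min/d_max "shell" touch test: a
# pixel counts as a touch candidate when it sits within a thin band above a
# precomputed depth map of the empty surface. Threshold values are illustrative.
import numpy as np

def touch_mask(depth_m, surface_depth_m, d_min=0.005, d_max=0.025):
    """depth_m: current depth image (meters from the camera).
    surface_depth_m: depth image of the bare surface, captured at calibration.
    d_min, d_max: shell thresholds (meters) above the surface."""
    height = surface_depth_m - depth_m           # how far above the surface each pixel is
    return (height > d_min) & (height < d_max)   # True where a fingertip is likely touching
```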


New things to check out: vertex buffers, the XNA game development platform, ZSense cameras, the Newton Physics Library.


Rohan Nagesh - 9/7/2011 8:57:39

The main point of these three papers, in aggregate, is to explore the variety of new, rich interaction techniques on uninstrumented surfaces made possible by depth-sensing cameras. The first paper ("Combining Multiple Depth Cameras and Projectors for Interactions On, Above, and Between Surfaces") discusses the LightSpace project, an initiative at Microsoft Research to allow users to interact on, above, and between surfaces such as a regular flat desk in a room environment. The second paper ("Depth-Sensing Video Cameras for 3D Tangible Tabletop Interaction") discusses adjacent research at Microsoft to allow users to physically interact with a normal tabletop and manipulate various geometries in 3D to influence game play in a racing car simulation game. The final paper ("Using a Depth Camera as a Touch Sensor") discusses using Microsoft Kinect technology to aid in detecting a user's touch on uninstrumented tabletops.

I view these papers more as means of communicating the rich and vast potential of depth sensing camera interaction techniques. While Microsoft's Kinect technology has been a major breakthrough in video gaming, these papers showcase the potential of utilizing that technology for more productive purposes! I do believe the world is going mobile and touch screen interactions are definitely a big part of latest smartphone devices. It would be interesting to imagine a fully "connected home" in the future in which similar technology as discussed in the papers could be extended to enable users to interact with virtually all devices and surfaces in their homes. These technologies also have huge potential in areas such as e-government, e-learning, and tele-health, all of which are multi-billion dollar industries.

As each of the papers discussed mostly the implementation and motivation behind their respective systems in extending Kinect-like technology, the assumptions and technical details at a high level have been evaluated and validated by Kinect's success. I would find it interesting in the case of something like LightSpace to actually assess the consumer demand and viability for such an application. It's terrific to build such great technologies, but if users cannot wrap their minds around them or are simply less productive with these technologies, it will be difficult to plan the go-to-market strategy. In particular with LightSpace, I wonder why an enterprise team wouldn't just rather use coordinated iPad 2's or similar touch screen devices for touch-based manipulation. The authors have however demoed the tool to a group of almost 1200 users, so I am pleased they are incorporating user feedback into their design process at a relatively early stage.

Overall, I believe these three new interaction techniques open the doors for a variety of richer, more clever adaptations of Kinect technology. As I mentioned earlier, I am more excited by applications to more productive uses than gaming, such as e-learning or tele-health. I would be interested in exploring further the "weird factor" of having lights and projections on your hands and other body parts in the case of LightSpace and whether I could actually be productive in that kind of environment. I would be interested in applications of the Micromotocross game extending into everyday life user productivity. Lastly, with the touch sensing, I'd be interested to see if it's possible to integrate the touch sensing capability into the smart-room proposed by LightSpace to greatly boost workplace productivity and collaboration. I'd also be curious to learn the current total monetary cost in setting up such a "Smart Room"!

Jason Toy - 9/7/2011 8:58:33

"Combining Multiple Depth Cameras and Projectors for Interactions On, Above, and Between Surfaces"; "Depth-Sensing Video Cameras for 3D Tangible Tabletop Interaction"; "Using a Depth Camera as a Touch Sensor"

Summary:

The three papers we read are on various applications for depth sensing cameras in Human-Computer Interaction.

"Combining Multiple Depth Cameras and Projectors" is about using depth sensing cameras to allow users to use objects in the room as interactive surfaces. The paper details a variety of gestures used to display and manipulate virtual objects, the calibration and equipment requirements for the LightSpace installation, and the creators' observations during trial use. "Depth-Sensing Video Cameras" combines the use of depth-sensing video with a physics library to allow a user to play a racing simulation game on a course they build on a table. "Using a Depth Camera as a Touch Sensor" describes the techniques used to build an interactive touch panel with a depth camera on any flat surface.

"Combining Multiple Depth Cameras and Projectors" presents an interesting new approach to creating a smart room: a room that combines interactive displays with augmented reality. By using depth sensing cameras, there is no requirement for the user to wear extra gear, nor is their a requirement for design of everyday objects to accommodate a virtual system like Fails and Olsen suggest. Unlike many previous light projection based systems, LightSpace depends on user hand gestures rather than mouse-based interactions to move data around the room. This paper and the LightSpace technique further our knowledge of HCI and relate to the previous papers "Motor Behavior Models for Human-Computer Interaction" and "The Bubble Cursor" in the way human input is considered. The previous papers used models to analyze our current methods of input, using both keyboard and mouse, and come up with suggestions to improve on both techniques. However, LightSpace takes a different approach with touch based interactions. One avenue of research is the integration of LightSpace or augmented reality systems with a physics engine. Given the ability to interact with physical objects, like tables, to do computing, it might make sense that the data and objects we work with also would act like physical objects as well.

"Depth-Sensing Video Cameras" shows a new approach and different use of the depth-sensing camera. Instead of just allowing users to use touch to manipulate virtual data on a physical object, the racing simulation actually takes into account the physical state of objects on the table to build the virtual track: allowing the user to actually interact with 3d objects. Like motion capture suits or the Kinect, this technique allows us to bring real objects into a virtual world. I could see a lot of research being done on using depth sensing cameras to do mapping of physical objects. "Using a Depth Camera as a Touch Sensor" is a different approach to interactive touch screens. It relates to current methods in that, like new devices such as the ipad, it is entirely touch based. However, its pros over current methods is that it does not require a possibly prohibitively expensive capacitive screen to allow interaction, and can work with any flat surface.

One pro of depth-sensing devices for integration with physical objects is their potential use in areas like third-world countries. While LightSpace might be prohibitively expensive for this, the third article brings up the use of the Kinect as a touch screen device. The Kinect is $150 and far cheaper than most LCD monitors or displays. Another pro of the Kinect is that it is an easily obtainable technology that we have now, and thus we can readily adapt it toward building the touch sensors described in the third paper.

However, all three papers and their descriptions of depth sensors come with several problems. For example, each one of them requires elaborate setup and possibly expensive equipment, especially in the case of LightSpace. It is noted in the paper that the space or room cannot dynamically change, forcing people to redo the setup if the room changes, which makes the system impractical. One question I have concerns the LightSpace paper's focus on being able to display data on non-flat surfaces. The purpose of displaying data on, say, a bookshelf as described by the paper is beyond me: wouldn't that make it hard to read the information? Even in terms of the examples given, the only physical objects used are tables and walls. Lastly, there is a question of table or space constraints. You need empty tables and open space, which means this technique seems impractical in an office, where space is constrained.