Generating and Recognizing UIs

From CS260 Fall 2011

Bjoern's Slides

Media:cs260-19-generate-recognize.pdf

Extra Materials

Discussant's Materials

Discussion Slideset

Reading Responses

Valkyrie Savage - 10/29/2011 13:47:24

Main idea:

All users are different, and it’s important to consider different ways in which they might like to access information on or simply interact with their computers.

Reactions:

The Sikuli system sounds like it would be useful as a personal tool. However, as they mentioned at the end of the paper, it seems like there would be no way to transfer scripts between users or make them useful across e.g. background changes. Also, I am curious how they expect users to know what it means that something is similar(.7)? As a generally technical- and Python-minded person, I find that reasonably intuitive, but my mom for instance wouldn't know how to set that. It's a shame they didn't do any testing for this interface, which feels to me to be rather programmer-centric. For that matter, I don't know if I would know how to estimate it. It would be nice if there were some way to make a guide for users when they are setting the crop size of their screenshots: someone who accidentally included too large a swath of background would be unable to run scripts. I do approve of their use of Python for the "scripting" language. :) It seems to me to be the most intuitive language for non-programmers, which I suppose has probably been supported by some studies that I'm too lazy to find just now. I think that the problem they are trying to solve is important: another issue where computers have traditionally done things differently from the way we do them in the real world due to constraints on memory and processor speed.
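
For concreteness, the ".7" refers to the similarity threshold that Sikuli's Python-based scripting layer exposes on a Pattern. Below is a minimal sketch; the image file names are hypothetical stand-ins for the screenshots Sikuli normally embeds directly in the script.

  # Hedged sketch of Sikuli's similarity threshold; image names are
  # hypothetical stand-ins for embedded screenshots.
  strict = Pattern("save_button.png").similar(0.9)  # near-exact match required
  loose  = Pattern("save_button.png").similar(0.7)  # tolerate theme/background changes

  if exists(strict):
      click(strict)
  elif exists(loose):
      click(loose)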

There were a number of things I liked and found amusing about the Krzysztof Gajos et al. paper. Solving the interface problem for the motor-impaired is a noble pursuit, and one whose prior work I am unfamiliar with. My grandmother suffered from MS, and she never was very good at using computers, partially due to that (she was a bit of a technophobe, also...). The study that they designed seemed reasonable, and I was intrigued by the fact that they elected to also recruit some able-bodied participants. This of course makes sense, and contributed the interesting result that average users prefer pretty interfaces over fast ones, which I don't feel was discussed or emphasized adequately in the paper. Also, I would like to remark that I am always at least a little bit happy when results come out where a computer is better than the people at generating something-or-other: the fact that participants in the study seemed to be rather faster at and rather less tired by using the ability-generated interface than the preference-generated one therefore made me a little bit happy. I wonder if the results of this paper would have been any different if it had come after this UIST's calibration games paper? That is, I wonder if turning the ability-based calibration process into a game rather than the systematic thing it was would have affected user motor performance. I gathered from their data that most of the motor-impaired users were already highly motivated to interact with computers, so maybe that would not have changed things, but it would be an interesting follow-up.


Hanzhong (Ayden) Ye - 11/6/2011 15:55:33

The materials for today's class talk about the process of generating and recognizing UIs. The first paper, by researchers at the University of Washington, evaluates two different systems for automatically generating personalized interfaces adapted to the individual motor capabilities of users with motor impairments. While SUPPLE adapts to users' capabilities indirectly by using the ARNAULD preference elicitation engine to model a user's preferences, SUPPLE++ takes a different approach by modeling a user's motor abilities from a set of one-time motor performance tests. The authors conclude from their studies that SUPPLE++ provides a better UI generation process than SUPPLE does, and indicate that software can adapt itself to the capabilities of its users better than we might have expected.

The second paper presents a system called Sikuli, which provides a visual approach to search and automation of graphical user interfaces using screenshots. It enables users to cast a query by taking a screenshot of a GUI element, and it also allows users to write macro scripts using a visual scripting API for automating GUI interactions. I really like the idea of this paper, especially the idea of writing visual scripts to automate GUI interaction. The examples provided are very interesting, and there seem to be a lot of other applications that could be developed using these visual scripting APIs. For example, we could write scripts to automate the process of room monitoring (to detect intruders, etc.); we could also write scripts to automatically record the object flow in a given scenario (for example, if we combine the visual detection approach described here with related algorithms, we could calculate the visitor flow of a building, or the traffic flow at a given crossroad, etc.). I also believe that for work-flow automation in office settings there are even more possibilities to explore, since much office work consists of mechanical, repeated actions that could easily be automated using this approach.

-Ayden


Yun Jin - 11/8/2011 13:43:02

The first paper evaluated two systems which automatically generate personalized interfaces given a model of the user. SUPPLE++ uses a model of the user's motor capabilities, which is constructed by its Ability Modeler from a set of one-time motor performance tests. SUPPLE, on the other hand, uses a model of the user's preferences, which is built by the ARNAULD elicitation system. The results show that participants with motor impairments were significantly faster, made fewer errors, and strongly preferred the automatically generated personalized interfaces over the baselines. They also demonstrate that automatic generation of ability-based interfaces is feasible, and that the resulting interfaces improve both performance and satisfaction of users with motor impairments. The second paper presents a visual approach called Sikuli for search and automation of graphical user interfaces using screenshots. Sikuli allows users to take a screenshot of a GUI element (such as a toolbar button, icon, or dialog box) and query a help system using the screenshot. Sikuli also provides a visual scripting API for automating GUI interactions, using screenshot patterns to direct mouse and keyboard events. Searching by screenshot is easy to learn and faster to specify than keywords, so Sikuli is more convenient and faster than searching by keywords. Visual scripting can also improve interactive help systems previously proposed in the literature. However, this approach has some limitations. First, sharing scripts across different themes may be difficult when users prefer a personalized appearance theme with different colors, fonts, and desktop backgrounds. Second, Sikuli Script operates only in the visible screen space and thus is not applicable to invisible GUI elements, such as those hidden underneath other windows, in another tab, or scrolled out of view.


Steve Rubin - 11/8/2011 16:14:12

The two papers for today's class centered around building new user interfaces for specific use cases. The first paper described an experiment that tested different methods of creating user interfaces for motor-impaired users. The second paper presented two things: a visual search tool for getting help in applications, and a visual scripting language.

My main concern with the first paper is more a concern about the state of user interfaces at the moment. Able-bodied users seem almost equally concerned with ease-of-use and aesthetics of user interfaces. Companies like Apple go out of their way to design visually pleasing GUIs. This paper is basically saying, "Hey: stop that." I'm exaggerating, of course, but I think the main point is that there is a group of users that really do not benefit from "pretty" GUIs, and they need to be accommodated. I would conjecture that those users mostly don't care about the aesthetics because it's of secondary concern to them being able to physically excel at using the system. It would be cool if we could find ways to harness the automatically tuned interfaces in visually pleasing ways--then companies could design accessible interfaces that would not necessarily be ugly. Accessibility does not have to come at the cost of "brand" or aesthetics.

The second paper had some seriously interesting ideas, but I was a bit perplexed by a few of their decisions. For the visual search experiment, they are comparing their visual search with a rudimentary text search. They then draw their conclusions based on this, suggesting that their visual search is easier to use/learn and overall, at least as good as the text search. I don't buy it. The authors should have been comparing their visual search against Google's text search. I doubt that their numbers would have favored the visual search if they had used a highly refined text search algorithm. In their visual search, they also require that users do screen captures of the desired element. It might be easier (and more streamlined) for users to just click on the element that they want to search---the system could actually just highlight every UI element that was in the database. The physical act of doing the screen capture seems unnecessary.

Both of these papers brought up interesting ideas about UI design, and it made me realize (even more so than before) how there may not be "right answers" to many of the questions we ask in HCI.


Laura Devendorf - 11/8/2011 16:39:17

Gajos et al. compare two methods for augmenting user interfaces in order to support users with varying mechanical abilities. The Sikuli paper describes how screenshots can be integrated into code to be used for search and automation.

I found Gajos' paper to be very interesting, and it introduced me to new concepts I wasn't previously familiar with. I would be interested to see how the interface would change as the input methods change. What would be the optimal way to reconfigure a form for a touch screen and not a mouse? I would have liked to have seen more qualitative feedback about the two methods in order to assess how the test-generated and personal-preference-based designs were perceived by the users. Is it that the computer was able to uncover something the users didn't know they needed? I think it would have also been interesting to note the sorts of tasks that need to be performed by this group and how they currently go about accomplishing these tasks on a computer - are these users even using the mouse, or are they taking other approaches, such as "put that there"?

Sikuli seems interesting, but I'm having a hard time deciding where and when it could be used. They give the example of searching for the Mona Lisa. Wouldn't it be easier to type "Mona Lisa"? If you need a photo of the Mona Lisa, aren't you going to end up on a page with info about the Mona Lisa? I guess I could see the search function being useful for unknown, unlabeled items. Some of these concerns relate to technical barriers, but the concept is strong. A careful integration of screenshot and text elements is key. Undo is also a key feature, in case the algorithm accidentally puts everything that looks like a file in the trash. I believe that the limits of the pixel-level recognition need to be clearly understood by the user, and I'm unsure how one would communicate this. I would have also liked to have seen feedback from the study participants on whether or not they think they would realistically use it, since it's outside of habit.


Shiry Ginosar - 11/8/2011 16:40:07

These two papers talk about user interfaces - how to automatically generate and recognize them. Gajos et al. look at adapting UIs to various conditions of motor-impaired individuals. They take note of the fact that the high variance across conditions makes it hard to sufficiently tweak UIs per person by hand, and that an automated way of generating appropriate flavors per user is preferable and feasible. Sikuli presents an automated way to recognize UIs based on screenshots and then use these screenshots as queries in a visual query engine for UIs that may be used to retrieve tutorial information for these UIs.

I found the Gajos et al. paper pretty inspiring, as their system of inferring user abilities seems to work pretty well. I wonder how their method would be extended to include touch-based interfaces. As a whole, the inference system they use seems like it would extend naturally to this domain, but a more thorough investigation may be needed for this to be determined. Additionally, I am interested in seeing how this would extend to other types of impairments. Could UIs be auto-generated for various types of visual impairment as well? As for the performance results presented in this paper, it was interesting to see that for able users navigation time was lowest with the baseline interface. Perhaps this is because these users are used to using the baseline interface in their everyday lives? This is one case where a long-term study may have been more appropriate.

The Sikuli paper provides a way to search for GUI screenshots as well as use them as part of a script. I was surprised by the evaluation procedure of Sikuli's search results as compared with a text based search engine. For Sikuli the authors used the visual cues, surrounding text and embedded text. For the textual search engine only the surrounding text was used. This resulted in the text search engine not having access to the words used in the GUI itself. This is surprising as it does not seem to be a fair comparison. One would have thought that using the embedded text as well as the surrounding text for the textual search engine case would have resolved some of this difference.


Cheng Lu - 11/8/2011 19:33:50

The first paper, “Improving the Performance of Motor-Impaired Users with Automatically-Generated, Ability-Based Interfaces”, evaluated two systems for automatically generating personalized interfaces adapted to the individual motor capabilities of users with motor impairments. The first system, SUPPLE, adapts to users’ capabilities indirectly by first using the ARNAULD preference elicitation engine to model a user’s preferences regarding how he or she likes the interfaces to be created. The second system, SUPPLE++, models a user’s motor abilities directly from a set of one-time motor performance tests. In a study comparing these approaches to baseline interfaces, participants with motor impairments were 26.4% faster using ability-based user interfaces generated by SUPPLE++. They also made 73% fewer errors, strongly preferred those interfaces to the manufacturers’ defaults, and found them more efficient, easier to use, and much less physically tiring. These findings indicate that rather than requiring some users with motor impairments to adapt themselves to software using separate assistive technologies, software can now adapt itself to the capabilities of its users.

The second paper, “Using GUI Screenshots for Search and Automation”, presented Sikuli, a visual approach to search and automation of graphical user interfaces using screenshots. Sikuli allows users to take a screenshot of a GUI element (such as a toolbar button, icon, or dialog box) and query a help system using the screenshot instead of the element’s name. Sikuli also provides a visual scripting API for automating GUI interactions, using screenshot patterns to direct mouse and keyboard events. The paper reported a web-based user study showing that searching by screenshot is easy to learn and faster to specify than keywords. They also demonstrate several automation tasks suitable for visual scripting, such as map navigation and bus tracking, and show how visual scripting can improve interactive help systems previously proposed in the literature.


Hong Wu - 11/8/2011 21:00:27

Main Idea:

The two papers were about how different GUI designs affect the performance of users.

Details:

"Improving the performance" compared two systems with different strategies for customizing a GUI for users. One system is based on users' answers about which interface variants they prefer. The other relies on abilities learned directly from the users' performance. By comparison, the authors claim that the system with learning ability is better in a lot of aspects.

"Sikuli" proposed a new search tool whose input can be an image rather than text. The paper pointed out that searching with an image directly is more natural than translating the content into text first.

In my point of view, the learning abilities of software are very important, but not the most important thing. Usability is still the core of software, while customizability is just a cool feature. Searching with an image directly is very helpful and will be the future if the false-positive rate gets lower.


Derrick Coetzee - 11/8/2011 21:15:17

Gajos et al's 2008 work "Improving the performance of motor-impaired users with automatically-generated, ability-based interfaces" describes an evaluation of a system that generates UIs suited for users with a motor disability by choosing and laying out controls subject to preference data or motor models. It provides a compelling way to increase efficiency and decrease frustration of motor-impaired users with little resource investment from designers. In addition, it is a compelling example of universal design, significantly benefitting able-bodied users as well.

This work focused on evaluation while referencing prior work for the implementation - while this is useful for learning about evaluation, a more complete understanding of the system could be derived from a longer integrated report.

Some questions I wondered while reading this:

  • If the space constraints were increased to the screen size, would productivity benefit?
  • Based on the results, are there effective measures UI designers can take to benefit motor-impaired users without dynamic adaptation?

The use of the animated rectangle to point out the next UI element may have impacted results in unexpected ways: for example, smaller UI elements may be harder to spot than larger ones, or more subtly, if users normally pause while visually searching for the next element, this could give a brief rest that mitigates motor fatigue.

With any within-subjects evaluation learning effects are a concern; in this case with only a single learning task, and with the learning task having different instructions, this risk is increased.

Considering the enormous diversity of motor-impaired users and the small sample, it's possible people with a particular impairment were unrepresented.

A significant limitation of the study was that it merely arranged existing UI controls in a standard dialog box and used only the standard mouse cursor. This was highlighted by user MI02, who indicated he primarily used the keyboard. Substantially more efficiency might be achieved with custom controls, custom applications with different high-level workflows, custom mouse pointers, and/or a combination of custom input devices.

Yeh et al's 2009 work, Sikuli, does search and automation of GUI interfaces by recognizing a control based on its graphical appearance.

One thing that perplexed me about this approach is that, at least for simple UI automation, the windowing system makes available all the information needed to determine directly exactly what control is at a location, without having to guess based on its appearance (see e.g. WinSpy). Other platforms have similar functionality. So the main technological contribution seems both less robust and more complex than alternative methods, reverse engineering information that is already easily available. A combination of these two techniques could be both more robust and more flexible than either alone.
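
For illustration, the WinSpy-style alternative described here can be sketched with the pywin32 bindings; this is an assumed setup (Windows-only, hypothetical coordinates), not something from either paper.

  # Ask the windowing system directly which control sits at a screen
  # coordinate, instead of guessing from its pixels. Assumes Windows
  # and the pywin32 package.
  import win32gui

  x, y = 200, 150
  hwnd = win32gui.WindowFromPoint((x, y))
  print("class:", win32gui.GetClassName(hwnd))
  print("text: ", win32gui.GetWindowText(hwnd))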

Moreover, evaluation results for search were poor, showing no significant difference in several important metrics like relevance.

One aspect of this work I found compelling was the use of a scripting language syntax incorporating arbitrary graphical elements. This substitution of pictures for names could easily find application in programming any application, and would simplify accessibility across languages.


Galen Panger - 11/8/2011 22:11:22

It’s not clear why the ability-based interface and GUI screenshot search articles were packaged together for today’s class; one, to me, is very much about accessibility and the other is about visual search. The common link is that they both refer to GUIs, but that’s not much of a link.

That said, the ability-based interface piece was, I thought, an interesting attempt to improve the performance of GUIs for people with motor impairment. Results suggest that performance would improve for everyone regardless of disability if we customized interface elements to each person, so it might be something to invest in for a general audience, as well (though, it would stink when you tried to use someone else’s computer). Actually, given the relatively high level of standardization of GUIs, this test could be done once, and the results replicated across all applications that supported the feature/API on a given platform. One problem would be that the optimal interface for a particular user might change over time as their usage improves or changes; but perhaps the system could adapt slowly to the users’ changing performance or simply re-test the user at some set interval (though changing an interface after a user has habituated might be more costly than beneficial... that could be tested).

One thing I would have appreciated seeing the piece address is how variant the optimal interfaces were; were there “types”? (Hard to do with so few subjects, of course, but still.) Did the able-bodied interfaces all look similar? The results of the ability-based optimizations could actually tell us something about interface design principles in general. Just a hunch.

Finally, the GUI screenshot search article was intriguing; I especially liked the scripting examples, such as tracking bus movement and baby monitoring. I thought their performance evaluation was shoddy (the baseline keyword search of the 500 dialog box screenshots seemed doomed to fail; they didn’t do anything creative to optimize that search process, capping the number of search terms at 10). Overall, however, there were a lot of interesting ideas here.


Alex Chung - 11/8/2011 22:38:49

Gajos's research on users' ability-based interfaces has illustrated the bias toward regular users in user-interface design. By simply tailoring the UI layout to the targeted users, their efficiency and satisfaction increased significantly. The greatest advantage of the computer is its inherent malleability, so why should the UI of software be the same for everybody?

While the later experiment showed that performance improved more with the ability-based approach than with the preference-based approach, a majority of disabled participants favored preference selection over the automatic approach in the first test. It is interesting because the second experiment showed the ability-based interface performed better than the preference-based interface in three out of four categories. However, they did see ability elicitation as much more physically demanding than the other method. Hence user experience is an important factor in usability studies, and efficiency should not be the only goal.

Another interesting remark was how the disabled people preferred the trackball mouse and keyboard. They are both stationary, and the user can focus on one action at a time. This reminds me of the conversation I had with the founders of the iTeleport mobile app on the iPhone and iPad. It is a VNC solution for mobile device users to remote desktop to their computer at home or the office. They were surprised to receive stories from disabled people about how their mobile app improved the way they interact with their personal computers.

Finally, I wonder how much money tech companies spend on user experience studies with disabled people. For example, Apple products are sold all over the world to people from diverse cultures and experiences. Yet the user interface is the same for everyone. Is the product so simple that it conforms to natural interaction? Or is the user willing to adapt to the user interface because the product is so compelling?


The jBricks framework provides system I/O toolkits to allow gadgets from different platforms to play nicely together. It addresses the pain point of HCI where the design is compromised because the technical implementation is too difficult and buggy.

This is a huge win for the HCI community because it makes prototyping easier. Thus UI designers can focus more on design rather than hacking the hardware. I will certainly try to learn the jBricks Input Server (jBIS) in the near future.

As the paper stated, it is an engineering study focused on combining advanced input and output devices. It illustrates the large bandwidth to provide a seamless experience and how latency ruins user experience.

Yet I wonder if jBricks can keep up with the ever changing operating system on the input devices that were mentioned in the paper. It would be nice if this becomes a successful open source project.


Viraj Kulkarni - 11/8/2011 23:11:05

'Improving the Performance of Motor-Impaired Users with Automatically-Generated, Ability-Based Interfaces' proposes and evaluates two systems for automatically generating personalized user interfaces which are customized for individual users, taking into account their motor control abilities. The paper also includes a comprehensive evaluation of the two systems. The SUPPLE system combined with the ARNAULD engine deals with customizing the user interface based on the user's preferences, which are elicited through surveys. The SUPPLE++ system, on the other hand, adapts itself to the user's capabilities based on a one-time ability assessment test. One interesting thing that comes to my mind is whether we could take these ideas and use them for user interfaces for able-bodied people as well. For example, Gmail might measure parameters on how an individual interacts with the system and customize the interface for the individual.

The paper on Sikuli uses GUI screenshots for dual goals. The first is to use screenshots for search instead of keywords. I am not convinced that this would perform any better than a text-based search. In fact, it might perform worse in several situations because a screenshot search would miss a lot of context. Honestly, I don't really see a lot of value in it. The second part deals with automating tasks using scripts that manipulate screenshots. This is really exciting. You wouldn't use such scripts in production code, but I am sure they might be incredibly useful for automating mundane tasks. I am sure BPOs, call centers, and workplaces like that, where employees do a lot of routine work, might benefit a lot from using Sikuli. It might be especially useful in testing environments. Yes, it does have the limitation that it cannot work with windows that are occluded by other windows, but I am sure creative programmers can easily find their way around that.


Ali Sinan Koksal - 11/8/2011 23:59:56

The paper by Gajos et al. is a work on adapting user interfaces to user preferences and motor capabilities. The adaptation may occur either by the specification of a 'preference model', or a training phase where the system learns about motor capabilities of the user in order to customize the interface such that the user performance is improved over baseline interfaces.

The Sikuli paper investigates searching documentation and creating scripts for visual interfaces by operating on visual elements (in the form of screenshots). Python scripts integrating screenshot portions can be used to achieve a number of automated tasks using graphical user interfaces, such as manipulating files, responding to message boxes, or even monitoring a baby by checking periodically whether a certain pattern is visible on the screen.
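
To make the last example concrete, here is a minimal sketch of such a polling loop in Sikuli's Python-like scripting layer; the image names and the 30-second interval are assumptions for illustration, not taken from the paper.

  # Periodically check whether a screenshot pattern is visible and react.
  # Image names are hypothetical stand-ins for embedded screenshots.
  import time

  while True:
      if exists("baby_face.png"):          # pattern currently visible on screen
          popup("Baby is on camera")       # Sikuli's built-in alert dialog
      if exists("dialog_ok_button.png"):   # auto-dismiss a recurring message box
          click("dialog_ok_button.png")
      time.sleep(30)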

The adaptive interfaces of SUPPLE++ based on individual motor capabilities of a user have great value in increasing accessibility to user interfaces. The evaluation shows that the system is indeed effective in improving user performance.

It is important to be able to easily automate repetitive tasks that take a user's time. Sikuli can help in moving closer to this goal, but a programming by demonstration (PBD) system which can infer such scripts using visual elements without needing to write a Python script from scratch could be an interesting goal to explore. Sikuli offers a rich language for scripting tasks, resulting in considerable expressive power, but restricting this language and automating the creation of such scripts (as done in a system like EAGER) could ease the adoption of such a system by a large number of users.


Amanda Ren - 11/9/2011 0:26:00

The Gajos paper aims to evaluate the effectiveness of two systems for generating personalized interfaces based on the motor capabilities of users.

This paper is important because it aims to show that user interfaces can successfully adapt to the user's capabilities rather than forcing the user to adapt to the interface. This relates to today's technology because, given that most of the work people do now is via the computer, we need a way for users with motor impairments to use a computer just as easily. The experimenters used two types of systems to generate these new interfaces. One generated interfaces using the user's preferences, while the other generated interfaces based on the abilities of the user. In all cases, the total time for the ability-based system was the lowest and the baseline the largest. It was interesting how initially users would choose preference-based over ability-based, but later they judged ability-based to be more efficient than preference-based. I was actually surprised that total time was still shorter for ability-generated interfaces over baseline (for the able-bodied participants), given that the authors used common applications they were already familiar with. These results are promising given that the authors found their system to improve speed, decrease error rate, and, more importantly, increase satisfaction among the users.

The Yeh paper presents a system where a user can query for help using a screenshot of a GUI rather than manually searching.

This paper is important because it aims to make searching for help easier by incorporating how humans normally communicate: by using visual references. Their system, Sikuli, allows users to search through a collection of online documents by taking a screenshot of their GUI. I thought it was interesting how they applied their visual approach to composing scripts. Their system also appears helpful because it can be used with other systems to create tutorials. The Sikuli scripting system looks very useful and relevant because you can easily write scripts to do many diverse tasks on the desktop.


Manas Mittal - 11/9/2011 0:35:45

The first paper is about evaluating SUPPLE and ARNAULD. Although this is a user study paper, it is still interesting since both the SUPPLE and ARNAULD systems are interesting. I find the usage of the Likert scale in SUPPLE++ highly suspect - users are being asked to judge a metric they are not commonly asked to evaluate.

On a related side note, it is interesting to think about what customization in interfaces would do for the web. With regard to ARNAULD, I wonder if we could build a web 'Arnauld' cookie that sits on a user's computer and profiles their actions, thus characterizing their preference model. This cookie could then be used to customize the user's experience on other websites.

The Sikuli system is interesting in that it uses visual search/visual parameters as a feature to understand/identify screen elements. The authors also provide a Visual Scripting API. I found it really interesting that the authors used the SIFT algorithm for feature detection, and that it worked this well. Upon reading this paper, I downloaded Sikuli, and tried it, and I wasn't able to get the browser to go to www.facebook.com (I spent about 5-7 minutes). Still, very interesting based on the techniques used (I will try and play with the SIFT algorithm a little more).


Peggy Chi - 11/9/2011 2:00:26

How do we go beyond static graphical UI? This week the papers we read demonstrate great examples that jump out of the box. Gajos et al. designed a system, SUPPLE, that adapts to users' physical motor abilities by changing the interfaces to rearrange the components dynamically. Sikuli by Yeh et al. is a novel approach that performs search and automation based on GUI in a "visual" way.

I really like the idea of the adaptive interface in SUPPLE. Why do users always need to follow the UI designers' rules? Most current applications only provide limited options for customization, not to mention the interface layout. However, from the paper and demo video I wonder whether such an approach can be applied to any interface. For example, in some scenarios the interface items may be correlated or in a certain visual flow that should not be rearranged, such as a dialog box with parameters or a form. How much could their system capture?

The Sikuli paper has opened many opportunities for UI research. I especially like the scripting idea that allows end users to define their own rules. My concern would be that this relies heavily on users' visual memories and identification. I often find myself getting lost in the crazy application icon list, unable to target the one I want among those with similar features. For another example, I use text search for apps on the iPhone much more often than swiping through the pages. Is it really easy to formalize our visual thinking? I don't agree that the visual approach is the ultimate solution, but any combination would be really interesting, as shown in the paper.


Suryaveer Singh Lodha - 11/9/2011 4:10:21

Improving the Performance of Motor-Impaired Users with Automatically-Generated, Ability-Based Interfaces:

For most parts of it, I was actually impressed by this paper. The fact that we all try to customize our desktop environments in one way or another whenever we have to work on a new system for a long while (weeks-months) proves that the problem does persist. For example, I for one always want my vim to be set up in a particular way when I begin to write code. I have my own color coding and keyboard shortcuts for vim which make sense for me (maybe not for others) and help me improve productivity. The fact that this paper tries to bring the same level of personalization to graphical user interfaces is interesting to me. Also, the fact that they compared ability-based against preference-based interfaces was great to read. I think the technology is pretty neat, and while it does help motor-impaired users, it helps able-bodied users as well. However, there were a few things which bothered me about this paper. One, I think mostly a user would modify (and re-modify) his interface based on the kind of work he does with it. For this, time is a major factor. It would have been great to see how users personalize and evolve their interfaces as they spend more time working with the interface. If the study had been conducted for over a week per user, I think the results might have been more informative. For example, I can recall customizing my vim editor at least 4-5 times over a few weeks to achieve a customization which works best for me. Also, I think the scope of the paper was limited in the sense that it focused mostly on motor-impaired users. It would have been interesting if they had an equal number of able-bodied users and compared the amount of improvement between able-bodied and motor-impaired users. Also, I'm curious what happens when a user has to shuffle between multiple workstations? Is it as easy as just copying a customization file from one system to another, or does a user have to go through the process of setting it all up from scratch again?


Sikuli: Using GUI Screenshots for Search and Automation

The idea of incorporating image search into help menus seems intriguing. This might help in making a transition to new software simpler. However, there is an assumption that the user will know for sure which GUI icon to query for. This assumption seems logical, more so as we see better and self-explanatory tool icons in new software releases. I liked the 3-gram approach for text-based searching. This approach might have limitations, but when clubbed together with image search, I think the results will be great. One limitation which is apparent is how to deal with different themes of UI components, which is rightly pointed out by the authors. I don't think forcing users to go back to a default theme for the purpose of searching for more information about an icon is worthwhile. However, probably some smarts can be built into the search algorithms to detect a range of themes. This approach of taking a screenshot to search for information about functions sounds interesting for sure. One more limitation I see is that there is only one shape - the rectangle. Why can't we have different irregular shapes as well? One area where this approach could be useful is touch-screen surfaces, where the user can define an ROI (Region Of Interest) by just doodling with a pen/fingers and search for more information based on the selected region.


Apoorva Sachdev - 11/9/2011 8:01:48

Today's readings were on improving the performance of motor-impaired users with automatically generated, ability-based interfaces, and on Sikuli, a GUI-screenshot-based program for search and automation.

The first paper we read was very informative and I liked the way they conducted the study, however, I felt that it was more about the comparison of the techniques than focused on actually improving their performance. I think it would be very helpful if more applications supported this interface modification to suit the users as it would make the life of Motor-impaired users much better. I also thought that the authors repeated their points a lot between each section (restating the same statements).

The second paper we read, on Sikuli, was very interesting. I think the ideas presented about enabling search using pictures and the abstraction of using pictures for easy scripting are very powerful. Initially, when they presented the idea of using Sikuli for searching keywords in a program, I was a little skeptical, since most programs now have a tool-tip option that provides the correct name of the function/button, and after that a simple Google search with that keyword can provide you with a multitude of answers, so I wasn't sure why users would take the trouble of taking a screenshot and using that to search for relevant results. But the other applications they presented seem very useful, like map tracking, searching, responding to GUI messages automatically, etc. Although it seems obviously useful, it would be good to perform a few user studies to test the system and see if it could be improved. Overall, I think if the authors of the paper could make the system more robust to work with varying themes and deal with multiple windows/tabs, it would be a great system and I would like to use it!


Allie - 11/9/2011 8:23:31

In "Improving the Performance of Motor-Impaired Users with Automatically-Generated, Ability-Based Interfaces", Gajos, Wobbrock, and Weld speculate that instead of requiring users with motor impairements to adapt to software using separate assistive technologies, software can adapt to the capabilities of its users.

It is estimated that only 60% of users who need assistive technologies actually use them. SUPPLE adapts to a user's capabilities by using the ARNAULD preference elicitation engine to model the user's preferences. SUPPLE++ relies on its built-in Ability Modeler to model a user's motor abilities through a set of one-time performance tests.

SUPPLE performs decision-theoretic optimization, searching for the interface with the lowest estimated cost while satisfying all the device constraints. It uses active elicitation, where participants are presented with queries showing a pair of user interface fragments and asked which one they prefer.
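
As a toy illustration only (not SUPPLE's actual algorithm or cost model), "lowest estimated cost subject to device constraints" could look like the sketch below: enumerate candidate widget choices, drop those that violate a size constraint, and keep the cheapest under an assumed cost function.

  # Toy sketch of decision-theoretic interface optimization; the widgets,
  # widths, and cost function are assumptions for illustration.
  from itertools import product

  alternatives = {
      "volume":  [("slider", 80), ("spinner", 30)],      # (widget, width in px)
      "channel": [("list", 120), ("combo_box", 60)],
  }
  MAX_WIDTH = 160                                         # device constraint

  def cost(choice):
      # stand-in for a learned cost model: smaller widgets assumed harder to hit
      return sum(100.0 / width for (_, width) in choice.values())

  best = None
  for combo in product(*alternatives.values()):
      choice = dict(zip(alternatives.keys(), combo))
      if sum(width for (_, width) in choice.values()) > MAX_WIDTH:
          continue                                        # violates constraint
      if best is None or cost(choice) < cost(best):
          best = choice

  print(best)   # the cheapest layout that fits the device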

The SUPPLE++ Ability Modeler accommodates 1) pointing, 2) dragging, 3) list selection, and 4) multiple clicking. The evaluation was based on distinct interface variants: 1) baseline, 2) preference-based, and 3) ability-based. Baseline interfaces are the default, manufacturer-supplied font formatting and print dialog boxes. Preference- and ability-based interface variants were automatically generated for each participant using the individual preference and ability models elicited.

The findings are such that users with motor impairments were significantly faster, made fewer errors, and preferred automatically-generated, personalized interfaces over baselines. This is particularly true of Supple++.

In "Sikuli: Using GUI Screenshots for Search and Automation", Yeh, Chang, and Miller present a visual approach to search and automation of GUIs using screenshots. In so doing, Sikuli allows the user to take a screenshot of a GUI element, and query using the screenshot rather than the textual name of the element. The expectation is that web-based searches using Sikuli are easier to learn and faster than textual keywords. Sikuli Search is a system that enables users to search a large collectino of online documentation using screenshots; and Sikuli Script is a scripting system that enables programmers to control the GUI screenshots programmatically. SIFT feature descriptor is used to compute visual words from salient elliptical patches detected by the MSER detector.

The researchers set out to test two hypotheses: 1) screenshot queries are faster to specify than keywords; 2) results of screenshot and keyword search have the same relevance as judged by users. The findings supported both hypotheses.

Sikuli Script is a visual approach to UI automation by screenshots, addressing invalidity issues if the rubberband window is moved or if the elements in the window are rearranged due to resizing. By programmatically controlling the elements with low-level keyboard and mouse input, the approach does not limit screenshot manipulation to a specific application. Find, pattern, region, and action are all functions that help manipulate the image search in Sikuli Script. Sikuli Script shows how visual scripting can interact with the physical world.
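
To illustrate how these pieces compose, a short Sikuli-style fragment might scope a search to a Region and then act on the returned match; the image names and coordinates below are hypothetical.

  # Scope a search to a screen region, then act on the returned match.
  # Image names and coordinates are hypothetical stand-ins.
  toolbar = Region(0, 0, 1280, 60)          # only search the top strip
  match = toolbar.find("lasso_tool.png")    # raises FindFailed if not found
  click(match)
  dragDrop("file_icon.png", "trash_icon.png")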

Sikuli faces some challenges, since users may prefer a personalized appearance theme rather than the default. Sikuli Script also operates only in visible screen space and is not applicable to invisible GUI elements.

The Sikuli paper ties in nicely with the BiD seminar on Programming by Demonstration, while the Gajos et al. paper seems to have significant contributions for those who are motor-impaired.


Vinson Chuong - 11/9/2011 8:43:54

Gajos, Wobbrock, and Weld present SUPPLE, a system for adapting a GUI to a user's unique preferences or abilities. Yeh, Chang, and Miller present Sikuli, a useful abstraction for reasoning about GUIs using images of widgets or windows.

Instead of forcing impaired users to use accessibility tools, which can add complexity to the interactions, Gajos, Wobbrock, and Weld's SUPPLE systems modify the widgets and layout of a GUI based on information about a user's preferences or abilities collected directly from the user at runtime. SUPPLE moves the work of designing an accommodating interface from the developers to the actual users. Instead of developers having to generalize the preferences and abilities of their many types of target users, often overlooking edge cases, SUPPLE can accommodate users on an individual basis. From the images and videos that I've seen, SUPPLE appears to be limited to swapping out interface widgets for common variants, resizing them, or rearranging them. SUPPLE may not be able to handle unconventional widgets. Moreover, it's unclear to me whether SUPPLE is able to maintain the semantics and flow of an interface. Can users be misled by unexpected widget positions? Nonetheless, I believe SUPPLE provides a compelling framework to developers for baking in a workflow to customize their UI.

Typically, when I want to accomplish some repetitive task on my computer, say organizing and naming MP3s based on metadata, I have two choices (excluding specialized tools): do it by hand in my OS's file management GUI or write a script using some file-management API. In the first case, I'm wasting time performing an obviously programmatic task by hand. In the second case, I may have to learn or re-learn a file-management API. The problem is, I'm used to dealing with my OS's GUI on a day-to-day basis, but I certainly don't use file-manipulation APIs very often. Sikuli essentially provides an abstraction which allows me to code at the level of the GUI, the interface I am most comfortable in reasoning with. In other words, Sikuli allows users to express their intent more directly (in terms of semantic distance) than is allowed by macros or typical scripting languages. In addition to being well-suited for tasks typically handled by code or macros, given Sikuli's abstraction, we can think about automating the kinds of small tasks for which writing a good macro takes far longer than performing it manually. An example given in one of the Sikuli videos shows a script for calling a friend on VoIP. I can imagine running such a script, saying who I want to call, and seeing a video conference call automatically started. Despite already being a task with few steps (open Skype, type in friend's name, click Call), I think Sikuli offers a good productivity boost. As in the previous example, I believe that Sikuli's main power is in providing an easy-to-use API for automating even the smallest of tasks.


Donghyuk Jung - 11/9/2011 8:46:54

Improving the performance of motor-impaired users with automatically-generated, ability-based interfaces

In this paper, the authors implemented two systems (SUPPLE and SUPPLE++) which automatically generate user interfaces adapted to the individual motor capabilities of users with motor impairments. In a study comparing these approaches to baseline interfaces, participants with motor impairments were 26.4% faster using ability-based user interfaces generated by SUPPLE++. They also made 73% fewer errors, strongly preferred those interfaces to the manufacturers' defaults, and found them more efficient, easier to use, and much less physically tiring.

In my opinion, their idea was very innovative because they made software adapt itself to the capabilities of its users instead of designing individual input devices for motor-impaired users. Additionally, according to the paper, specialized assistive technologies have two major shortcomings (limited availability at some cost, and a user interface that still remains the same), and I think these drawbacks are quite significant in terms of usefulness.

Video Demo: http://www.youtube.com/watch?v=B63whNtp4qc

Sikuli: using GUI screenshots for search and automation

The researchers showed how Sikuli could aid in the construction of "scripts," short programs that combine or extend the functionality of other programs. Using the system requires some familiarity with the common scripting language Python. I think that Sikuli can be one of the best solutions for some advanced users. If they want to invoke the functionality of one of the automated programs, they simply draw a box around the associated GUI element, click the mouse to capture a screenshot, and insert the screenshot directly into a line of Python code.

The researchers also presented a Sikuli application aimed at a broader audience. A computer user hoping to learn how to use an obscure feature of a computer program could use a screen shot of a GUI — say, the button that depicts a lasso in Adobe Photoshop — to search for related content on the web. In an experiment that allowed people to use the system over the web, the researchers found that the visual approach cut in half the time it took for users to find useful content. Although this type of search might be a good solution for beginners (if they don’t have any preference), some advanced users might not try to change their old habits. (Command line vs GUI)

Video Demo: http://www.youtube.com/watch?v=FxDOlhysFcM


Jason Toy - 11/9/2011 9:01:20

Improving the Performance of Motor-Impaired Users with Automatically-Generated, Ability-Based Interfaces

This paper is about two new systems: SUPPLE, which creates a personalized user interface based on users' preferences, and SUPPLE++, which creates a personalized user interface based on performance tests.

Both SUPPLE and SUPPLE++ work on the premise that the assumption that interfaces are immutable is false. While users usually adapt to interfaces, that does not mean that interfaces cannot adapt to users as well. Evaluation of these systems touches upon some of the problems in "A Survey of Software Learnability: Metrics, Methodologies, and Guidelines", since there is a learning curve in getting used to the new system [initial learnability]. The authors attempt to remove this factor from experimentation by visually guiding users to the next element to be manipulated. Adoption of these systems in future products may be difficult because of the conventions already in place for building software. Developers of new software might not want to deal with building such functionality into their programs. The best way to release such a system might be to build its functionality into the tools that programmers use such that its complexity is hidden from them. Developers can build the menus they want, but people would still be able to configure them.

The paper does a good job creating a comprehensive lab study. I liked the use of qualitative questions that touch upon some of the points of quantitative testing. Even if a system results in faster or more accurate users, it isn't very helpful if the users do not feel that they are more efficient at tasks. On the other hand, the paper fails to discuss many points I would have found interesting or relevant to the experiment. I would have liked to know what the performance tests included, or what the survey questions were. In addition, I am still curious as to how the health conditions and device used (for almost every subject, they were different) were related to performance and response to the system.

Sikuli: Using GUI Screenshots for Search and Automation

Sikuli is a new system that allows users to search and automate GUI's through the use of screenshots.

This system is motivated by the lack of an effective mechanism to deal with GUI elements. When we search on the internet, we are forced to describe our problem with text on google. When we refer to GUI elements in programming, we refer to elements' obscure names rather than the elements themselves. Sikuli reminds me of a class of autoclickers or programs like the Selenium IDE. The class of programs which depend on the GUI such as autoclickers, Zoetrope [Zoetrope: interacting with the ephemeral web], and Sikuli all are susceptible to the problem of changing UIs. I find Sikuli's automation tool particularly interesting given that today we use automation tools such as Watir whose input are unintuitive lines such as 'browser.radio(:value => 'Watir').set'. In addition many current automation tools are either built specifically for the web (Selenium and Watir) or for the desktop (a number of autoclickers [on the web, some autoclickers that are based on global position may have much more trouble than web optimized programs]). An interesting result of Sikuli is that the automation tool can be used on both desktops and web interfaces.

The paper definitely addresses a well-motivated problem: the difficulty users have dealing with a variety of interfaces that are really easier to describe visually than through text. However, one area where the paper is lacking is its evaluation of the automation tool, which is missing entirely. This concerns me because of the possibility for error: the paper mentions that OCR and other techniques, which do not have a perfect success rate, are used. In addition, the accuracy rate of the search analysis was 70.5%. The consequences of a mistake in automation versus a mistake in search are vastly different, especially if we are talking about deletion automation scripts such as in example 2. Even if the same techniques are used to differentiate GUI icons, by doing an evaluation with subjects on the automation program, the authors could have gathered some information to improve their system from user comments or mistakes, which is vital when the stakes are this high.


Sally Ahn - 11/9/2011 9:02:09

Today's readings focus on the automation aspect of UI. In the first paper, Gajos et al. present SUPPLE, ARNAULD, and SUPPLE++ to address the challenge of generating specific interfaces for users with disabilities. The second paper introduces Sikuli, a system that uses computer vision to automate interactions with GUI components.

Gajos et al. take two approaches to automatically generating UIs: 1) modeling users' preferences and 2) modeling users' abilities. Although the authors focus on assisting the motor-impaired audience, their study is relevant to personalized UIs for the general public as well. I think it's interesting that their results consistently show better results for the second approach (modeling abilities rather than preferences); it emphasizes that the UI designer must not rely solely on users' judgments and feedback, and it motivates further research on measuring and modeling their abilities.

Sikuli is a system that operates on screencaps to automate interactions with GUI components. The novelty of this approach is interesting, but it can be a clear disadvantage if GUI components undergo frequent design changes. I am also not convinced by image-based queries; while their user studies show that such queries may be faster than keyword queries, I would think the complexity of such queries will be much more limited. Other disadvantages that the authors mention include failure to invoke occluded GUI elements and commands that lack GUI elements.