Reference Information:
Rebecca Fiebrink, Perry Cook, and Dan Trueman. "Human Model Evaluation in Interactive Supervised Learning." CHI '11: Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, 2011. ISBN: 978-1-4503-0228-9.
Author Bios:
Rebecca Fiebrink - Will be joining Princeton as an assistant professor in Computer Science and affiliated faculty in Music in September 2011. She recently completed her PhD in Computer Science at Princeton and is spending January through August 2011 as a postdoc at the University of Washington. Her work sits at the intersection of human-computer interaction, applied machine learning, and music composition and performance.
Perry Cook - Professor Emeritus (still researches but no longer teaches or accepts new graduate students) at Princeton University in the Department of Computer Science and the Department of Music.
Dan Trueman - Professor in the Department of Music at Princeton University.
Summary:
- Hypothesis: If users are allowed to iteratively update the current state of a working machine learning model, then the quality of the resulting models, and of the actions taken from them, will improve.
- Methods: The authors conducted three studies of people applying supervised learning to their work in computer music. Study "A" was a user-centered design process; study "B" was an observational study of students using the Wekinator in an assignment focused on supervised learning; and study "C" was a case study with a professional composer building a gesture-recognition system.
- Results: In study "A", the authors observed that participants iteratively re-trained their models by editing the training dataset; in study "B", the students re-trained an average of 4.1 times per task, and the professional composer in study "C" re-trained an average of 3.7 times per task. Cross-validation was not used in study "A", but in studies "B" and "C" it was used an average of 1 and 1.8 times per task, respectively. Direct evaluation (running the trained model and judging its output directly) was used more frequently than cross-validation: participants in "A" relied on it exclusively, while the students in "B" and the professional in "C" used it an average of 4.8 and 5.4 times per task, respectively. Through cross-validation and direct evaluation, users received feedback on how their actions affected the outcomes. Overall, users were able to understand the system and use it effectively, and the Wekinator allowed them to create more expressive, intelligent, and higher-quality models than other techniques did.
- Content: The authors wanted to create a method that lets users provide feedback to the system iteratively while it is in use, so that building a model becomes an interactive, iterative process. They conducted user studies to test their hypothesis, and the results suggested that their approach to these tasks was superior to other techniques. (A rough sketch of the kind of train-evaluate-retrain loop they describe follows below.)
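The loop the paper describes is roughly: record labeled examples, train a model, check it by cross-validation and/or by running it directly, then edit the examples and re-train until it behaves the way you want. The Wekinator itself is a Java application built on Weka, so the sketch below is only my own minimal Python/scikit-learn approximation of that loop; capture_gesture_example() and the fixed iteration count are made-up stand-ins for the interactive parts a real user would perform.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def capture_gesture_example(label, n_features=6):
    # Hypothetical stand-in for reading one labeled feature vector from a sensor.
    return np.random.rand(n_features), label

# 1. The user records an initial batch of labeled examples.
examples = [capture_gesture_example(label) for label in [0, 1] * 10]
X = np.array([features for features, _ in examples])
y = np.array([label for _, label in examples])

model = KNeighborsClassifier(n_neighbors=3)

for iteration in range(5):  # users in the studies re-trained roughly 4 times per task
    # 2. Train (or re-train) on the current dataset.
    model.fit(X, y)

    # 3a. Cross-validation: an automatic accuracy estimate over the training data.
    cv_accuracy = cross_val_score(model, X, y, cv=5).mean()
    print(f"iteration {iteration}: cross-validation accuracy = {cv_accuracy:.2f}")

    # 3b. Direct evaluation: run the model on fresh input and let the user
    # judge the output subjectively (here just printed instead of sonified).
    new_features, _ = capture_gesture_example(label=1)
    print("  model predicts:", model.predict(new_features.reshape(1, -1))[0])

    # 4. If the user is unhappy, they edit the training set (add, remove, or
    # relabel examples) and loop back to re-train. Here we simply add two more.
    extra = [capture_gesture_example(label) for label in (0, 1)]
    X = np.vstack([X] + [features.reshape(1, -1) for features, _ in extra])
    y = np.concatenate([y, [label for _, label in extra]])

In the actual Wekinator, step 3b means running the model on live gesture input and listening to the sound it produces, which is presumably why direct evaluation was used so much more often than cross-validation in the studies.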
I thought this article was sort of interesting. It is a really good idea to have a system where you can tell the machine what is "good" or "bad" before it even spits out the final result. Being able to re-train your algorithm mid-task is really beneficial to have. The cost-benefit for it seems reasonable, so I could see this idea becoming more widespread before long. The authors, in my opinion, definitely achieved their goals, and their results support their hypothesis. I didn't understand cross-validation or direct evaluation in terms of the actual methods too much, but I know those factors were taken into consideration when collecting data on "satisfaction" with the system.