At the recent IEEE InfoVis Conference in Berlin, my research group collaborated with colleagues on two published papers. This blog entry gives a quick and dirty overview of one of those two papers: “Augmenting Visualizations with Interactive Data Facts to Facilitate Interpretation and Communication.”
In the broad field of data analysis, recently there has been an increasing effort to “automatically” generate insights about a data set. Sophisticated techniques from the database and AI communities help generate these insightful observations about the data, usually in a natural language expository form. Now, precisely what constitutes an “insight” is a matter of debate, something that I explored in a previous column. In our research, we choose to use the term “data fact” instead, reserving “insight” for deeper and more meaningful realizations about a data set.
Our paper at InfoVis was led by PhD student Arjun Srinivasan, with help from Steve Drucker at Microsoft, and Alex Endert and me here at GT. The key contribution of the work is to treat the data facts that can be generated for a data set not as static utterances, but as interactive components of a more comprehensive data analysis system. We built a system called Voder¹ that illustrates this principle in action.
When an investigator specifies/creates a chart that visualizes variables of interest, Voder generates data facts corresponding to those variables. As the investigator moves the cursor over the facts, the visualization changes (perhaps just a highlight) to emphasize and help explain the fact being examined. Furthermore, Voder presents alternative visualizations that also illustrate the fact, and it gives the investigator different options in how to embellish the visualization to communicate the fact.
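To make the idea of a data fact tied to a chart a bit more concrete, here is a minimal Python sketch. It is not Voder's actual implementation: the function names (`pearson`, `data_facts`), the fact fields (`type`, `text`, `highlight`), the 0.7 correlation threshold, and the toy car data are all assumptions made up for illustration. Each fact pairs a sentence with the marks a chart would emphasize when the cursor hovers over it.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def data_facts(rows, label, x, y):
    """Generate simple facts for a chart of x vs. y (hypothetical fact format)."""
    facts = []

    # Extreme-value fact: the row with the largest y value.
    top = max(rows, key=lambda row: row[y])
    facts.append({
        "type": "extreme",
        "text": f"{top[label]} has the highest {y} ({top[y]}).",
        "highlight": [top[label]],  # marks to emphasize on hover
    })

    # Correlation fact, reported only above an arbitrary threshold (assumed 0.7).
    r = pearson([row[x] for row in rows], [row[y] for row in rows])
    if abs(r) >= 0.7:
        direction = "positively" if r > 0 else "negatively"
        facts.append({
            "type": "correlation",
            "text": f"{x} and {y} are strongly {direction} correlated (r = {r:.2f}).",
            "highlight": [row[label] for row in rows],
        })
    return facts


# Toy data, invented for this example.
cars = [
    {"name": "Compact", "weight": 1000, "mpg": 40},
    {"name": "Sedan", "weight": 2000, "mpg": 30},
    {"name": "Truck", "weight": 3000, "mpg": 20},
]

for fact in data_facts(cars, "name", "weight", "mpg"):
    print(fact["text"])
```

In a sketch like this, the `highlight` list is what would connect a fact to the visualization: hovering over the fact tells the chart which marks to emphasize.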
Voder also provides a search capability in which the investigator can type in terms such as variables in the data set or analytic queries (e.g., “correlation”, “outlier”). Voder then generates visualizations and data facts pertinent to the query terms. Thus, the system facilitates a flexible data analysis process that can start with visualizations, with data facts, or with keyword searches, and it supports easy, fluid transitions among them. Voder also provides a “presentation” mode where interactive data facts and visualizations can be compiled as slide decks or dashboards.
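The keyword search can be pictured as a filter over the generated facts. The snippet below is only a rough sketch under my own assumptions, not how Voder is actually built: `search_facts`, the `type` and `variables` fields, and the sample facts are all hypothetical.

```python
def search_facts(facts, query):
    """Return the facts whose type or variables match every query term."""
    terms = query.lower().split()
    results = []
    for fact in facts:
        # Keywords a fact can be found by: its type plus its variable names.
        keywords = {fact["type"].lower()} | {v.lower() for v in fact["variables"]}
        if all(term in keywords for term in terms):
            results.append(fact)
    return results


# Made-up facts for illustration only.
facts = [
    {"type": "correlation", "variables": ["weight", "mpg"],
     "text": "weight and mpg are strongly negatively correlated."},
    {"type": "outlier", "variables": ["price"],
     "text": "One car has an unusually high price."},
]

for fact in search_facts(facts, "mpg correlation"):
    print(fact["text"])
```

Requiring every term to match keeps results narrow; a more forgiving search might rank facts by how many terms they match instead. Note that an empty query returns every fact in this sketch.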
A formative user study of the system with people of varying visualization backgrounds suggested a great deal of promise for the approach. Less-experienced participants appreciated the help Voder provided for interacting with visualizations. Experts appreciated that too, but also hoped for deeper observations in the data facts. Attendees we spoke with after the talk at InfoVis expressed excitement about the potential of the system for assisting visualization literacy and education as well.
¹ Voder is a disc-shaped voice-box translation device from Star Trek. It was also the name of a Bell Labs device that was the first machine to generate human speech.