what I look for in effective data viz

Last week, I had the distinct honor of being one of the judges for the very first VizCup, hosted by Facebook. Participants had an hour and their choice of half a dozen or so datasets plus their preferred software to create an interactive data viz. Each group/individual then had 90 seconds to explain and demo their visualization. After an hour or so of seeing some amazing work, the judges narrowed it down to the top five, who each had a little more time to share their viz. Top entries ran the gamut in scope, ranging from being able to see UFO sighting stats on your birthday, to where to locate to avoid natural disasters, to the winning entry that looked at the bias in soccer red card handouts by referee. Interestingly, all of the top entries (and nearly all of the entries overall) were created in Tableau, likely due to its ease and speed in creating interactive visualizations.

Leading up to the event, I spent some time thinking about the most important things that I look for when judging the effectiveness of a data visualization. I thought this might be of interest, so I'll share it here.

At a high level, there are four things I look for when evaluating data viz:

  1. A sensible display. The choice of graph or visual is appropriate given the data and given the purpose.
  2. Absence of clutter. I disdain it! The presence of elements that don't carry informative value or aid interpretation in some way will hurt, not help, when it comes to my evaluation.
  3. Affordance in design. Through strategic use of things like color, size of elements, spatial position, and text, it is so clear to the audience how to interact with the data visualization that they don't even notice the design.
  4. A clear story. For me, the best data visualizations are the pivotal point in a story. Use written or spoken narrative (or a combination thereof) to make the story your visualization tells clear.

Big thanks to everyone who worked to make this event a success: the Facebook team, the other judges (Drew Skau from Visual.ly and Anya A'Hearn of DataBlick), and especially Andy at VizWiz for inviting me to take part in the event!

The Facebook team that organized the event, plus judges.

and the winner is...

A big thank you to everyone who participated in the data viz challenge earlier this month (and thanks for your patience in awaiting this recap). As you may recall, the challenge was to help a philanthropic organization communicate a bunch of data about their various affiliates. If you're interested in a refresher on the details, you can find the challenge post with the full description here.

In this post, in addition to announcing the winner, I'll show a quick recap and my reactions to each of the submissions.

Submission 1: Peter Osbourne

You can view Peter's full description of his thought process in the comments of the post linked above. His main point was that, depending on the story one wishes to tell, a summary metric like an average may do the trick. Below is a snapshot of his workbook (he added the columns after the yellow one; the full workbook can be downloaded here). In his comments, he makes a great point about figuring out what the story is first and then determining what data you have that best supports it (vs. putting together data and then trying to form the story).

Submission 2: Jon Schwabish

Jon decided on an interactive Excel graphic (download available here), which allows you to toggle across the various affiliates to get relevant detail on each. I really like the simplicity of the visual design used here. Great use of preattentive attributes in the line graph to make the blue line stand out from the others.

Submission 3: Lubos Pribula

Lubos continued the interactive Excel dashboard trend (downloadable here). I like the use of color to visually tie the line graph to the tabular data below (though we should be careful about the red-green color combination, which can be difficult for those who are colorblind). I also like the embedded bar charts within the tables at the bottom, which allow you to quickly visually compare aggregate measures across the various affiliates.

Submission 4: Gautham

Gautham created a dashboard in Tableau (if you don't have Tableau, you can download Tableau Reader here; Gautham's dashboard can be downloaded here). This dashboard allows you to view a single affiliate at a time and see a visual of their total assets in bars and number of gifts and grants via lines. This is useful if you want to compare the number of gifts and grants, or get a sense of the trends over time for a specific affiliate.

Submission 5: Rupert Stechman

Rupert took an unconventional approach to his data viz and went old school with pen and paper (which I love!) and created a sort of heatmap showing net change in assets over time by affiliate. Here's what he came up with (his blog post is here):

AND THE WINNER IS... Submission 6: Jeff Shaffer

Jeff created both a Tableau dashboard (downloadable here) and an Excel dashboard (pictured below; downloadable here). He doesn't win because he submitted dashboards in multiple formats, but rather because his visual is the one the foundation said they could see themselves using.

Here's what the philanthropic organization said:

"Thank you so much for trying to help us get a visual for our data. Your readers are much more skilled than I, and did some really interesting things with the data. I think Jeff Shaffer came closest to getting us something like what we need. His dashboard approach would be really useful in some instances."

Personally, I would have had a hard time choosing a winner (one reason I'm happy the philanthropic group made the decision for me!) - there are components I like from each of the visuals and I think each could work well, depending on what story you want to tell and who the audience is. This is a great reminder of how important those pieces are - it's really difficult to create the perfect visualization without a good understanding of what story we want to tell and who we want to tell it to. We should absolutely spend time up front establishing that (and coaching our colleagues and clients to do so) before we create the supporting visual.

9/4 UPDATE: Jeff graciously agreed to put together a "how to" for creating the dashboard above, which you can download here.

Cole's non-competing submission

And I of course couldn't help but build my own visualization of this data as well. I did not go the interactive dashboard route, because the description made it sound like it was important to understand the trends for a given affiliate while also being able to compare those to other affiliates (hard to do in a dashboard that focuses on one affiliate at a time, though a couple of the above submissions address this in different ways). Here's a snapshot of what I came up with (I just show 4 here, but this approach continues for each of the affiliates; the Excel file is downloadable here):

Thanks, all, for playing (and Jeff, my offer stands to have you write a guest blog post if you're interested!). Let me know if you think I should pose challenges like this again in the future!

we are what we eat

As those who know me are aware, some of my biggest passions arise in the realms of data visualization and food. Every so often, there is an intersection of the two seemingly unrelated subjects. I recently came across one example, a project called "The Eatery: A Massive Health Experiment".

Part of the project is an app: you take pictures of the food you eat, and it records data to show patterns about your eating habits back to you. There's an interesting crowdsourcing component: in addition to you rating how healthy the various dishes you're eating are, others (friends or strangers) can rate the healthiness of what you're eating as well. The concept is interesting: by being more informed about what you are eating and how it fits together, you can be more aware of unhealthy patterns and change habits to improve health. Here's a video with more details:

The data collected isn't yours alone, but also contributes to a growing database, which the folks behind the project are starting to analyze and visualize to pull out observations and trends. Currently, the data comprises over 7 million food ratings of half a million foods by Eatery users from over 50 countries over a span of 5 months. While I'm not a huge fan of the cartoony infographics, they do contain some interesting factoids, and I love the time-based visual on the relative healthiness with which people eat across geographies. I've put a screenshot of it below; you can view the interactive version of it as well as the infographics here.

Collecting individual data for better decision making seems to be an area of growing interest. Are you aware of other mechanisms for doing so? What data do you (or would you like to) collect about yourself? What do you do with it?

OECD better life initiative

I have to be honest: at first, I was totally turned off by the flower visual representation within the OECD's Better Life Initiative. An email with the link to this site has been sitting in my inbox for weeks as I tried to muster the energy to write about flowers (I am not a flowery person). Today, I decided to try to move past that and use the tool to actually explore the data.

I'm glad that I did. It's kind of amazing. I highly recommend playing with it directly (here). But in case you aren't convinced, let's take a quick tour of this interactive visual.

Here's what you'll see on the main page (and what initially turned me off through the cuteness of flowers for data visualization):

Each flower represents one of the 34 member countries in the OECD. Each petal represents one of the 11 aspects of life (e.g. housing, education, work-life balance) that make up the OECD's index (details on how they chose these, data sources, etc. are in the FAQ). You are given complete control over how the various aspects combine into the summary index - to begin with, all are weighted equally, but the user can dial up or dial down how much each counts towards the overall index. The size of the petal indicates how the given country rates on the given life aspect. The height of the flower represents the aggregate index for the given country (based on the user-input weightings) relative to the other countries.
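To make the weighting idea concrete, here is a minimal Python sketch of how a user-weighted index like this could be computed and used to rank countries. It assumes each aspect score is already normalized to a 0-1 scale and that the index is a plain weighted average; the description above does not say how the OECD actually computes it, so treat this as an illustration rather than their method. The country names, scores, and function names are all made up.

```python
from typing import Dict, List

# Hypothetical, made-up scores on a 0-1 scale (NOT real OECD data).
SCORES: Dict[str, Dict[str, float]] = {
    "Country A": {"income": 0.9, "housing": 0.6, "education": 0.6},
    "Country B": {"income": 0.3, "housing": 0.8, "education": 0.8},
}


def weighted_index(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of aspect scores; aspects without a weight count as 1.0."""
    total_weight = sum(weights.get(aspect, 1.0) for aspect in scores)
    if total_weight == 0:
        raise ValueError("At least one aspect needs a non-zero weight.")
    weighted_sum = sum(
        score * weights.get(aspect, 1.0) for aspect, score in scores.items()
    )
    return weighted_sum / total_weight


def rank_countries(
    all_scores: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[str]:
    """Return country names ordered from highest to lowest weighted index."""
    return sorted(
        all_scores,
        key=lambda country: weighted_index(all_scores[country], weights),
        reverse=True,
    )
```

With equal weights (an empty `weights` dict), the hypothetical Country A ranks first. Dialing income down, for example `rank_countries(SCORES, {"income": 0.2})`, moves Country B ahead, which is the kind of reshuffling you see in the tool when you adjust the dials.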

One thing that bothered me initially was that the flowers seemed randomly ordered and I couldn't tell what the height represented (or even whether it was meaningful). This confusion goes away as soon as you start setting your own preferences and dialing up/down the different aspects of life on the right. For example, if I dial down the emphasis on income and choose to display countries by rank (bottom right; the default is alphabetical, which is messy visually), I get the following:

The blue petals (income, which is what I changed) are highlighted. This is a nice use of preattentive attributes to make one aspect stand out and push all else to the background. I can quickly see the relative index by country (height of flower) visually.

As you hover your mouse over the different countries, another visual comes up that shows the relative scores for the given country across the various aspects. These horizontal bars make it much easier for our eyes to see the relative magnitudes than do the petals of the flower. But the petals are compact in a way that makes them work for the supergraphic (which is showing a LOT of information) and I like that the details are there on demand. There is good consistency of color throughout the site, with each aspect shown in the same color in each of the various visuals. The color palette of slightly muted shades is a good choice (bright colors would quickly transport us to the world of Rainbow Brite, which can be distracting).

Clicking on a specific country will bring you to a page on that country and some nice visuals showing how it ranks on each attribute compared to other countries (through another nice use of preattentive attributes to highlight the given country). Clicking on a specific aspect in the Topics menu will take you to a page on that aspect, including a nice visual of countries in ascending order by the given aspect.

In sum, this is a well organized site with some nice visuals to explore. It packs in a lot of data. I love everything except the flowers!

By the way, the OECD is a great source of data if you're ever looking for a dataset to analyze or visualize. Google has a great tool, Google Public Data Explorer, that brings together public data from various sources (including the OECD) and provides tools to visualize it.

cool real time data capture and display

Today's NYT online features an interesting interactive visual on the primary US news topic of the moment - the death of Osama Bin Laden. The Times poses two questions: How much of a turning point in the war on terror will Bin Laden's death represent? and What is your emotional response?

Readers can indicate how they feel about these questions along a scale ranging from insignificant to significant (y-axis) and from negative to positive (x-axis). In addition to picking a position, the audience can input comments, which pop up on the visual as new content is added.

I love that this doesn't pick a position, but instead lets users generate the content. At the time of my blogpost, the upper right quadrant (positive, significant) is the most densely populated (and I imagine this will continue to be the case). A quick read through some of the comments (possible by mousing over the cells) shows what is often true: the outliers are as interesting (in some cases more so) as the predominant trend.

What other ways could we use interactive real time visuals like this? Leave a comment with your thoughts.

bubblechart for gadget trends

Last month, hot on the heels of the Consumer Electronics Show, the Washington Post included an interactive graphic to show the rise and fall of sales of various gadgets over time across a number of categories (communication, computing, television, video and photo, audio) and the concurrent change in gadget prices.

Here's a screenshot of the communication graphic:

It takes a bit of patience to get your bearings, as there's a lot going on, but once you do, there are some interesting trends to observe and questions to ask. First, let's dissect the visual: time is on the x-axis and millions of gadgets sold is on the y-axis. The different shades of blue represent different types of communication devices - in order from dark to light - corded phones, standard cellphones, cordless phones, and smartphones, with fax machines shown in grey. The size of the circle indicates the average price, according to the legend at the upper left.

This is an image that invites exploration. Mousing over the different series gives you more detail on demand, and also prompts some nice highlighting via preattentive attributes, with the series you mouse over remaining in its bold color and all else fading to the background, as shown below.

So now, in the above, we can see the rise of standard cellphones through around 2008, when the popularity of smartphones increased (far right trend), causing sales of standard cellphones to fall. Some interesting things jump out to me that I would want to explore further - for example, I'm interested in what led to the sharp increase in units sold from 1996 to 1997 and again from 1999 to 2000. Through this graphic, I also learned that the first standard cellphones were sold when I was just 4 years old for an average price of over $4K! I'm curious whether the size of the phone follows roughly the same trend as price - my guess would be it does. It's truly amazing how far technology has come in the past couple of decades.

The catch with interactive graphics like this is that you need an audience that is patient enough to explore. They also tend to work better for just that - exploration - when there's no specific conclusion that you're looking for your audience to draw. Perfect for a news article. Not so much for a sales pitch, for example, where you'd want to be in a little more control of your audience's attention.

Mark, if you're reading this, thanks for sharing the article with me!