and the winner is...

A big thank you to everyone who participated in the data viz challenge earlier this month (and thanks for your patience in awaiting this recap). As you may recall, the challenge was to help a philanthropic organization communicate a bunch of data about their various affiliates. If you're interested in a refresher on the details, you can find the challenge post with the full description here.

In this post, in addition to announcing the winner, I'll share a quick recap of each submission along with my reactions to it.

Submission 1: Peter Osbourne

You can view Peter's full description of his thought process in the comments of the post linked above. His main point was that, depending on the story one wishes to tell, a summary metric like averages may do the trick. Below is a snapshot of his workbook (he added the columns after the yellow one; the full workbook can be downloaded here). In his comments, he makes a great point about figuring out what the story is first and then determining what data you have that best supports it (vs. putting together data and then trying to form the story).

Submission 2: Jon Schwabish

Jon decided on an interactive Excel graphic (download available here), which allows you to toggle across the various affiliates to get relevant detail on each. I really like the simplicity of the visual design used here. He also makes great use of preattentive attributes in the line graph to make the blue line stand out from the others.

Submission 3: Lubos Pribula

Lubos continued the interactive Excel dashboard trend (downloadable here). I like the use of color to visually tie the line graph to the tabular data below (though we should be careful with the red-green color combination, which can be difficult to distinguish for people who are colorblind). I also like the embedded bar charts within the tables at the bottom, which allow you to quickly compare aggregate measures across the various affiliates.

Submission 4: Gautham

Gautham created a dashboard in Tableau (if you don't have Tableau, you can download Tableau Reader here; Gautham's dashboard can be downloaded here). This dashboard lets you view a single affiliate at a time, showing its total assets as bars and its number of gifts and grants as lines. This is useful if you want to compare the number of gifts and grants or get a sense of the trends over time for a specific affiliate.

Submission 5: Rupert Stechman

Rupert took an unconventional approach to his data viz and went old school with pen and paper (which I love!) and created a sort of heatmap showing net change in assets over time by affiliate. Here's what he came up with (his blog post is here):

AND THE WINNER IS... Submission 6: Jeff Shaffer

Jeff created both a Tableau dashboard (downloadable here) and an Excel dashboard (pictured below; downloadable here). He doesn't win because he submitted dashboards in multiple formats, but rather because his visual is the one the foundation said they could see themselves using.

Here's what the philanthropic organization said:

"Thank you so much for trying to help us get a visual for our data. Your readers are much more skilled than I, and did some really interesting things with the data. I think Jeff Shaffer came closest to getting us something like what we need. His dashboard approach would be really useful in some instances."

Personally, I would have had a hard time choosing a winner (one reason I'm happy the philanthropic group made the decision for me!) - there are components I like from each of the visuals, and I think each could work well, depending on what story you want to tell and who the audience is. This is a great reminder of how important those pieces are - it's really difficult to create the perfect visualization without a good understanding of what story we want to tell and who we want to tell it to. We should absolutely spend time up front establishing that (and coaching our colleagues and clients to do so) before we create the supporting visual.

9/4 UPDATE: Jeff graciously agreed to put together a "how to" for creating the dashboard above, which you can download here.

Cole's non-competing submission

And I of course couldn't help but build my own visualization of this data as well. I did not go the interactive dashboard route, because the description made it sound like it was important to understand the trends for a given affiliate while also being able to compare those to other affiliates (hard to do in a dashboard that focuses on one affiliate at a time, though a couple of the above submissions address this in different ways). Here's a snapshot of what I came up with (I just show 4 here, but this approach continues for each of the affiliates; the Excel file is downloadable here):

Thanks, all, for playing (and Jeff, my offer stands to have you write a guest blog post if you're interested!). Let me know if you think I should pose challenges like this again in the future!

we are what we eat

As those who know me are aware, some of my biggest passions arise in the realms of data visualization and food. Every so often, there is an intersection of the two seemingly unrelated subjects. I recently came across one example, a project called "The Eatery: A Massive Health Experiment".

Part of the project is an app: you take pictures of the food you eat, and it records data to show patterns about your eating habits back to you. There's an interesting crowdsourcing component: in addition to you rating how healthy the various dishes you're eating are, others (friends or strangers) can rate the healthiness of what you're eating as well. The concept is interesting: by being more informed about what you are eating and how it fits together, you can become more aware of unhealthy patterns and change habits to improve your health. Here's a video with more details:

The data collected isn't yours alone; it also contributes to a growing database, which the folks behind the project are starting to analyze and visualize to pull out observations and trends. Currently the data comprise over 7 million food ratings of half a million foods by Eatery users in over 50 countries, collected over a span of 5 months. While I'm not a huge fan of the cartoony infographics, they do contain some interesting factoids, and I love the time-based visual on the relative healthiness with which people eat across geographies. I've put a screenshot of it below; you can view the interactive version of it as well as the infographics here.

Collecting individual data for better decision making seems to be an area of growing interest. Are you aware of other mechanisms for doing so? What data do you (or would you like to) collect about yourself? What do you do with it?

visualizing everyday life

The data visualization in my life is primarily in the business world. At my day job: how do we ensure that people decisions at Google are data-driven? In my presentations and workshops: who is our audience, what do they need to know, and how do we craft a visual and story to do that?

But many take data visualization into the personal sphere as well: using visualization to better understand aspects of their world or their life. I encountered one such example recently, when a data viz course participant at Google shared an example he created:

"Hi all, Here is a silly little thing I cooked up over the weekend. My wife likes fresh tomatoes, of what are called heirloom varieties (not the big commercial ones) - 16 different ones each year in our garden. We used to have trouble selecting which ones to grow each time, for the last 4 years have kept pretty good records of them, so I wanted to see if there were any patterns.

This is my first such chart after taking the basic data viz class, where I had a chance to sit and think about how to make it look. 

I did violate the color palette guidelines a bit, to color code each tomato by type. But this makes the type of tomato stand out, as well as the pattern."

Neil goes on to say, "Interestingly enough, until I graphed it, I didn't know that we rarely have a yellow tomato invited back a second year. Our by year lists (stored on a wiki at home) tended to mask that information." 

I love the use of data viz for this sort of problem solving: what type of tomatoes should I plant this year? I think Neil's next challenge will be to identify, record, and visualize some success measures (e.g., plant yield, flavor) to really hone his choice of future garden crops.

This reminded me of another food-related data viz I saw some time ago, where a woman had tracked everything she ate for a year, then created a number of visualizations based on the data. You can read about that and see the visuals in this Flowing Data post.

Food for thought (pun intended!): what do you (or could you) visualize in your life?

visual battle: table vs graph

In a data visualization battle of table against graph, which will win?

The short answer (which may be less than satisfying) is: it depends. Mostly, it depends on who the audience is and how the data will be used. One important thing to know is that people interact very differently with these two types of visuals. Let's take a quick look at how, along with some use cases for each; then we'll look at a specific example from a recent WSJ article.

Tables, with their rows and columns of data, interact primarily with our verbal system. We read tables. When I have a table in front of me, I typically have my two index fingers out - I scan across rows, down columns, and I compare values. Tables are great when you have an audience who wants to do just that. They also work well for a diverse audience in which each member wants to look up their own piece. Tables are also handy when you have many different units of measure, which can be difficult to show in an easy-to-read manner in a graph.

Graphs, on the other hand, interact with our visual system. It's a high bandwidth information flow from what our eyes see to the comprehension in our brain, which can be extremely powerful when done well. Graphs can present an immense amount of data quickly and in an easy-to-consume fashion; they are particularly useful when there is a point to be made in the shape of the data, or for showing how different things (variables) relate to each other.

Let's look at an example. An article was posted recently in the Wall Street Journal online titled "Young Workers Like Facebook, Apple, and Google" (article). With the article came an "Interactive Graphic": a table listing the 150 companies included in the survey, their relative rank, and the percentage of young worker respondents who voted for each. (Slight tangent: while I suppose the interactive label fits, I was a little surprised to find that the only way I could interact with the data was to sort each column in ascending or descending order - I guess this would be useful if I were looking for a particular company, so I could alphabetize the list, but its utility beyond that is limited.) Here's what the top of the table looked like:

Question: was it right of WSJ to include a table rather than a graph?

In this case, I think the answer is yes. The article spends time discussing Google in the top spot (which makes the article title seem somewhat incongruous to me; it's also interesting that the title mentions Google last of the three companies it calls out, even though Google ranked first), but it also points out some other nuances, for example the decline in financial sector rankings (though the year-over-year data is not provided to the reader). My assumption is that they wanted to include all of the data so that readers could look up specific companies of interest or look at the top or bottom of the list. This meets one of the table criteria described above: a diverse audience, each member wanting to look up their own piece.

If, however, the primary goal is to make the point that Google is well ahead of the pack (which is the focus of the majority of the article), a graph would help us to visually tell the story more quickly and arguably more effectively than can be done with the table.

Question: what should we graph? Graphing all 150 companies is out of the question: there are too many, and the long tail would take up more space than the value of seeing it would add. So we know we need to graph a subset, but the question remains: where should we make the cutoff?

We can pick a clean number (this is likely the rationale behind the top 3 that WSJ mentions in the title): top 5, top 10, top 20. But in doing so, we run the risk of including and excluding companies with very similar values (for example, if we were to graph the top 10, we'd include the CIA at 5.04% but exclude Nike, which is only 3 basis points lower, at 5.01%). This isn't to say that this approach is unacceptable, only that it should be an explicit decision: you should understand the pros and cons of this approach and accept the cons (vs. not recognizing that they exist).

Another option is to graph the data and then look for the natural breaks that occur and have our cutoff reflect this nuance in the data. Here's what it looks like if we graph the top 25 (quick & dirty):

Here, the y-axis is the % of respondents and the x-axis is company rank. I found it hard to see the difference in the length of bars plotting this direction, so also tried the horizontal bar chart:

I find it much easier to see the relative differences in this second iteration of the chart (partly due to the compression of the bars, and partly because it just seems easier to scan down rather than across to spot differences in bar length). Based on this, there appear to be clear drops between 7th and 8th place, between 8th and 9th, between 11th and 12th, between 15th and 16th, and so on. We could make arguments for a number of different cutoffs. In this case, I'm going to take the top 15, both because it's a clean number (I've always liked multiples of 5, not sure why) and because we see a drop between the 15th and 16th positions (it's also the point where we fall below the 4% mark: 4.04% of respondents vs. 3.80%, which I can note in a footnote). You could argue for a cutoff elsewhere, but this is what I'm going with for the reasons I've outlined.
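If you'd rather not eyeball the chart, you can do a rough first pass at the "look for natural breaks" step in a few lines of code. Below is a minimal Python sketch. It assumes the input is a list of respondent percentages already ordered by rank, and it treats a "break" simply as the drop between consecutive ranks, converted to basis points (1 percentage point = 100 basis points). The function name `natural_breaks` and its parameters are hypothetical illustrations, not part of the WSJ data or any library. Only the values quoted above are used here. The CIA/Nike pair is treated as 10th and 11th, as the top-10 example implies.

```python
def natural_breaks(pcts, top_n=3, first_rank=1):
    """Return the largest drops between consecutive ranks (hypothetical helper).

    pcts: respondent shares in percent, highest first (already ranked).
    first_rank: the rank of pcts[0], so results report true ranks even
        when only a slice of the full ranking is passed in.
    Each result is a tuple (rank, next_rank, drop_in_basis_points).
    """
    pairs = list(zip(pcts, pcts[1:]))
    if any(higher < lower for higher, lower in pairs):
        raise ValueError("pcts must be sorted from highest to lowest")
    drops = [
        # 1 percentage point = 100 basis points; round away float noise
        (first_rank + i, first_rank + i + 1, round((higher - lower) * 100))
        for i, (higher, lower) in enumerate(pairs)
    ]
    # Largest drop first; Python's sort is stable, so ties stay in rank order.
    return sorted(drops, key=lambda d: d[2], reverse=True)[:top_n]


# Only the values quoted in the post; the full top-25 list is not reproduced here.
print(natural_breaks([5.04, 5.01], first_rank=10))  # [(10, 11, 3)]   CIA vs. Nike
print(natural_breaks([4.04, 3.80], first_rank=15))  # [(15, 16, 24)]  15th vs. 16th
```

Given the full ranked list, the largest drops become candidate cutoffs. The sketch only surfaces them. Choosing among them is still the judgment call described above: a clean number, a visible gap, or a threshold such as the 4% mark.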

So if I want a visual to highlight the point in the article that Google is ahead of the pack, here is what it could look like:

Main takeaway: when debating table vs. graph, ask yourself how the data will be used and consider your audience. Let the utility of the visual that is needed drive your decision.

interesting

I just came across this graphic over at Chart Porn. What story would you tell with this data?