2011-02-14

Analysis and the 'So What' Question

While at Strata I had an opportunity to participate in quite a few sessions that demonstrated how to take raw data and analyze it with various tools.  The output was usually a set of graphs and charts, though sometimes just simple tables.  All of this was useful for getting a sense of how the tools work, but what was missing was the final step in the analysis - a powerful insight or understanding that one could use to make an intelligent change to a process.  Generally, the presentation technique was fine and the tools were great, but the demonstrated impact of the tools was trivial.

One reason for this is that some of the presenters may have to hold back their most significant discoveries until the right time - and this just wasn't that time, or this wasn't the right audience.  I can understand this, since most of my best analysis can't really be shown without getting NDAs and other agreements in place first.  Another reason is that the presenters might have wanted to focus on the tool and not on the data or business being studied, which just serves as a necessary example to work on.  But this is misguided, since delivering insights is the bottom line - not delivering pretty pictures.  The last reason I can imagine is that delivering powerful insights is hard, and while these presenters are working on it they may not yet have a suitable example.  I think this is the most likely answer.

My concern is that people spend a lot of time building gorgeous but empty-headed analytical solutions that just don't have much to say.  This is pretty similar to the chartjunk problem that Edward Tufte complains about.  To make this a little clearer I've included a few examples below.



LinkedIn Map Example 

One example is the LinkedIn network diagrams produced by InMaps.  These large, colored link-analysis graphs are a lot of fun to look at, but what do they really say?  Perhaps used interactively they might have more function, but as a purely read-only or print product they seem to be nothing more than a novelty.  The wonderful old question applies here - "So What?".  And I didn't find a single person who mentioned a single useful discovery that they had made with the tool.

But it could be part of a solution that does have a lot of impact.  I think it's being positioned as an alternative way to interact with your network, so that you should be able to work with contacts interactively through it.  OK, that sounds fine.  Here are a few other ideas:

  • How about building & connecting the maps of multiple people, and then zooming in on the overlapping connections?  That would allow a few people at a lunch table to immediately identify common connections, perhaps break the ice, and perhaps suddenly discover that they've got a lot to talk about.
  • How about offering controls to identify nodes (contacts) related to projects, companies, periods in time, or skills?  That would allow someone to easily see weak areas in their network where they might want to go back and add contacts.
  • This is my favorite - how about comparing the network at two different points in time and both measuring and showing the changes?  In fact, LinkedIn could offer this as a service at conferences like Strata, and then O'Reilly could publish summary results in order to explain the value of attending a conference of this type.  For example, "The average participant at O'Reilly's Strata Conference added 17 new immediate contacts in the Data Science field."
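
To make the first and last of these ideas concrete, here is a minimal sketch in Python.  It assumes each person's network is available as a plain set of contact names; the function names and data layout are made up for illustration and say nothing about how InMaps or LinkedIn actually store or expose this data.

```python
from collections import Counter


def shared_contacts(networks):
    """Return contacts that appear in at least two people's networks.

    `networks` maps a person's name to a set of contact names
    (a hypothetical layout, chosen only for this sketch).
    The result maps each shared contact to the number of people who know them.
    """
    counts = Counter(c for contacts in networks.values() for c in contacts)
    return {contact: n for contact, n in counts.items() if n >= 2}


def network_changes(before, after):
    """Compare one person's network at two points in time."""
    return {"added": after - before, "dropped": before - after}


def average_added(snapshots):
    """Average number of new contacts per person.

    `snapshots` maps a person's name to a (before, after) pair of sets,
    e.g. taken just before and just after a conference.
    """
    gains = [len(after - before) for before, after in snapshots.values()]
    return sum(gains) / len(gains) if gains else 0.0
```

The last function is the kind of calculation that would sit behind a summary like the conference statistic above: take a snapshot of every participant's contacts before and after the event, then average the gains.
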

Afghanistan Map Example

While examining methods of mapping data geographically, one example that came up several times was the ability to show where IED attacks in Afghanistan were occurring and from this deduce that they were along roads.  Well, I think this successfully demonstrated what the tool could do, but it didn't really show much impact.  I think we could already guess that these attacks were near major roads.  Here are a few ways we could take that further:

  • Include more time analysis to see if the attacks are clustered around certain days of the week or months of the year.  If so, it may mean that the attackers are only available to plant them then, or that this is just when the troops are in the area.  To eliminate the latter possibility, ideally you would add some kind of troop movement data that would show them safely moving through the area at other times.
  • Show changes over time along with major campaigns or troop movements.  This might show that the IED problem only exists after an area has been occupied, but not during a major campaign, which might imply that the solution is 80% political and just 20% military - winning over locals after the area has been taken over by troops.
  • Show changes over time along with troop numbers in the area.  This may show that the IEDs are getting more or less effective over time.
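
The first of these ideas could be sketched roughly as follows in Python.  This assumes we had a list of attack dates and a list of dates on which troops moved through the area, neither of which is likely to be available to us; the function names are hypothetical and the sketch is not tied to any particular mapping tool.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def counts_by_weekday(dates):
    """Count how many of the given dates fall on each day of the week."""
    counts = Counter(d.weekday() for d in dates)  # Monday is 0
    return {name: counts.get(i, 0) for i, name in enumerate(WEEKDAYS)}


def attacks_per_patrol(attack_dates, patrol_dates):
    """Attacks per troop movement for each weekday.

    Dividing by troop movements separates "attackers prefer this day"
    from "this is simply when troops are in the area".
    Days with no troop movements yield None rather than a rate.
    """
    attacks = counts_by_weekday(attack_dates)
    patrols = counts_by_weekday(patrol_dates)
    return {
        day: (attacks[day] / patrols[day] if patrols[day] else None)
        for day in WEEKDAYS
    }
```

A weekday with a high raw count but an ordinary rate would suggest the clustering just reflects when troops are present; a high rate would point back at the attackers' own schedule.
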

Doing any of these things will take additional data (that's probably not available to us) and additional data logistics, and we may discover that we've hit the outer limits of the feature set of the graphing tools involved.  Probably not, but again, delivering impact is what it's all about, and the acid test is whether the tool can handle the additional features needed.
