Putting statistics online

4 02 2010

We have a statistics website at work with an enormous amount of statistics available for PDF download.  I like PDF downloads.  But.  Just to get a table of statistics, a user has to click through 5 layers of navigation!  The menu, sub menu, sub sub menu, sub sub sub menu…. you see?  It’s not really ideal – all that just to get a downloadable PDF, which usually opens up in an annoying new window as well.

The sub menu navigation on the site is basically like choosing a statistics category, and graph metrics.  So, I figure it’s got to be cleaner if we move to some sort of online graphing tool, right?  We could have a single page even, where a form (or cascading forms) refresh the graph and it’s metrics as you click.  All in one spot, tidy and neat, and fun.  It’d also be easy to have a tab for table-view for people who don’t like graphs.

There’s a lot of problems to overcome before I can start though.  I wrote a report for my boss, just to itemise what we’d need to consider, to move to online graphing.  The stuff below is a part of the report.  If anyone out there has any wonderful ideas about the best way to go, please let me know.  I’m mostly interested in how to get the data in to XML (from excel) in an automated way; and how to conceptualise heavily layered statistics in to graphical break downs.  Is all this a good idea?

As a start I actually did a tiny prototype using the free tool SWF/XML Charts.  It’s a good tool, easy to use, looks reasonable to the eyes – and most of all it was easy to flex in to something more powerful, with a bit of php, ajax and jquery.  I had it set up to work really well – the forms refreshed the graph on the fly; the table view read the same XML file as the Flash graphing tool, and transformed it in to a HTML table (in the table tab).  It was so dynamic and I loved it as a basic prototype – but I’m just not sure of the best next step yet.

Example graphing prototype

It worked really well - there was one tricky part though, the tabs loaded content with jquery ajax and if you changed the forms whilst in table view (for example) the graph view didn't refresh because it didn't exist. Luckily it was fairly straight forward to fix with some callback methods in the ajax call.

Oh, and another tip – the tool hated having symbols parsed in the URL for the xml data source. So my ajax table loading might be like this:  (all these variables are actually values from the form, and the events are triggered by changing the form)
$("#Table").load("tableContent.php?data=TrendData_MainIntake" + "&graph=" + graph + "&category=" + category);

But the url variable in the script that loaded the flash graphing tool had to be like this, using hex values in the path:
url = "graphContent.php%3Fdata%3DTrendData_MainIntake" + "%26graph%3D" + graph + "%26category%3D" + category + "%26type%3D" + type;

It’s fun getting all this stuff to work. I hope it turns in to an excellent project, a small part of my report is as follows if you’re interested.
A small part of my report:

Presenting the data

To arrive at the most suitable working solution, first the problems must be evaluated and overcome.  The first foreseeable problem is – how to present complicated data graphically.  Some of the statistics are quite simple, others are not, this will be discussed further.  The second problem is the actual format of the data itself, in Excel.  Because our statistics are static, and not from a dynamic database, the format will need to be in XML (eXtensible Markup Language), which is an industry standard markup used by a multitude of platforms, software and web applications – even applications such as iTunes run their entire music library from XML files.  The data format will be discussed further.

The data used by the Stats Team originates from a range of sources, Peoplesoft, spreadsheets, raw data files, and is finalised in an Excel spreadsheet by the Team.  Some of the spreadsheets are very simple and pose no problem at all, such as the Student Demographics -> Students by Age by Gender.  An example snippet is as follows:

The example just shown is a simple table and can be fairly quickly and easily visualised as a bar/column/line graph for the web, and in terms of the data format, simple tables as such can be more easily converted to XML.

Some spreadsheets have layers within layers of datasets, and this will not translate well in to the online medium.  For statistics such as Student Load by Funding Group, Faculty and Campus to be online, they will need to be re-conceptualised and presented in a broken down, or stepped, manner.  Note in the (extremely simplified) example snippet below, the layers (indicated by bold, italics and indents*) – Funding group totals for Faculties, grand totals for Funding groups, Campus break downs and so on.

A solution to such a problem could be as follows.  By breaking down the layers, this data can be presented in a stepped, bit by bit approach.  Take for example some of the Fusion Charts Demos – http://www.fusioncharts.com/Demos/Blueprint/

In the Sales Per Employee for Year x graphs the data is broken in to layers.  First viewable is an interactive pie graph, where the user selects the employee (in our example, this could be the Funding Group); then the detailed graphs for that employee load below (in our example, this would a suitable breakdown of Faculty, or Campus, or Funding group – to further drill down in to the statistical levels).

For large datasets, there can also be graphs with scroll bars as shown below.  The Demographics by Country statistics would be very suited to a scrollbar column graph as the list of countries is quite long.

This re-conceptualising of datasets would need some dedicated involvement from the Statistics and Analysis Team, to decide how best to present the layers of data in a visual and interactive format.  The datasets for the layers would then need to be converted to XML files.

The format of the data – getting it to XML

This will require some workload particularly for complicated datasets.  Excel is quite a locked up proprietary product, and does not like to share it’s formats very well.  There is no simple save as XML feature that would work, however, the Developer Toolbar provides some features for working with XML.

With the Developer Toolbar one can import an XML schema, which sets up the rules and conditions for the document’s structure.  It also allows the user to then map XML tags to areas in the spreadsheet, so that XML can be exported in the desired format.  After testing this it was found to be quite laborious and temperamental – at this point it seems like an unfeasible option.

Fusion Charts comes packaged with a tool to create XML files.  It allows the user to copy and paste from Excel, and then generates the XML.  It does a fairly effective job of it, with some extra typing to name the dataset labels it can be fairly quick to create the needed XML this way – but far from perfect.

The road of automation is the ultimate solution – but without having more technical/specialist skills in the Team this option does not seem likely at this point.

Managing updates

The Stats Team are concerned that by implementing this it will take too much workload to maintain.  It is essential that an easy set up and maintenance procedure be created but at the end of the day, it is unavoidable that some additional work will be required.

Adding simple datasets to the Stats website would be fairly straight forward and quick, taking roughly 30 minutes of someone’s time at the most.  It would involve:

  1. creating the XML file with the Fusion Charts XML Generator
  2. typing in a few extra settings and saving the XML file
  3. copying that file to the server
  4. editing the website’s html files to include a link to the new graph

Depending on the approach taken there may also be a HTML file that needs uploading.  For the complicated datasets, it is unclear how much time it would take to maintain and upload the statistics to the website.  It could take a day, possibly more.  Initial set up of all of the current statistics will be moderately time consuming, then once everything to date is online only yearly / semester updates would be needed.

Advertisements

Actions

Information

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




%d bloggers like this: