Stage 1


For the visiting Crikey Readers, you want this page.

 

Stage 1.  Gather data, perform basic exploratory data analysis.

 

So I obtained some raw data from the latest IPCC report, containing a number of estimates of solar, volcanic, and "all other" climate forcings, and wrote a couple of Perl scripts to get the data into a nice matrix format.  Next I used a spreadsheet to make a grand mean for each forcing, averaging across the available estimates, which is a quick and dirty way of taking every data set into account.
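As a rough illustration of the grand-mean step, here is a minimal Python sketch (the original work was done in a spreadsheet, not with this code). The years, the values, and the names solar_estimates and grand_mean are all made up for the example, and it assumes that a data set with no value for a given year is simply skipped.

```python
from statistics import mean

# Hypothetical rows: one per year, each holding several estimates of the
# same forcing. The numbers are invented for illustration and are NOT
# taken from the IPCC data.
solar_estimates = {
    1900: [0.10, 0.14, 0.12],
    1901: [0.11, None, 0.15],  # None marks a data set with no value that year
    1902: [0.09, 0.13, 0.11],
}


def grand_mean(values):
    """Average the estimates that are present, skipping missing ones."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


# One grand mean per year, across all estimates of the forcing.
solar_mean = {year: grand_mean(vals) for year, vals in solar_estimates.items()}

for year, value in sorted(solar_mean.items()):
    print(year, round(value, 3))
```

A plain average like this weights every data set equally, which is what makes it quick and dirty: it ignores any differences in how reliable the individual estimates are.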

 

Next I decided to find some CO2 estimates, which I took from direct atmospheric observations as well as from values inferred from ice cores.  Finally I decided to grab the Southern Oscillation Index data from the Australian Bureau of Meteorology.

 

Now this is all in a handy CSV file.  I loaded it into R with this code, adapted from code I found at the R Graph Gallery.

 

So here's a nice visualisation of this very basic analysis:

 

 

Recall that while a correlation does not imply causation, it does provide evidence.  The closer the correlation is to 1, the closer it is to a perfect positive relationship.  The closer to -1, the closer to a perfect inverse relationship.  I'm concerned about the relationship between forcing and co2_mean.  The correlation shows as 1, which would mean that they're essentially the same measurement, and I don't understand why.  I'm also confused as to why there is a strong positive relationship between solar forcing and co2_mean.
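To make the meaning of 1 and -1 concrete, here is a small Python sketch of the Pearson correlation coefficient (the analysis itself was done in R, so this is only an illustration). The function name pearson_r and all of the numbers are invented for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (a constant sequence has zero
    variance, which would make the coefficient undefined).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
rising = [2 * v + 1 for v in x]     # perfect positive linear relationship
falling = [10 - 3 * v for v in x]   # perfect inverse linear relationship
noisy = [1.2, 1.9, 3.4, 3.6, 5.3]   # made-up values, roughly linear in x

print(round(pearson_r(x, rising), 2))
print(round(pearson_r(x, falling), 2))
print(round(pearson_r(x, noisy), 2))
```

The first two cases sit at the extremes of the scale, while the noisy case lands close to, but below, 1. Note that the coefficient only measures how well a straight line fits, so a strong curved relationship can still score below 1.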

 

Some interesting points from the graphs.  Firstly, CO2 has the strongest association with temperature anomaly, followed by solar forcing, followed by volcanic.  Secondly, see if you can work out the rather odd relationship between year and anomaly.

 

I guess the next step would be to do a stepwise (backwards and forwards for fairness) regression trying to predict temperature anomaly from the CO2, volcanic, and solar forcing figures.

 

Update

 

So apart from examining why the correlation between mean CO2 and forcings is 1 (as explained in the comments, the correlation isn't 1, it's 0.99 rounded up), we also wanted to look at the log relationship of CO2 to forcing.  So we updated the old R code that puts together the correlation matrix/scatterplot to add the co2.log.diff variable (and remove a couple of boring ones).  Here's the result:

 

No massive difference.  The strong relationship between the log difference and anomaly is still there (it is higher, which is an artefact of the log-linear relationship being inappropriate for the linear correlation coefficient), and the forcing v co2.log.diff correlation is lower, which is a good thing.  So I think we're ready to do some stepwise linear regression on the data.  We'll use anomaly as the dependent variable, and we'll use a couple of different approaches to stepwise linear regression to interrogate the data.  That's all coming up in Stage-2.
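For readers wondering what a log difference does to a correlation, here is a hedged Python sketch. It assumes that a variable like co2.log.diff is the natural log of the CO2 concentration minus the log of a baseline concentration. That definition, the 280 ppm baseline, and the synthetic numbers are assumptions made for illustration, not taken from the actual R code or data.

```python
from math import log, sqrt

BASELINE_PPM = 280.0  # assumed baseline concentration, not from the original data


def log_diff(co2_ppm, baseline=BASELINE_PPM):
    """ln(C) - ln(C0): log difference of a CO2 concentration from a baseline."""
    return log(co2_ppm) - log(baseline)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Synthetic concentrations (ppm) and a made-up response that is, by
# construction, exactly proportional to the log difference.
co2 = [280.0, 300.0, 320.0, 350.0, 390.0, 450.0, 560.0]
co2_log_diff = [log_diff(c) for c in co2]
response = [2.0 * d for d in co2_log_diff]

r_raw = pearson_r(co2, response)           # raw CO2 against a log-shaped response
r_log = pearson_r(co2_log_diff, response)  # log difference against the same response

print(round(r_raw, 4))
print(round(r_log, 4))
```

With these synthetic numbers the raw concentration still correlates very highly with the response, but not perfectly, because the linear coefficient cannot fully capture a log-shaped relationship. It also shows how a correlation just under 1 can be displayed as 1 once it is rounded.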

 

Remember, all the data and source code for this analysis are available from http://github.com/singingfish/Climate-Karaoke/tree/master.  The statistical software used for the analysis is R, free open source software that is also the professional statistician's tool of choice, available from http://www.r-project.org.