# For the visiting Crikey Readers, you want this page.


# Stage 1. Gather data, perform basic exploratory data analysis.

So I obtained some raw data from the latest IPCC report, containing a number of estimates of solar, volcanic, and "all other" climate forcings, and wrote a couple of Perl scripts to get the data into a nice matrix format. Next, I used a spreadsheet to compute a grand mean for each forcing category, which is a quick and dirty way of taking every data set into account.
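The grand-mean step is simple enough to sketch. This is a minimal Python illustration (the original used a spreadsheet), assuming each year maps to a list of estimates from the different data sets, with gaps marked as missing. The function name and the numbers are made up for illustration; they are not IPCC values.

```python
import math

def grand_mean(rows):
    """Average each year's estimates across data sets, skipping missing (None/NaN) values."""
    means = {}
    for year, estimates in rows.items():
        present = [v for v in estimates if v is not None and not math.isnan(v)]
        # A year with no usable estimates stays missing rather than becoming zero.
        means[year] = sum(present) / len(present) if present else None
    return means

# Hypothetical example: three solar forcing reconstructions for two years.
solar = {1900: [0.10, 0.12, None], 1901: [0.11, 0.13, 0.15]}
print(grand_mean(solar))
```

Skipping missing values (rather than treating them as zero) keeps a gap in one reconstruction from dragging the mean down.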

Next, I found some CO2 estimates, both from direct atmospheric observations and inferred from ice cores. Finally, I grabbed the Southern Oscillation Index (SOI) data from the Australian Bureau of Meteorology.

Now this is all in a handy CSV file. I loaded it into R with code adapted from an example I found at the R Graph Gallery.

So here's a nice visualisation of this very basic analysis:

Recall that while correlation does not imply causation, it does provide evidence. The closer a correlation is to 1, the closer it is to a perfect positive relationship; the closer to -1, the closer to a perfect inverse relationship. I'm concerned about the relationship between forcing and co2_mean: their correlation shows as 1, which would mean they're essentially the same measurement, and I don't understand why. I'm also confused as to why there is a strong positive relationship between solar forcing and co2_mean.
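Every cell in the correlation matrix is a pairwise Pearson coefficient. Here's a small Python sketch of how each r is computed (the analysis itself used R's cor()), showing the +1 and -1 extremes described above. The helper names and toy columns are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient r for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Pairwise r for every pair of named columns (dict of name -> list of values)."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

cols = {"x": [1, 2, 3, 4], "up": [2, 4, 6, 8], "down": [4, 3, 2, 1]}
matrix = correlation_matrix(cols)
print(matrix[("x", "up")], matrix[("x", "down")])
```

Note that r only measures *linear* association, so a value of 1 says two series move in lockstep, not that they measure the same physical thing.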

Some interesting points from the graphs. First, CO2 has the strongest association with temperature anomaly, followed by solar forcing, then volcanic forcing. Second, see if you can work out the rather odd relationship between year and anomaly.

I guess the next step would be a stepwise regression (both backwards and forwards, for fairness) predicting temperature anomaly from CO2, volcanic, and solar forcing figures.
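A minimal sketch of "backwards and forwards" stepwise selection, in stdlib Python rather than the R used for the actual analysis. It scores each candidate model by AIC, n·ln(RSS/n) + 2k, which is (up to a constant) the criterion R's step() uses for linear models, and at each step tries adding or dropping one predictor. Running it from both the empty and the full model is the "fairness" check. Function names and the toy data in the usage lines are hypothetical.

```python
import math

def ols_rss(columns, y):
    """Residual sum of squares of a least squares fit of y on the columns plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Normal equations A @ beta = c.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for k in range(p):
        piv = max(range(k, p), key=lambda i: abs(A[i][k]))
        A[k], A[piv] = A[piv], A[k]
        c[k], c[piv] = c[piv], c[k]
        for i in range(k + 1, p):
            f = A[i][k] / A[k][k]
            for j in range(k, p):
                A[i][j] -= f * A[k][j]
            c[i] -= f * c[k]
    beta = [0.0] * p
    for k in reversed(range(p)):
        beta[k] = (c[k] - sum(A[k][j] * beta[j] for j in range(k + 1, p))) / A[k][k]
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def aic(data, y, names):
    """AIC up to a constant: n*ln(RSS/n) + 2*(coefficients incl. intercept)."""
    n = len(y)
    rss = ols_rss([data[v] for v in names], y)
    return n * math.log(rss / n) + 2 * (len(names) + 1)

def stepwise(data, y, start):
    """Add or drop one predictor at a time while AIC keeps improving."""
    current = list(start)
    best = aic(data, y, current)
    while True:
        candidates = [current + [v] for v in data if v not in current]
        candidates += [[u for u in current if u != v] for v in current]
        if not candidates:
            return current
        score, choice = min(((aic(data, y, c), c) for c in candidates), key=lambda t: t[0])
        if score >= best:
            return current
        best, current = score, choice

# Toy data: y depends on "a" only; "b" is irrelevant.
n = 20
data = {"a": [float(i) for i in range(n)], "b": [float((7 * i) % 5) for i in range(n)]}
y = [2.0 * a + ((3 * i) % 4 - 1.5) * 0.1 for i, a in enumerate(data["a"])]
print(stepwise(data, y, []), stepwise(data, y, ["a", "b"]))
```

If the forward and backward runs disagree on the real data, that's worth reporting rather than picking one.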

## Update

So, apart from examining why the correlation between mean CO2 and forcings is 1 (as explained in the comments, it isn't 1; it's 0.998, which rounds up to 1), we also wanted to look at the log relationship of CO2 to forcing. So we updated the old R code that builds the correlation matrix/scatterplot to add the co2.log.diff variable (and remove a couple of boring ones). Here's the result:

No massive difference. The strong relationship between the log difference and anomaly is still there (possibly an artefact of using the linear correlation coefficient on a log-linear relationship, for which it is inappropriate), and the forcing vs co2.log.diff correlation is lower, which is a good thing. So I think we're ready to do some stepwise linear regression on the data. We'll use anomaly as the dependent variable, and we'll use a couple of different approaches to stepwise linear regression to interrogate the data. That's all coming up in Stage 2.

Remember, all the data and source code for this analysis are available from http://github.com/singingfish/Climate-Karaoke/tree/master. The statistical software used for the analysis is R, which is free and open source, is also the professional statistician's tool of choice, and is available from http://www.r-project.org.

## Comments (6)

## Kieren Diment said

at 12:23 am on Jul 23, 2009

Reprinted from http://blogs.crikey.com.au/rooted/2009/04/08/climate-change-cage-match-a-fight-to-the-death/#comment-1045

Numbered to make a later response easier.

1. Interesting analysis, kdkd. Like your matrix table. Did you use any of the NOAA data I referred to?

2. A couple of suggestions. The perfect correlation of forcings and CO2 mean looks suspect.

3. Can you try ln(CO2a/CO2b) in place of CO2 mean, where CO2b is about 280 ppmv in 1750 AD? This is supposed to be linear with CO2 forcing. Myhre (1998) equation:

ΔF(CO2) = 5.35 ln(CO2a/CO2b), where ΔF is the increase in forcing (W/m²)

You would probably need to enter a decadal value for this and interpolate the yearly values. Some examples:

900-1750 AD: zero

The 1900 AD value would be 5.35 ln(290/280) = 0.188 W/m²

The 1950 AD value would be 5.35 ln(310/280) = 0.545 W/m²

The 2005 AD value would be 5.35 ln(380/280) = 1.634 W/m²

3. Interesting to see the strong correlation between solar forcing and CO2 mean. CO2 has historically been a lagging indicator, though it has always followed temperature anomaly closely (lagging by about 700 years).

4. I wonder how this correlation changes when using the function ln(CO2a/CO2b) instead of CO2 mean?

5. SOI is clearly an internal effect, and volcanic forcing has a fairly weak correlation with temperature anomaly.

6. Fascinated to see if any of the ‘thousands’ of IPCC authors have done a similar analysis.

7. Leave Tamas alone while you are doing this kdkd. To be universal and Catholic you must include the UAH data set somewhere which will warm his heart and cool his head.
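The forcing values in point 3 can be reproduced with a few lines of code. This Python sketch assumes the Myhre (1998) form quoted above with a 280 ppmv baseline, and uses straight-line interpolation between known points to get yearly values. The function names are hypothetical.

```python
import math

def co2_forcing(co2_ppmv, baseline_ppmv=280.0):
    """Simplified CO2 forcing in W/m^2: 5.35 * ln(C / C0), zero at the baseline."""
    return 5.35 * math.log(co2_ppmv / baseline_ppmv)

def interpolate_yearly(points):
    """Linearly interpolate sorted (year, value) points to every integer year between them."""
    yearly = {}
    for (y0, v0), (y1, v1) in zip(points, points[1:]):
        for year in range(y0, y1):
            yearly[year] = v0 + (v1 - v0) * (year - y0) / (y1 - y0)
    last_year, last_value = points[-1]
    yearly[last_year] = last_value
    return yearly

# CO2 concentrations (ppmv) used in the examples above.
known = [(1900, co2_forcing(290)), (1950, co2_forcing(310)), (2005, co2_forcing(380))]
for year, forcing in known:
    print(year, round(forcing, 3))  # 1900 0.188, 1950 0.545, 2005 1.634
yearly = interpolate_yearly(known)
```

Linear interpolation is the simplest choice; with real decadal CO2 data the gaps are short enough that it shouldn't matter much.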

## Kieren Diment said

at 12:35 am on Jul 23, 2009

1. All of the variables in CAPITALS are aggregated means for that category in the IPCC figure.

2. Yes, not sure what's happening. It's either data entry error, or rounding error I think.

3. Good plan. My only reservation is that a lot of zero readings will pollute any regression by introducing heteroscedasticity (non-constant variance, which violates the assumptions underlying the linear regression technique).

4. You can demonstrate this with residuals analysis, which was on my list of things to do.

5. I'm also unconvinced that SOI measures the right thing, and I wonder whether there's a more useful way of providing a smoothed mean or median by year, as inter-annual variability may mask any effect.

6. This is pretty undergraduate stuff, and is theoretically really naive about the climate system. I want someone with climate scientist credentials to comment on this so I understand the difference between my behavioural scientist/ecologist statistics and the physical sciences approach. I can't read physics books on statistics because, for example, I don't understand their approach to terminology.

7. Every intention of using the UAH data at some point, when it becomes available.
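Points 3 and 4 (heteroscedasticity and residuals analysis) can be checked formally as well as by eye. A minimal sketch, not part of the original analysis: Koenker's studentized Breusch-Pagan statistic regresses the squared residuals on the predictor and compares n·R² with a chi-square distribution (3.84 is the 5% critical value for one predictor). In R, the lmtest package's bptest() does this; the stdlib Python below is an illustration with hypothetical names and toy data.

```python
def simple_fit(x, y):
    """Intercept and slope of the least squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return my - b * mx, b

def breusch_pagan_lm(x, y):
    """Koenker's studentized Breusch-Pagan statistic for one predictor:
    n * R^2 from regressing the squared residuals on x."""
    a, b = simple_fit(x, y)
    e2 = [(v - a - b * u) ** 2 for u, v in zip(x, y)]
    a2, b2 = simple_fit(x, e2)
    mean_e2 = sum(e2) / len(e2)
    ss_tot = sum((v - mean_e2) ** 2 for v in e2)
    ss_res = sum((v - a2 - b2 * u) ** 2 for u, v in zip(x, e2))
    return len(x) * (1 - ss_res / ss_tot)

x = [float(i) for i in range(1, 41)]
sign = [1.0 if i % 2 == 0 else -1.0 for i in range(1, 41)]
spreading = [u + 0.5 * s * u for u, s in zip(x, sign)]  # scatter grows with x
constant = [u + 0.5 * s for u, s in zip(x, sign)]       # scatter stays the same
print(breusch_pagan_lm(x, spreading) > 3.84, breusch_pagan_lm(x, constant) > 3.84)
```

A large statistic flags variance that changes with the predictor; a residuals-vs-fitted plot is still worth looking at alongside it.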

## Kieren Diment said

at 5:07 pm on Jul 23, 2009

2. The 1.0 correlation between co2_mean and FORCINGS is resolved:

```r
# Pearson correlation using only years where both series are present
cor(co2_mean, FORCINGS, use = "complete.obs")
## [1] 0.9983813
```

so co2 accounts for about 99.7% of the variance of all other temperature forcings (i.e. 0.9984^2). You'd have to return to the individual papers to see the methodology used to calculate the individual values for all other forcings.

Meanwhile co2 also accounts for 80% of the variance of anomaly (r=0.897).

I will look at logs in a new post later on.
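For reference, here's a Python sketch of what R's use = "complete.obs" does (drop any year where either value is missing before computing r), plus the r-squared arithmetic for the two r values quoted above. The function name is hypothetical.

```python
import math

def complete_case_r(x, y):
    """Pearson r using only positions where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

# Shared variance is r squared.
for r in (0.9983813, 0.897):
    print(f"r = {r}: r^2 = {r * r:.3f}")  # 0.997 and 0.805
```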

## Kieren Diment said

at 3:37 pm on Jul 24, 2009

There are some graphs on this page (http://www.abc.net.au/unleashed/stories/s2632735.htm) that show the relationship between ENSO and temperature. I think the problem with this data set is that SOI varies quite a lot month by month, and using year-by-year temperature anomaly means masks this variation.
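To illustrate the masking effect, here's a Python sketch using made-up monthly values rather than real SOI readings: a year that swings strongly positive and then strongly negative has an annual mean of zero even though its within-year standard deviation is 10. Reporting the spread alongside the mean (or working with monthly temperature data) is one way to keep that information. The function name is hypothetical.

```python
from statistics import mean, pstdev

def annual_summary(monthly):
    """Collapse (year, month, value) records to a per-year (mean, within-year std dev)."""
    by_year = {}
    for year, _month, value in monthly:
        by_year.setdefault(year, []).append(value)
    return {year: (mean(values), pstdev(values)) for year, values in by_year.items()}

# Hypothetical year: +10 for six months, then -10 for six months.
swing = [(2000, month, 10.0 if month <= 6 else -10.0) for month in range(1, 13)]
print(annual_summary(swing))
```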

## Stubborn Mule said

at 5:21 pm on Aug 10, 2009

I don't yet understand any of the science behind this discussion (I've just been googling "forcing", for example), but there is a comment that springs to mind about the high correlation you are seeing between co2_mean and FORCINGS. Correlation is often not a particularly useful measure when you are dealing with non-stationary time series. If you have two time series which both exhibit a positive drift, they will have a high correlation. A glance at the panels against year for each of them highlights the non-stationarity of both series, and this is readily confirmed by looking at their autocorrelation functions (acf) in R. To suggest something better, I would need to continue to improve my understanding of what all these variables are. As an example, though, when looking at price time series in finance (which are non-stationary) you would look at correlations of price changes, not the original series. The correlation between yearly changes in FORCINGS and co2_mean is only 46%, but that may not be the appropriate thing to consider.
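The point about non-stationary series is easy to demonstrate by simulation. In this Python sketch (simulated data and hypothetical function names, not the climate series), two independent random walks with the same upward drift come out strongly correlated in levels, while their year-to-year changes are close to uncorrelated.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def diff(series):
    """First differences: change from each step to the next."""
    return [b - a for a, b in zip(series, series[1:])]

def random_walk(n, drift, rng):
    """Cumulative sum of drift plus standard normal noise."""
    level, out = 0.0, []
    for _ in range(n):
        level += drift + rng.gauss(0.0, 1.0)
        out.append(level)
    return out

rng = random.Random(42)
x = random_walk(150, 1.0, rng)
y = random_walk(150, 1.0, rng)  # generated independently of x
print(round(pearson(x, y), 2), round(pearson(diff(x), diff(y)), 2))
```

In R, the equivalent comparison is cor(diff(x), diff(y)).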

## Kieren Diment said

at 5:35 pm on Aug 10, 2009

Aye, so we predict anomaly from co2 and solar in the end. I'm quite confused by the solar irradiance / solar forcing link myself, and I suspect using it de-emphasises the role of co2, as it's clear that something like the capacity of the atmosphere to store heat from the sun is dependent on the concentration of greenhouse gases in the atmosphere.
