Global Warming Vs Clojure!

2010-01-27 20:00:39

Nobody who's connected to the rest of the world, whether via TV or the Internet, is unaware of Global Warming: the phenomenon which threatens to destroy us all if we don't collectively assume responsibility for the globe. Here's my contribution to a solution in 98 lines of heavy computational Clojure!



Preface

As a Danish citizen I feel it's mandatory to engage in this debate. Denmark recently hosted a major conference to facilitate solutions to the climate threats of the 21st century. The result, as you all probably know, was unfortunately a huge failure, so tons of CO2 were needlessly emitted by flying in all the foreign officials, running security motorcades and so on.

I plan on getting a better result. I've learned that the National Oceanic and Atmospheric Administration (NOAA) has published about 3 Gigabytes of tarballed weather data, going back as far as 1929. My mission is now to organize and parse that data, to see exactly what our recent boom in CO2 emissions is doing to the environment. From the total effect I'll be able to approximate Clojure's contribution to Global Warming.


Why should I read this post?

Well, if you should read it, it's because one or more of the following applies:


Data

First we have to get the data from NOAA. For the sake of all of you who want to repeat these calculations to show the neighborhood kids that they shouldn't spend their spare time lighting dumpsters on fire and generally wasting energy, I'll go through every step:

#1: Preparing URLS

Every dataset is found on their ftp in a subdirectory named according to the year the data is from and in that directory there's a tar-ball called gsod_year.tar. I'm not a wget expert so we will tag-team with Clojure:

(spit "urls"
   (apply str
        (map #(format "ftp://ftp.ncdc.noaa.gov/pub/data/gsod/%d/gsod_%d.tar\n" % %)
              (range 1929 2010))))

--- OR ---

(->> (range 1929 2010)
	   (map #(format "ftp://ftp.ncdc.noaa.gov/pub/data/gsod/%d/gsod_%d.tar\n" % %))
	   (apply str)
	   (spit "urls"))

(spit is from clojure.contrib.duck-streams)

I show both variants because you'll want to get comfortable with both -> (thread as the second item) and ->> (thread as the last item); they're here to stay and are quite handy. Either of those snippets produces a file called 'urls' which contains links to all the tar-balls available (except 2010, which of course isn't complete yet), totalling about 3 Gigabytes.
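To make the difference between the two threading macros concrete, here is a small sketch with plain arithmetic. The names thread-first and thread-last are only illustrative and are not part of the weather code:

```clojure
;; -> inserts the running value as the second item (first argument) of each form
(def thread-first
  (-> 10
      (- 3)      ; becomes (- 10 3)
      (/ 2)))    ; becomes (/ 7 2)

;; ->> inserts the running value as the last item of each form
(def thread-last
  (->> 10
       (- 3)     ; becomes (- 3 10)
       (/ 2)))   ; becomes (/ 2 -7)
```

The same starting value and the same forms give 7/2 in the first case and -2/7 in the second, which is why argument position matters when you pick a macro.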

To download all the data issue this command from the terminal

 $ mkdir dataset && cd dataset
 $ wget -i ../urls
--2010-01-20 17:18:53--  ftp://ftp.ncdc.noaa.gov/pub/data/gsod/1929/gsod_1929.tar
           => `gsod_1929.tar'
Resolving ftp.ncdc.noaa.gov... 205.167.25.101
Connecting to ftp.ncdc.noaa.gov|205.167.25.101|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD /pub/data/gsod/1929 ... done.
==> SIZE gsod_1929.tar ... 71680
==> PASV ... done.    ==> RETR gsod_1929.tar ... done.
Length: 71680 (70K)

100%[=====================================>] 71.680      94,5K/s   in 0,7s

The first file is just 70K as you can see, but the last is 90M - they grow as the number of weather stations increases. Now you've got all of your tars sitting in the same directory, so if you wanted to extract the data, do this:

 $ for i in *.tar; do echo $i && tar -xf $i && rm $i; done
(..wait about 20 minutes..)

That would give you about 450.000 gzips weighing in at a heavy 10 Gigabytes. If you unroll these files they'll expand into about 50 Gigabytes of raw text! Expanding that is too ambitious on my little laptop, so let's think this through. When we've processed all the weather data we need it sorted by year (and perhaps month). To do a running sort of the data we have to carry the head, which is close to impossible in this case - even 'ls' struggles with the 450k files, so Java's heap will too. The best possible way to attack this problem would be to write a tar-ball reader, which lets us peek at the GZips, and then a GZip reader which lets us work with the text without any pre-unpacking. Java's IO is centered around Input/Output streams, so if we can expose the data via streams it will be presorted and I don't have to overload my hard drive.

Processing

We're looking down the barrel of several Gigabytes of raw text data, so we need to set up some kind of headless processing. When I speak about holding/losing the head, I'm referring to how we handle memory. If you keep the head of the sequence while processing, that entire sequence is accumulated in memory. Not holding the head means keeping only one item/chunk in memory at any given moment.
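A minimal sketch of the difference, using made-up helper names (sum-of-squares and mean-of-squares are not part of the weather code):

```clojure
;; Not holding the head: nothing refers to the start of the sequence,
;; so items already consumed by reduce can be garbage collected.
(defn sum-of-squares [n]
  (reduce + (map #(* % %) (range n))))

;; Holding the head: xs is still needed for count after reduce has walked
;; the sequence, so every realized item has to stay in memory.
(defn mean-of-squares [n]
  (let [xs (map #(* % %) (range n))]
    (/ (reduce + xs) (count xs))))
```

For small inputs both behave the same; the difference only shows up as memory pressure once the sequence is large.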

First we have to narrow the field as much as possible, so let's look at how the professionals do it:

[Figure: Hockey Stick - (C) Al Gore]

As we can clearly see, the temperatures in the Northern Hemisphere explode from the start to the middle of the 1900s. So we'll follow Mr. Gore's lead and pick out the stations which record temperature in the Northern Hemisphere (NH) - hopefully by the end of this blogpost we can reproduce the Hockey Stick using Clojure.

To do that, download the history file from NOAA's website. It contains the IDs of all weather stations as well as their position by latitude and longitude. The NH is defined by the latitude coordinates that are positive (the first signed column in the lines below). The file starts with 20 lines of information and then many lines like these:

010014 99999 SOERSTOKKEN                   NO NO    ENSO  +59783 +005350 +00490
010015 99999 BRINGELAND                    NO NO    ENBL  +61383 +005867 +03270
010016 99999 RORVIK/RYUM                   NO NO          +64850 +011233 +00140
010017 99999 FRIGG                         NO NO    ENFR  +59933 +002417 +00480
010020 99999 VERLEGENHUKEN                 NO SV          +80050 +016250 +00080
010030 99999 HORNSUND                      NO SV          +77000 +015500 +00120
010040 99999 NY-ALESUND II                 NO SV    ENAS  +78917 +011933 +00080
010050 99999 ISFJORD RADIO                 NO NO    ENIS  +78067 +013633 +00050

The predefined structure of the file is based on indices, meaning I know that the WBAN ID always starts at IDX 8 and ends on 12 (7-11 when zero-based). We cannot split this on words (\w+) because some elements are occasionally left out, but the index structure remains. That means that we can filter out those stations that are in the Northern Hemisphere like so:
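As a sketch of index-based parsing, the snippet below rebuilds one of the sample lines with explicit column widths and then reads fields by position. The widths in the format string are inferred from the sample lines and the indices quoted above, not taken from NOAA's documentation, and parse-station is a hypothetical helper:

```clojure
;; Rebuild one history line from its fields so the column widths are explicit.
;; The empty string stands for a two-character field that is blank here.
(def sample-line
  (format "%-6s %-5s %-30s%-2s %-2s %-2s %-4s  %s %s %s"
          "010014" "99999" "SOERSTOKKEN" "NO" "NO" "" "ENSO"
          "+59783" "+005350" "+00490"))

(defn parse-station [line]
  {:stn    (subs line 0 6)         ; zero-based 0-5
   :wban   (subs line 7 12)        ; zero-based 7-11
   :north? (= \+ (nth line 58))})  ; sign of the first signed coordinate
```

Because the fields are read by position, a blank column in the middle of the line does not shift anything, which is exactly what splitting on whitespace would get wrong.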

(defn northern-stations [filename]
  (let [data   (->> (line-seq (reader filename)) (drop 20))
        north  (for [station data :when (= \+ (nth station 58))]
                 (vec (take 2 (.split station " "))))]
    (reduce #(assoc %1  :stn  (conj (:stn %1) (%2 0))
                        :wban (conj (:wban %1) (%2 1)))
            {} north)))

If you're a regular here that should be very straightforward, but in case you're not:

  1. Start a line-reader, skip 20 lines
  2. Walk the lines and :when the character at index 58 is "+" take out the first 2 elements, STN and WBAN, and bundle them in a vector
  3. Reduce that series of vectors into a hashmap a la {:stn [id1 id2 id3] :wban [id1 id2 id3]}
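Step 3 can be tried in isolation. This is a sketch on three hand-written pairs; it uses fnil with vectors, so unlike the conj-onto-nil version above it keeps the IDs in input order:

```clojure
;; [STN WBAN] pairs as produced by step 2 (values copied from the sample lines)
(def pairs [["010014" "99999"] ["010015" "99999"] ["010016" "99999"]])

(def grouped
  (reduce (fn [acc [stn wban]]
            (-> acc
                (update-in [:stn]  (fnil conj []) stn)
                (update-in [:wban] (fnil conj []) wban)))
          {}
          pairs))
```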

The reason this doesn't break is that neither STN nor WBAN is left out anywhere in the data: NOAA consistently fills a blank field with nines (99999 for WBAN). With all the valid stations extracted we can now set up a data-parser which only extracts data from these stations.

Gunzip

We can start by unpacking the 1929 tarball manually to inspect the data. java.util.zip has a few classes for working with GZip archives, so let's try them out and see if we can't integrate them nicely.

My personal preference would be to write some kind of wrapper, which would enable me to work on the data like so

(with-zipstream [stream "/path/to/gzip.gz"]
   (...work on data...))

Well, as you know it's easy to conform a Lisp to your liking, so here's my personal Gunship:

(defmacro with-zipstream [bindings & body]
  `(with-open [~(bindings 0) (->> (FileInputStream. ~(bindings 1))
                                  GZIPInputStream. InputStreamReader.
                                  BufferedReader.)]
     (do ~@body)))

But although it's tempting to abstract away using macros, this constitutes macro-abuse, since this is equally helpful:

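(defn dump-stream [stream sz]
  (let [buffer    (make-array Byte/TYPE sz)]
    ;; a single .read may return fewer than sz bytes, so keep reading
    ;; until the buffer is full or the stream reports end of data
    (loop [offset 0]
      (when (< offset sz)
        (let [n (.read stream buffer offset (- sz offset))]
          (when (pos? n)
            (recur (+ offset n))))))
    (ByteArrayInputStream. buffer)))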

(defn line-stream
  [tarstream tarentry]
  (with-open [zipfile (->> (dump-stream tarstream (.getSize tarentry))
                           GZIPInputStream. InputStreamReader. BufferedReader.)]
    (doall (for [line (repeatedly #(.readLine zipfile)) :while line] line))))

This does very much the same as core's line-seq, except that it returns a non-lazy (doall) sequence of all the lines in the GZip.
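If you want to try the stream chain without a tarball at hand, here is a self-contained sketch that gzips a string in memory and reads it back line by line. gzip-bytes and gunzip-lines are hypothetical helpers written for this illustration:

```clojure
(import '(java.io ByteArrayInputStream ByteArrayOutputStream
                  BufferedReader InputStreamReader)
        '(java.util.zip GZIPInputStream GZIPOutputStream))

;; Compress a string into a gzip byte array, entirely in memory.
(defn gzip-bytes [^String text]
  (let [out (ByteArrayOutputStream.)]
    (with-open [gz (GZIPOutputStream. out)]
      (.write gz (.getBytes text "UTF-8")))
    (.toByteArray out)))

;; Same wrapping order as in line-stream: bytes -> gunzip -> chars -> lines.
(defn gunzip-lines [^bytes data]
  (with-open [rdr (->> (ByteArrayInputStream. data)
                       GZIPInputStream. InputStreamReader. BufferedReader.)]
    ;; doall forces the lines before with-open closes the reader
    (doall (line-seq rdr))))
```

The doall matters for the same reason as in line-stream: a lazy sequence that escapes with-open would try to read from a closed stream.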

Now you have access to the weather data which looks like so:

STN--- WBAN   YEARMODA    TEMP       DEWP      SLP        STP       VISIB
030050 99999  19291001    45.3  4    40.0  4  1001.6  4  9999.9  0   17.1
030050 99999  19291002    49.5  4    45.2  4   977.6  4  9999.9  0    9.3
030050 99999  19291003    49.0  4    41.7  4   975.7  4  9999.9  0   10.9
030050 99999  19291004    45.7  4    38.5  4   992.0  4  9999.9  0    6.2
030050 99999  19291005    46.5  4    41.5  4   997.8  4  9999.9  0    7.8
030050 99999  19291006    49.5  4    46.5  4   990.1  4  9999.9  0    7.8
030050 99999  19291007    48.2  4    44.8  4   979.1  4  9999.9  0    9.3
030050 99999  19291008    46.5  4    39.2  4   994.3  4  9999.9  0   12.4
030050 99999  19291009    44.7  4    40.0  4  1005.4  4  9999.9  0   10.9
030050 99999  19291010    48.7  4    47.0  4  1000.6  4  9999.9  0    8.4
030050 99999  19291011    48.7  4    39.2  4   995.5  4  9999.9  0   12.4

With direct access to the text compressed away in the GZips it's easier to implement a tarball reader, because we can see the result of our experiments immediately - just printing the binary values from each entry (file) in the tarball wouldn't tell me much about my success or failure in uncompressing the data.

Tar

To work with tarballs I've downloaded this jar file. You can also visit the Javadocs: here. First things first: Let's see if we can peek at the data, starting with the smallest set, 1929.

As always when we're in Java-land we have to think carefully about how we want to abstract methods and workflows. The class TarInputStream lets me view a Tarball as a series of TarEntries (compressed files).

With the ability to walk a zipstream as pure text already in place, we can now wrap the Tarball in a process-tarball function. It's handy to go through the data one tarball at a time, both for sorting as I mentioned before but also for getting both cores busy with the data. To give you an idea of what I'm thinking, here's the top-level processing:

(defn process-tarball
  [filename stn-ids wban-ids headers]
  (println "Parsing: " (.getName filename))
  (flush)
  ;; with-open closes the tar stream once all readings have been realized
  (with-open [tarstream (->> filename FileInputStream. TarInputStream.)]
    (let [readings (doall (extract-readings tarstream stn-ids wban-ids))
          cnt      (count readings)]
      {:year (re-find #"\d{4}" (.getName filename))
       ;; nil when no station matched, so we never divide by zero
       :mean (when-not (zero? cnt)
               (/ (reduce + readings) cnt))})))

If you imagine that extract-readings just walks through all readings and returns a sequence of the temperatures it picks up, then this will collect all readings from a tarball and return a hash-map containing the year parsed and the mean temperature for that year. Quite neat for about 20 lines of Clojure.

With the ability to peek through the tar into the GZip and through the compression into the text, we can start writing extract-readings. The easiest thing to do would be to have a for-loop run through every line of the GZips and merge those lines into 1 huge sequence, giving us a line-seq of all the GZips. The problem, however, is that Java spends 2 bytes per char in a String. If we apply the math to the set from 2002, it goes like this:

77 Mb Tarball => 10.000 Gzips weighing 360 Mb => Strings worth at least 3.6 Gb
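The last step of that estimate can be spelled out. This is only a back-of-the-envelope sketch; the expansion ratio of 5 is my assumption, taken from the earlier figure of 10 Gigabytes of gzips unrolling into about 50 Gigabytes of text:

```clojure
(def gzipped-mb      360)  ; size of the 2002 gzips, from the line above
(def expansion-ratio 5)    ; assumption: 10 GB of gzips -> about 50 GB of text
(def bytes-per-char  2)    ; a Java char is a 2-byte UTF-16 unit

;; 360 * 5 * 2 = 3600 MB, i.e. roughly 3.6 GB of Strings
(def string-mb (* gzipped-mb expansion-ratio bytes-per-char))
```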

That means that on a weak system like mine, I'm down for the count once we reach 1950. So that deprives me of the luxury of picking columns out at the highest level of the parser, meaning my extract-readings function needs to be more specific than just pulling out 1 line at a time:

(defn extract-readings
  [tarstream stn-ids wban-ids]
  (->> (Double. (nth (cols data) 3))
       (for [data (rest (line-stream tarstream file))])
       (for [file (repeatedly #(.getNextEntry tarstream))
             :while file
             :when (let [[_ stn wban] (re-find #"(\d+)-(\d+)" (.getName file))]
                     (and (not (.isDirectory file))
                          (or (stn-ids stn) (wban-ids wban))))])
       flatten))

If you're new to ->> that's probably not easy to read. First I know that I want to work on every line of each file, so I pull out the 4th column from calling 'cols' on that line. Cols just splits on whitespace. And then I cast that to a Double. That expression is then fed to the first for loop, which runs through all of the lines in the file, where file is the result of the final for loop. The final for loop runs through all the entries in the tarball, picking out those entries which are not directories and are in the valid stations list, i.e. stn-ids & wban-ids respectively.
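Since ->> only rearranges forms before they are compiled, the threaded version is the same thing as two nested for loops written inside-out. A toy sketch with numbers instead of tarball entries (threaded and nested are illustrative names):

```clojure
;; Written inside-out with ->>, in the same style as extract-readings
(def threaded
  (->> (* 10 x)
       (for [x row])
       (for [row [[1 2] [3 4]]])
       flatten))

;; What the macro expands to: the outer loop wraps the inner loop
(def nested
  (flatten
   (for [row [[1 2] [3 4]]]
     (for [x row]
       (* 10 x)))))
```

Both produce the sequence (10 20 30 40); which form you prefer is mostly a matter of whether you like reading from the innermost expression outwards.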

Currently we're running about 30 lines of Clojure and we've already got our main data extraction function set up. On the JVM we have several options when we want to process data concurrently: Agents that carry state in asynchronous processes, futures that are just threads, promises and more. For this job there were two ways which seemed appealing: Either LinkedBlockingQueue or a Parallelized map. I opted for #2. Pmap walks the data using a 'sliding window' approach, meaning if a thread is falling too far behind pmap will wait - This is good because of the heavy memory load inherent to this challenge. With LinkedBlockingQueue you have to decide on a number of workers, but with pmap you just have to think in chunks, i.e. how big do I want them? For this job it's a no-brainer: A chunk = A tarball.
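A minimal sketch of the "one chunk per call" idea, with small vectors standing in for tarballs. chunk-mean and the sample readings are made up for the illustration:

```clojure
;; Stand-in for process-tarball: reduce one chunk to a single number.
(defn chunk-mean [readings]
  (when (seq readings)
    (/ (reduce + readings) (count readings))))

;; Three "tarballs"; the last one has no matching stations.
(def chunks [[45.3 49.5 49.0] [45.7 46.5] []])

;; pmap is lazy, so doall forces every chunk to be processed now.
;; Results come back in the same order as the input chunks.
(def means (doall (pmap chunk-mean chunks)))
```

Note that pmap runs on Clojure's future thread pool, so a standalone script usually wants to call (shutdown-agents) when it is done, otherwise the JVM may linger for a while before exiting.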


Ready to Launch

Now with the above functions implemented you have very free hands to decide how you want to attack the data, show/save the output and so on. Here's one way to go:

(defn process-weather-data
  [dataset history-file]
  (let [stations   (northern-stations history-file)
        ;; STN is six digits wide and WBAN five, so their blank markers differ
        stn-ids    (disj (set (:stn stations))  "999999")
        wban-ids   (disj (set (:wban stations)) "99999")
        dataset    (->> (File. dataset) file-seq (filter #(.isFile %)) sort)
        headers    [:stn :wban :yearmoda :temp]
        result     (->> dataset
                        (pmap #(process-tarball % stn-ids wban-ids headers))
                        doall)]
    ;; pr-str writes the result in a form that read-string can load again
    (spit "result" (pr-str (sort-by :year result)))))

(my runtime: 1 hour 20 minutes)

First we pull out the stations in the Northern Hemisphere and break that down into 2 sets (for fast comparisons). Then from each of those I disjoin the all-nines ID ("999999" for STN, "99999" for WBAN), which per convention is an empty field (see the README on NOAA). Then I take the path 'dataset' and mangle it into a sequence of the files, sorted in ascending order. The sort is nice because 1) It allows me to track progress, 2) The 2 largest data-sets will only have 2 threads running instead of 4 for the majority of the time. Then I manually define some headers; it wouldn't be hard to rip them from a tarball, but there's no need. Finally I start off the process by calling pmap, which launches 4 threads.

To avoid my system buckling (and spending excessive time) I booted a lean Arch installation, which claims less than 100Mb of RAM for itself. After 5 minutes this was the scenario:

[Figure: 5 minutes in]

Couldn't be better, both cores are boiling and the memory isn't headed toward a heap explosion. After 45 minutes, this was the situation:

[Figure: 45 minutes in]


The memory consumption has stabilized at 79.2% and both cores are still going full speed - excellent!


Results - Round 1

With the data 'spit' directly into a result file, it's easy to read it back and mangle it any way we want. For instance:

(doseq [{:keys [year mean]} (read-string (slurp "result"))]
             (println (format "%s\t%s" year (if mean
                                                (str (/ (* (- mean 32) 5) 9) )
                                                "null"))))

That will give you 2 columns, the year and the mean reading converted to Celsius, which you can copy/paste into your favorite spreadsheet editor and observe the following:

[Figure: Temperature Graph #1]


And now perhaps you're wondering, where's the Hockey Stick? Well, one explanation could be that we're actually not doing a very good job at picking our input data, as we have simply parsed all available data. Because we see an explosion in the number of weather stations in the years 1995 - 2009, it's a fair assumption that if these are unevenly distributed then because of their great number they distort the graph quite a bit.


A closer look

So we need to be more picky about the stations we use to avoid distorted data. Let's try to extract all the stations used in 1929 (our first recorded year) and follow their readings throughout the following years - That will give us a clear indication of the variations in global temperatures without any weight distortion. To accomplish this, we can help ourselves by making a function which extracts all station IDs from a given Tarball:

(defn get-stations [filename]
  (let [tarstream    (->> filename FileInputStream. TarInputStream.)
        all-stations (for [file (repeatedly #(.getNextEntry tarstream))
                           :while file
                           :when (not (.isDirectory file))]
                       (let [[_ stn wban] (re-find #"(\d+)-(\d+)-" (.getName file))]
                         {:stn  stn :wban wban}))]
    ;; blank markers: 999999 for the six-digit STN, 99999 for the five-digit WBAN
    {:stn  (disj (set (map :stn all-stations)) "999999")
     :wban (disj (set (map :wban all-stations)) "99999")}))

climate> (get-stations "../dataset/gsod_1929.tar")
{:stn #{"037950" "033110" "038940" "034970"
"039800" "033960" "032620" "030750" "030910" "038040" "038560"
"038110" "990061" "037770" "036010" "039530" "038640" "031590" "033790"
"030050" "039730"},
:wban #{}}
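The ID extraction inside get-stations can be tried on a bare string. The entry name below is an assumption about how files inside the tarball are named (STN-WBAN-year), chosen only so that it fits the regex used above; station-ids is a hypothetical helper:

```clojure
;; Pull the two ID groups out of a tar entry name.
(defn station-ids [entry-name]
  (let [[_ stn wban] (re-find #"(\d+)-(\d+)-" entry-name)]
    {:stn stn :wban wban}))

;; Assumed entry name, not copied from a real tarball listing
(def ids (station-ids "./030050-99999-1929.op.gz"))
```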

So all we need to do in order to follow these stations, is filter out those in the northern hemisphere and re-run the job. For the sake of faster experiments while we wash our data, I'll accept the station-ids + outputfilename as arguments:

(defn process-weather-data
  [dataset history-file stations output]
  (let [dataset   (->> (File. dataset) file-seq (filter #(.isFile %)) sort)
        nstations (northern-stations history-file)
        stn-ids   (set (filter #((set (:stn  nstations)) %) (:stn  stations)))
        wban-ids  (set (filter #((set (:wban nstations)) %) (:wban stations)))
        headers   [:stn :wban :yearmoda :temp]
        result    (doall
                   (pmap #(process-tarball % stn-ids wban-ids headers) dataset))]
    (spit (str output ".raw") (with-out-str (prn result)))
    (println "Done")))

(let [tracked-stations  (get-stations "res/dataset/gsod_1929.tar")]
  (process-weather-data "res/dataset/" "res/history"
                        tracked-stations "stats-1929"))

(my runtime: 11 minutes)

Now we don't have as much data as before (although still a lot), but we know that it's not distorted by the addition of a huge number of stations in various locations. That gives us the following graph:

[Figure: Temperature Graph #2 - Tracking 14 stations]


Hmm... I'm really starting to wonder how that Hockey Stick was produced, because what we can clearly see from following about 14 stations is a gradual decrease in Global Temperature. But let's give Mr. Gore the benefit of the doubt and expand our scope to include all stations used from 1929 - 1940 and then follow those throughout the years. That will give us a ton of data and keep us somewhat safe from weight distortion. I'll introduce a helper to compile all the stations from a given range:

(defn get-station-series [base start end]
  (apply merge-with into
         (for [i (range start (inc end))]
           (get-stations
            (str base (if (not= \/ (last base)) "/") "gsod_" i ".tar")))))

Call that with your directory containing the tars as a base and then 2 integers (1929 1940) and you'll get in return 2 sets containing all stations used in the period. Then all you need to do is start the job with those stations located in the NH filtered:
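The merging step in that helper is just merge-with into applied to one map of sets per year. A sketch with two hand-written years (the IDs are sample values, and "13910" is invented for the example):

```clojure
(def y1929 {:stn #{"030050" "037950"} :wban #{}})
(def y1930 {:stn #{"030050" "038040"} :wban #{"13910"}})

;; For every key present in both maps, pour the second set into the first.
(def merged (apply merge-with into [y1929 y1930]))
```

Because the values are sets, a station that shows up in several years is only kept once.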

(let [tracked-stations  (get-station-series "res/dataset" 1929 1940)]
  (process-weather-data "res/dataset/" "res/history"
                        tracked-stations "stats-1929-1940"))

(my runtime: 10 minutes)

Let that run for a while and you'll get a lot of data, resulting in the following graph:

[Figure: Temperature Graph #3 - Tracking 450 stations]


It seems that when we leave out the great number of weather stations that were introduced in the last 50 years or so, the tendency is absolutely not a rise in temperature.


Final crack at the Hockey Stick

Ok - If at first you don't succeed, try harder. The reason the data is distorted is that the stations aren't being weighted correctly. Some areas have a higher density of stations, some stations report more frequently than others - Long story short: We need to get a visual impression of each station's readings. By looking at each station individually instead of compressing them to an unevenly weighted average, we will be able to clearly deduce how the weather has changed through the years recorded.

To make this really easy, I'll introduce a helper which spits out files ready for OpenOffice Spreadsheet:

(defn emit-dataset [data]
  (let [uids (distinct (flatten (map #(map :uid %) (map :reads data))))]
    (with-out-str
      (doseq [{:keys [year reads]} data]
        (print year)
        (doseq [uid uids]
          (if-let [reading (first (filter #(= uid (:uid %)) reads))]
            (print "," (:mean reading))
            (print ",null")))
        (println "")))))

(spit output (emit-dataset result))

As you can see, this little helper just outputs a CSV file, but it does so calling out the :uid on each reading. To get that bit of info I changed the reader, but I won't go through it here. If you're interested you can read the source on Github. The last line will dump the CSV in a file, so you can place that at the bottom of your main func.
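To show the shape of data the helper expects and the rows it produces, here is a compact variant built on clojure.string. It is a sketch and not the code from the repository: emit-csv, the sample means and the "STN-WBAN" format of :uid are all assumptions made for the illustration:

```clojure
(require '[clojure.string :as str])

;; One CSV row per year, one column per station uid, "null" where a
;; station has no reading for that year.
(defn emit-csv [data]
  (let [uids (distinct (mapcat #(map :uid (:reads %)) data))]
    (str/join "\n"
      (for [{:keys [year reads]} data]
        (let [by-uid (into {} (map (juxt :uid :mean) reads))]
          (str/join "," (cons year (map #(get by-uid % "null") uids))))))))

(def sample
  [{:year "1929" :reads [{:uid "030050-99999" :mean 48.1}]}
   {:year "1930" :reads [{:uid "030050-99999" :mean 47.5}
                         {:uid "037950-99999" :mean 50.2}]}])
```

Calling (emit-csv sample) gives two rows, "1929,48.1,null" and "1930,47.5,50.2", so every station keeps its own column even in years where it is silent.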

The 1929 stations:

[Figure: 14 stations from 1929 seen individually]

Despite the fact that some years are without readings, it's clear that the general tendency is in direct contradiction with what Mr. Gore has shown. His graph shows a massive increase in warmth throughout the 1900s, meaning that the entire graph above should be rising quite drastically. The inconvenient truth seems to be, however, that there is no significant rise in temperature.

[Figure: 100+ stations from 1929-33 seen individually]


It's hard to get a good grasp of the data when visualized like this, but reviewing over 100 stations we see that a few areas are seeing a rise in temperature while most aren't. I did a small hack 'n' slash sparkline viewer to try and get a feel for the larger sets coming up on 500 stations and they seem to show the same tendency.


Hockey Game is Over

I've worked the official numbers from NOAA and now you're all able to re-run the computations at home. It's clear from the official data that the globe isn't wildly heating up, in fact in some places it is now cooling down. I'm always a bit skeptical when politicians try to solve the world's problems using taxes, and indeed the proposed Carbon Tax is a dangerous idea. Whenever a human being exhales they emit CO2, so the Carbon Tax is in effect a tax on life. Activists, Greenpeace, Lobbyists etc. are willing to die, but for what, humanity? No, for knowing neither Math nor Clojure.

Code here: Github

Mark Watson
2010-01-27 23:22:19
Thanks, that was a great article: both interesting and useful for picking up Clojure data manipulation techniques.
Jacques Mattheij
2010-01-27 23:36:03
Nice work, now you need to do it all over with *all* the data. 

If you stretch a graph horizontally enough and you only use the last little bit of it a hockeystick will always look like a flat line. 

The historical records, which you've omitted contain the base from which the hockey stick projects at the turn of the 19th century.

So, my guess is if you run it again without pruning the historical data you'll get the hockey stick back. The only way to find out is to try I guess, it would be nice to see a follow up to this post.
Mike
2010-01-28 01:25:15
Take a look at http://data.giss.nasa.gov/gistemp/
You can generate a map of temperature anomalies that shows that there are areas in the Northern hemisphere that have cooled relative to the baseline period. But globally, warming is occurring. Also, see http://www.columbia.edu/~jeh1/mailings/2010/20100115_Temperature2009.pdf for lots of interesting information on the GISS analysis.
alex not albert
2010-01-28 06:14:17
Quite interesting w/r/t data processing and clojure.

A quick point on the climate data:
You are graphing temperature data on an absolute axis, unlike the accepted hockey stick graph, which is in deg. C deviation from the 40 year mean.  In your graphs, a 0.5 deg C variation is almost invisible, while in the climate system, a 0.5 deg C or 1 deg C change is highly significant.
PaulM
2010-01-28 06:17:19
Nice work. 
re: Hockey stick has already been ripped apart by others, most of the charts produced to scare the public on GW are on the back of some serious data massaging...

google: climategate
Lau
2010-01-28 09:29:05
@Mark - Thanks :)

@Jacques: The fact that I have omitted the 1000 - 1929 range is not only because of the dataset I was using, but also because I gather from many online discussions that the data from that vast time period is almost useless, as no consistent way of determining the temperatures has been found. If the Hockey Stick was to forewarn disaster it would have shown some bumps relative to the great expansion of industry, happening mostly around 1920 and 1945+ - The graph shows no significant climate change in these periods. To me that means that there is no connection between CO2 emission and Weather change - But there is change going on all the time - Occasionally a year drops wildly in temperature compared to the year before, yet I don't count that as an impending ice-age.

@Alex, Mike: See my reply to Jacques.

@PaulM: Thanks :)
elswith
2010-01-28 09:40:30
Interesting article, but unfortunately you jump to what sounds like a preconceived conclusion. It would be great to see a followup if you take #3 and #4 seriously. I would love to see AGW debunked but at this point I would put my money on the most heavily peer reviewed and scrutinised scientific document in history than a first analysis of raw data.
elswith
2010-01-28 09:46:31
@Lau: it would be interesting to see the clojure code to plot the deviation from the 40 year mean as mentioned in #4
Lau
2010-01-28 09:47:39
@elswith: You're right in that this article constitutes a first analysis of raw data and certainly there's something to be gained from putting in more work. The main difference between me and the Climate Scientists is that they get paid to work the numbers, I did it for the fun of heavy computations. That has 2 natural consequences, 1) I don't gain anything, regardless of the result of the analysis. 2) My time to pursue this is somewhat limited.
elswith
2010-01-28 10:21:30
@Lau: Agreed. I just think you reached your conclusion a bit too hastily.
Cyrus Hall
2010-01-28 12:23:54
I would suggest anyone who is tempted to believe this completely non-scientific analysis to head over to http://www.realclimate.org/ and read how the hockey stick was actually created.  It was not from unbiased NOAA temperature readings, although that dataset was used.  It's a combination of datasets, most of them far from raw data.

Second, the form of weak denialism you seem to practice (as indicated by your last comments) is completely unbelievable.  I would guess you have no experience with science as practiced.  It is highly competitive, and if there is one thing scientists enjoy, it's proving another one wrong.  The concept that climate scientists are just in it for the money, and that there is some sort of global conspiracy where they all lie, is just not a possibility.
Lau
2010-01-28 13:05:30
@Cyrus: Believe what? The numbers? If this article does anything, it's to enable the readers to run their own analysis.

It's difficult to relate to specially crafted datasets that use approximations like tree-rings and ice-holes etc. What's not so difficult is looking at a number of weather stations and seeing them read a consistent harmonic weather situation. If you feel happier doing obscure mathematical tricks to produce a hockey stick, that's okay with me - But if the 'global warming' doesn't result in higher measurable temperatures as seen on thermometers - don't expect me to take your science seriously :)
Ryan
2010-01-28 14:45:23
Great job!

This is true, open science.
Kim Andersen
2010-01-28 16:14:14
I’ve been following the climate debate (I would say complete lack of it) though the years. I am currently looking at the released data from NOAA, and I can’t find the hockey stick in the data. I can’t find it at all. It does not surprise me, because I’ve seen how Michael Mann produced his graph. Not even with tree-rings or ice-cores does it fit up (see http://wattsupwiththat.com/2009/09/27/quote-of-the-week-20-ding-dong-the-stick-is-dead/). It was a fraud in the same way that IPCC recently confessed that their statement regarding the melting of Himalayan glaciers was an error (read fraud attempt). Yet, I still hear the lie in the Danish radio, that the glaciers are melting faster than ever before. It’s madness! The guys at IPCC are clinging to their generally faulty climate claims - just to get the next meal at their family’s table. 

I don’t know why people keep calling all open fact (as the plain data from NOAA) for unscientific. Gores movie has nine errors confirmed in the high court, and additional 27 that includes the hockey stick. But I’ve learned that it doesn’t take science or fact to satisfy or convince people. It takes faith. I believe that 31,486 scientists including 9,029 with PhDs outnumber IPCC’s journalists, gynecologists and other unscientific members (bloodsuckers) of the claimed 2,500 IPCC scientists.

Since 1915 we’ve had white christmas eight times in Denmark. 1915, 1923, 1938, 1956, 1969, 1981, 1995 and then again this year 2009. Do the math guys. If it really where warming I believe it would be visible in the interval too. But nothing there either. All nature calls to anyone who will listen. Together with all other data, including NOAA’s recordings, it tells that the climate is as has always been. How arrogant is it to think that we might human can destroy the climate using CO2? CO2 is and has always been a fundamental gas supporting life.

Thanks Lau for taking the time to actually show us the released data.
Kim Andersen
2010-01-28 16:16:53
Sorry, I forgot to insert this link to the 31,486 scientifically skeptic scientists: http://www.petitionproject.org/
Cyrus Hall
2010-01-28 19:19:20
@Lau - Statistics is not "obscure mathematics."  Well, it is, but not in the sense you mean.  I am not a climate scientist (I'm a computer scientist), so I neither claim to fully understand or to be able to reproduce the results of climate scientists.  But from my experience with data collection and analysis, it seems perfectly reasonable to me that post-collection cleaning and biasing is necessary.  Instrument readings are rarely used raw.

Your article, while quite interesting for the Clojure code, does not allow people to "run their own analysis." It does allow them to *fool* themselves into thinking they've done analysis. I do like the code however. :-)

@Kim - You or I could sign that petition as well as the next person. Neither of us has the expertise to do so. The largest group of signatories are people with a Bachelor's degree in ... well, we don't know what in, but probably not climate science. As a PhD, I could sign and it would look impressive, but I have no expertise in the subject, and my signature would be a fraud in the context they present the petition. I also find it interesting that you link to a page claiming that temperatures have been inflated by human heat emission on a post that claims to show, from those very temperatures, that there has been no increase.

@everyone - Ask yourself if you really think scientists are so dumb as to take the following course of action: 1) make fake data 2) publish fake data in journals 3) release the real data that disproves their fake data. Really. That seems to be the logical claim here. The accusation is that *an entire scientific field* is purposefully committing fraud. There is no evidence for that. None. Individual cases of fraud happen in science, but they do not bring well-developed scientific fields with them.
Tomcpp
2010-01-28 19:49:45
"@everyone - Ask yourself if really think scientists are so dumb as to take the following course of action: 1) make fake data 2) publish fake data in journals 3) release the real data that disproves their fake data."

First, 3) is done by different people than 1) or 2). But even if it wasn't. Have you ever been to any university department in the exact sciences? The answer to your question is simple - Yes they are. With the additional qualification that it's not at all stupid to do so.

If you understand how these departments work, how they arrive at their conclusions, this is beyond obvious. It also becomes obvious what these departments do and do not optimize. They do NOT optimize human knowledge, nor do they strive for ultimate correctness (decade-long forays into "probably incorrect" theories are not uncommon even for physics or even maths, never mind the softer sciences like climate science).

University professors have 2 functions, and they are paid mostly for the first :
1) educate youngsters (in whatever way they see fit)
2) produce research papers (or rather - have others produce research papers in their name)

Insofar as this overlaps with "increase human knowledge" they do so. When it conflicts ... well.

Research papers are not produced by university professors, who have safe, secure jobs ("tenure"). Those professors are much more akin to managers. Actual research is (sometimes exclusively, sometimes mostly) done by graduate students, or postgraduate students (read 5 papers by "the same" professor and this will be very obvious indeed).

These people do not have safe, secure jobs. They are, in fact, VERY dependent on the political game inside the department. In many cases the graduates, but especially the postgrads, are much more knowledgeable than the professors, no matter the department or subject. They, not professors, are the speakers at conferences. But the professors do decide one very important thing: the politics.

They are holding all the keys (to getting published), they hold all the money (govt. money, vast majority of business sponsorship money), and if that somehow does not suffice for pushing their "mark" (ie. ideology) on the research, they have formal authority (they can kick people out, sometimes even fine them).

It is, of course, mostly the professors that are in the camp of climate alarmism, while postgrads almost universally have moderated opinions (they know how dangerous it is to venture outside of their direct knowledge, as they get fired for that)

Really man, "scientists" are not this holy institution that never pushes political viewpoints. At least visit 1 university campus, and get to know what the "political frustrations" of the student body are, before making remarks like that.
Kim Andersen
2010-01-28 19:57:08
@Cyrus Hall: At least the people at the Petition Project can say they have skills inside the scientific world. That’s still a world of difference from what IPCC is made of. The Petition Project requires of the signatories: “to have formal training in the analysis of information in physical science” (http://www.petitionproject.org/instructions_for_signing_petition.php) 

If the page that I referred to is claiming human caused temperature inflating, it does not concern the fake data collection that Gore’s hockey stick relies on. That’s what I’ve been linking to.

You say that people “*fool* themselves into thinking they’ve done analysis”. How would you use the data?
Cyrus Hall
2010-01-28 22:12:02
@Tomcpp - I've been a university researcher going on eight years now, at several institutions. I'm well versed in the internal politics at the departmental level, and moderately knowledgeable about the politics of the publication process. I believe I was once well screwed by the politics myself. But while your criticisms hold some truth, they are a gross exaggeration of what actually happens. Science politics do lead to poor publications (and conferences), and there are those who game the system. But the majority of researchers try to get things right, and prioritize that.

As for professors knowing nothing: Wow. Sure, there are those that shouldn't have their positions, maybe 10% or so, but I find it very hard to believe you've ever been a graduate student if you truly believe that statement. And yes, many professors do produce papers, all by themselves. Some of the best. I'd be happy to give some examples in CS if you want.

Do you have any evidence to back up your claims about post-docs in climate science?

@Kim - My point exactly.  I would guess both you and I qualify, as we've most likely both had training in the physical sciences.  How else can one explain all those veterinarians who have signed?
Giovanni Luca Ciampaglia
2010-01-28 23:11:32
"The fact that I have omitted the 1000 — 1929 range is not only because of the dataset I was using, but also I gather from many online discussions that the data from the vast timeperiod is almost useless as no consistent way of determining the temperatures has been found."

That's not the point. You're stretching the points of your data, so what you plot as a mildly increasing trend can easily be an abrupt increase, if plotted on the proper range. That the rest of the range has data in it or not is not important. 

Nice code anyway.

p.s. how come your second graph bends over itself before x=1940?
Lau
2010-01-28 23:41:03
@Giovanni: Absolutely right regarding the axis distortion - The point wasn't to do apples -> apples as I simply don't believe that you can correctly estimate the temperatures for those years, but merely to show the trends since we started actively observing temperatures using thermometers. It's also interesting to see on the Hockey graph that the increase only starts after about 1929, i.e. when we started using thermometers - My assumption wouldn't be that it's because of global warming, but rather that the temperature approximations from the preceding 1000 years are wildly incorrect.

Regarding the bend, it's OpenOffice's way of showing that a year or a range is missing data.
Joseph Hirn
2010-01-28 23:44:30
Awesome article. I've learned so much Clojure by working through and trying to understand this line by line and I'm only halfway done. Expect 10 hits a day from me for at least a few more days now.  =)
Stu
2010-01-29 01:18:08
Still... cutting down on pollution and energy use is probably still a good idea - no need to acidify the oceans and kill everything in them?
The more efficient we get, the more we can do with the resources we're not wasting.
Kim Andersen
2010-01-29 08:32:17
@Cyrus Hall: My belief is that no one goes through the labor of creating a Petition Project with some invalid signatories, in hope of reaching the attention of the government. They have to present a strong case, while IPCC can say whatever they need to get the next meal. I admit that I don't know the signatories, or if there are any pet detectives. Anyway. It's widely known that IPCC is guilty of the exact same thing that you blame the Petition Project for - and IPCC only represents 8% of the number that embrace the Petition Project. So with your approach, the Petition Project is overshadowing all argumentation with 92%.

You forgot to give me your answer regarding how you would use the data from NOAA? If we can't add up the numbers, how do we then apply the data?
Dmitri
2010-01-29 10:06:39
On temperature graph #3, I see strong peaks in the 1940s and a drop in the 1970s. This looks like a city heat phenomenon. Where are the stations that you analyzed? I guess the 1940s peak is because of the war production peak, and the 1970s were much colder, with the drop caused by the oil crisis.
Lau
2010-01-29 10:09:01
@Dmitri: I think it's neither. The problem is that a multitude of stations are being added as the years progress. In the 1970s some scientists believed that an Ice Age was impending, so I imagine they set up many weather stations to get more data. In the article there's a link to the <em>History file</em>, which shows you where all the stations are located. Somebody write a mashup to show them all on Google Maps? :)
Cyrus Hall
2010-01-29 12:23:01
@Kim: I did answer.  I said I don't know what I would do with it, as I'm not versed in the problems with the data, nor the corrective measures that are taken.  Indeed, that's the whole point.  Blindly analyzing data without the knowledge to do so is a great way to fool oneself.  If I were to try and make a valuable contribution, the first thing I would do is figure out what the standard procedures are, recreate the existing data using them, and look for problems in the methods used.

@Lau: A very small number of scientists thought there was an impending ice age in the 70s.  http://www.youtube.com/watch?v=XB3S0fnOr0M
Kim Andersen
2010-01-29 15:24:33
@Cyrus Hall: I think you're trying to ignore your own intelligence. If you had a thermometer at home, and made a record of every day's temperature through the years since 1929, how would you recognize a warming or cooling trend? It's quite simple. Add the numbers together for each day or month every year, and calculate the mean temperature. It's exactly the same for each station. None of the stations measures global temperature, so they tell the local temperature.

There is one factor though that we need to think about: El Niño warming (which we have now, and probably until northern spring time, http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/enso_advisory/ensodisc.html) and La Niña cooling (1988-1989, 1995, 1999-2000, 2007-2008). Even though we have the warm El Niño this winter, the cold has struck many countries like in the good old days. It all follows what guys like our fellow countryman Henrik Svensmark have said. It’s getting colder.

Regarding your answer to Lau: A very small number of scientists today think there is an impending climate crisis - the claim still counts in the news. A large number say we don't need to worry, and they have to fight an often unscientific fight to be heard. Guess what. Greenland was green a thousand years ago - nobody was alarmed. Eric the Red and his companions planted 280 farms including livestock, and they tilled the ground. They found palm trees up there too. I can’t imagine that the “Greenland” was caused by industrial CO2 emissions. The weather is as it has always been. Don’t be arrogant and fool yourself into believing that you can change that.
Cyrus Hall
2010-01-29 16:22:24
@Kim: I'm not ignoring my intellect; I'm using it to save myself from making false conclusions from overly reductionist logic.

I don't know where you get your numbers on scientists' beliefs, but it's not public polling:

http://www.eurekalert.org/pub_releases/2009-01/uoia-ssa011609.php

http://www.sciencemag.org/cgi/content/full/306/5702/1686

There's much more where that came from.

Also, this winter has been cold because there was a huge high pressure system sitting over the Arctic, which had the effect of warming the polar region while pushing cold air down to more temperate climates.  It had nothing to do with El Nino at all.  The storm that hit California last week was El Nino in action.  Both are regional effects, not global determinants.

The fact that the Earth has temperature cycles is well accepted. The difference is that those past cycles were natural (and often regional, such as the Medieval Warm Period). Now we are dragging them upward. The Greenland argument is a canard offered by skeptics who haven't thought that difference through.
Kim Andersen
2010-01-29 19:09:31
@Cyrus Hall: First of all, you have no more reason to believe that these 3,146 earth scientists are more qualified than the guys from the Petition Project. Maybe at least 3,146 would show up out of the 31,486 signatories. Second - do you know what the survey said? Read this http://errortheory.blogspot.com/2009/01/warming-propagandists-ask-most.html. Next - please don’t guide me to IPCC reports. I did catch the magic word “peer-reviewed”. That’s no guarantee for the value of the reports. If that’s what you want, then take a look at this: http://www.populartechnology.net/2009/10/peer-reviewed-papers-supporting.html 

If you’re right regarding El Niño, and you probably are, then there’s a factor less to draw into the temperature calculation...

It’s easy to claim history false. I guess the palm trees at the North Pole are an optical illusion too? You are too biased, Cyrus, to acknowledge that the current weather is just as natural as the Medieval and Roman warming. We are using approximately 150 years of recorded data, and believe that we can reach consensus. On top of that, the data received from satellites, hot air balloons and surface measurements need correction, and no one is capable of knowing how to correct them.

You say we can’t even trust NOAA’s recordings, so in the end how can anyone claim the globe is warming, and further that it is human caused? My assertion is that if NOAA’s data did reveal global warming, you would embrace the data as your firstborn son.
Cyrus Hall
2010-01-30 14:33:31
I didn't claim history was false; indeed, I said exactly the opposite. I didn't say NOAA's readings were wrong; I said that raw data may not be a good guide. I do assume that 3,146 working scientists are in a better place to make judgments about global warming than 31,000 random people with some college level science training, yes. Your first link is a straw man, and also contradicts your earlier statements suggesting there is no warming (the link assumes that most skeptics believe there is, it's just too small to matter).

As for your "peer reviewed" papers, I'd suggest you go read up on many of the journals those papers are published in. But yes! There are a large number of articles in the scientific literature proposing different primary causes of warming. Most have not borne fruit, but there is a healthy discussion taking place. None, that I am aware of, discounts CO2 and CH4 as greenhouse gases. Also, note how many of the papers published in respectable journals, and listed on that page, accept that warming is happening. Most that posit a different primary reason for warming do not discount anthropogenic warming, they just think it's not the largest factor. Indeed, there is no consistent view on the skeptics' side, as no theory has withstood criticism yet.

I'm tired of responding to straw men, so I think I'm done here. Do you also question medical science whenever you need it to save your life? Why not? It has every financial incentive to lie to you. What about evolution? Is that a big lie, brought to you by scientists who want nothing more than to advance their career without consideration of ethics? Or heck, computer science, where I can tell you there is plenty of research bought and paid for by large corporations who care less about the truth than their profits? Why do you credit the scientific process in all of these, but not climate science? You seem to believe that truth must always be self-evident.
maZZoo
2010-01-30 17:10:13
Great article, it's nice to see open science online :)
Kim Andersen
2010-01-30 20:14:13
@Cyrus Hall: When a person draws the “straw man” vs “educated” card, I know I'm talking to one who got paralyzed by the faith in school books, and forgot to think for himself. If the "straw man" claims something that's not in the school books, then don’t worry but follow the learned procedure: flash him the “education” card. You get all but my respect with that one. If the world followed your prejudiced approach, and no one dared oppose “science”, we would all still believe that life comes from nothing, as when mites “emerge” from dead meat. That was science then - but not anymore. 

How many doctors fail to make the right diagnosis? You can’t trust your life to anyone - but you can ask and reason for yourself, and hope carries you the rest of the way. Evolution...Who can believe that? How many scientists have seen the truth in the eyes, and acknowledged that it all is built on pure fantasy and pseudo scientific fabrication? Assumptions are all that Evolution is based on. Just as is human caused global warming.

You can’t prove the case for either evolution or human caused global warming. And as for today, you can’t even prove global warming. You’re a believer Cyrus, and I admire your ability to believe unproven “science”. You just picked the wrong source to believe...

Thanks for the conversation.

@Lau: have you asked "science" permission to add up the NOAA records? Are you licensed through proper education to add up the numbers? Is Clojure proven at all to be trusted in adding numbers given in Celsius? Open science is heretical and falls short of coverage in the school books! How dare you?

http://www.foxnews.com/scitech/2010/01/28/save-rainforest-climate-change-scandal-chopped-facts/
Cyrus Hall
2010-01-31 13:14:16
A straw man argument is unrelated to school books.  A straw man argument is one in which the proponent makes a weak proposition in order to provide a false argument they can easily knock down.  Eg, that link you posted: The questionnaire did not correctly define the word "serious."  The writer then concludes this was a mistake, so therefore the entire thing is a fraud.  This starts the argument with a straw man, and then concludes with a standard logical fallacy.

I don't know what it means to "oppose" science.  I seem to remember that those who overturned the concept of spontaneous generation were scientists.  They did not "oppose" science, they actively participated in it.  Science is not about a title, it's about a set of actions. 

But thank you for clarifying that you reject various pieces of  modern science.  I think that nicely settles the debate.  You're right too, I am a believer, in that the slow and often flawed actions of man will slowly unveil the realities of the Universe.  I think history supports that belief pretty well.

Thanks for the conversation.
Ishaaq
2010-02-01 06:57:51
Firstly, let me congratulate you on a great post on data manipulation using Clojure, it made for interesting reading.

However, I think you jumped to a conclusion too quickly. Let me point you back to that "Hockey Stick" graph you had at the beginning, note the description on the Y axis: "Departures in temperature .... to 1990 average", note that the range is also small, between -2 and +1. In other words, that graph is more concerned with a derivative - i.e. what you are seeing is a difference from the average temperature for the year 1990, not the absolute values of the temperatures themselves.

Obviously graphing a derivative is a bit more involved than graphing the raw numbers so you may not be inclined on spending more time on this, however, if you do, I think you'll find a dramatic difference. The problem with the raw numbers is that there is a lot of noise and large variations in individual years that hide the trend shown by the hockey stick. In fact climate scientists are not worried about yearly differences, it is the long-term trend that matters and that long term trend difference is less than 5 degrees - a fine detail your graphs are too noisy to prove (or disprove for that matter).

I also think that comment on carbon dioxide at the end was a bit naive and a cheap shot, but I'll give you the benefit of the doubt here, you probably hadn't really thought it through. Hint: the worry is not about the absolute amount of CO2 in the air, but more about its rate of change - again a derivative.
Ishaaq
2010-02-01 07:16:00
PS - by the way, my previous comment was not really meant to say that you were wrong in your conclusion, I just think you can't get to the conclusion you got to using the arguments you used.

It may well be that your conclusion is right, though I doubt it. However, the beauty of Science is that it is free to change its mind in the light of new evidence, so to say I am a "climate believer" is a bit of a simplification. More precisely: I currently believe we have a problem, I am willing (frankly I would be overjoyed) to be proved wrong with hard evidence.

To change tack a bit though: what is the most common remedy for climate change (assuming it was true)? - Reduce global consumption levels. Now, let's assume for the moment that Climate Change is Bad Science. Is reducing global consumption still a worthwhile exercise? I believe yes, simply because reducing consumption also solves a number of other things: water scarcity, food scarcity, land scarcity, wildlife preservation etc. These are problems well worth solving and far less controversial (it's pretty obvious that these problems exist) - and they have the same solution as global warming.
Lau
2010-02-01 09:53:01
Hey Ishaaq,

Your point regarding the different nature of the graphs is valid of course, but the point I was trying to make was that if traditional science concludes that the last century was heating almost exponentially compared to the preceding millennium, then at the very least we should see some increase through that century and certainly not a decrease. I don't know enough about Ice Drilling or Tree Rings to run that data, but it wouldn't change anything as long as the temperature is in decline.

Secondly your point regarding lowering consumption is noble and indeed I agree that the typical consumer mindset is extremely unhealthy for all involved, yet that's not the goal of the Climate Advocates - They want to regulate behavior using taxes and quotas, both of which I oppose.
Allan Kiik
2010-02-03 17:32:52
Ishaaq's point is not valid, because a time derivative (rate of change) is not the same thing as an anomaly, which is what we obtain when we calculate some average and subtract it from the data. So, if there is some man-made catastrophic global warming signal in the data, we should see it in the raw data too. But there's none, especially if you look at rural sites with a long history.
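
To make the distinction concrete, here is a minimal Clojure sketch. The yearly means are invented purely for illustration (they are not NOAA values), and `anomalies` and `yearly-changes` are hypothetical helper names, not functions from Lau's post. The second helper assumes the years are consecutive.

```clojure
;; Invented yearly means in Celsius, for illustration only (not NOAA data).
(def yearly {1988 10.0, 1989 10.5, 1990 11.0, 1991 10.5})

(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn anomalies
  "Anomaly: each year's value minus the mean over the baseline years."
  [yearly baseline-years]
  (let [base (mean (map yearly baseline-years))]
    (into (sorted-map)
          (for [[year t] yearly] [year (- t base)]))))

(defn yearly-changes
  "Discrete stand-in for the time derivative: change from the previous year.
  Assumes the years in the map are consecutive."
  [yearly]
  (let [years (sort (keys yearly))]
    (into (sorted-map)
          (map (fn [a b] [b (- (yearly b) (yearly a))])
               years (rest years)))))

(anomalies yearly [1990])
;; => {1988 -1.0, 1989 -0.5, 1990 0.0, 1991 -0.5}

(yearly-changes yearly)
;; => {1989 0.5, 1990 0.5, 1991 -0.5}
```

The anomaly series is just the raw series shifted by a constant, so any trend in the raw numbers survives unchanged; the year-over-year change is a different series altogether.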
Allan Kiik
2010-02-03 17:40:47
Btw, the graph was not invented by Al Gore but by Michael Mann from Penn State University. Al Gore just used it in his horror movie and for some unknown reason called it "Dr. Thompson's thermometer". Lonnie Thompson is a real person but he has nothing to do with this graph.
David Stockwell
2010-02-04 12:14:05
Hi Lau, Great to see computational efficiency figuring highly in these analyses. Putting the result in perspective, there are a number of ways of getting to a global average when the data has as many problems as these: stops, starts, station shifts, etc. The one normally used in climate science has lots of adjustments, some justified and many others not. It's not clear that this method, yours of using long records, or any other is any better or more reliable.

But if global warming has any truth, it should be relatively robust to these different approaches.  People like you and others are starting to look at the data and finding that warming is not a robust feature of the surface temperature record.  

It's not that any one approach disproves global warming, but that among many approaches, the one that happens to be used by climate scientists shows a lot of warming, but many others don't.
Bart
2010-02-04 23:47:27
I have a rather simple question I have never heard mentioned in the climate discussion: 64 million years ago our climate changed due to a collision. No dinosaurs survived that, but we do know the climate changed. I am not sure but, was Earth's orbit changed at this time? If so, a history of global temp before and after would have little value. Now, given this:
the 2004 Boxing Day earthquake/tsunami ALSO slightly changed our orbit. It would seem to me, and now I am not a scientist but it would seem to me, that even a minute change in the orbit around our sun would have a PROFOUND effect on temperature. Should we not collect data and revisit this issue in say: 1000 years?
Dixon Craig
2010-02-05 16:18:26
Thank you for composing and posting this webpage.

I have often wondered what a simple yearly average of the NOAA public 'raw' data would show, but 15 gigs of text files was too much for my pencil and calculator!

I only know a little script and C++, but I could follow your code. You have shown an elegant solution written in very clear English. I congratulate you.

I believe you have proven the Null Hypothesis: "A simple Average of annual temperatures from NOAA from 1929 to 2010 does NOT show a warming trend".

As many have posted, this is not the same as ending the Hockey Game (though I enjoyed the humor of your analogy), but I think the debate of what your proof implies should be left to blogs that specialize in debating climate science statistical methods. 

I hope others restrict their comments to just the Null Hypothesis and/or their love of "heavy computational parallelized Clojure" and streaming tarballs.
Luc VC
2010-02-07 16:02:37
Maybe I am inexperienced in this. What you did looks like the most basic of all things. Take as much measurement input data as available and calculate an average. My shock is that this reveals a cooling. I would expect you would end up way too hot, because of all the talk about heat islands around cities and how climate scientists compensate for this. Are you aware of which compensations climate scientists apply?
Lau
2010-02-09 20:54:44
@Luc: No I didn't modify the readings in any way
Phil M.
2010-02-10 22:37:24
Excellent analysis Lau and love the coding, I'll try to be more efficient with my own code (hanging my head in shame).

I think you have exposed the truth quite nicely. You can create a hockey stick from this data but only by choosing a non-representative base line, reducing the scale and only looking at the anomaly + more than a few manual adjustments. Otherwise, if you look at the true data, over the period of your data, the world is cooling.

End of.
M. Robertson
2010-02-11 06:20:02
Excellent!

I applaud you for your open and transparent code used in your presentation as well as your selection of data and the way you presented it.

This is just what I have wanted to see, a presentation of raw data from a reliable source (NOAA) that has not been tampered with and presented in an accepted format. While I have been long removed from the mathematics/statistics in college, I feel that your endeavor was a noble one to clarify the meaning of some of the raw temperature data.

You have obviously rattled some cages, as it seems that some heavy hitters are trying to debunk your work. Must have them worried. Perhaps you could offer a link to Steve McIntyre w/ Climate Audit. Other viewers there would like to see this.

There is hope for change! Thanks again.
Geoff
2010-02-19 21:59:49
Always nice to see people engaging with the data.

I'd suggest the following improvements for your methodology:
 1. Make time-of-day corrections for the temperature record from individual stations, these have been changed over the decades and diurnal temperature variation is more than a little significant of course.  Similarly for the re-siting of stations, changes of measuring equipment etc.
 2. You need to account for the non-uniform coverage of the Earth's surface by the station data you have.  At least make a station's temperature represent all the surface area which is closer to it than any other; if you want to do it properly then you'll have to correlate stations that have incomplete coverage with the more complete record of other nearby stations in order to help fill in gaps you might find.  Look at geographies too in order to make a better interpolation of a station's data over the unsampled areas of the Earth.
 3. Let's have some t-tests for significance of the slopes of global averages.
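
For point 3, a minimal sketch of an ordinary least-squares slope and its t statistic, in Clojure to match the post. `slope-t` is a hypothetical helper and the five data points are invented for illustration. It assumes at least three points and independent residuals, which yearly temperatures (being autocorrelated) may well violate.

```clojure
(defn slope-t
  "Fit y = a + b*x by ordinary least squares and return the slope,
  the intercept, the t statistic for the slope, and the degrees of freedom.
  Assumes at least three points and independent residuals."
  [xs ys]
  (let [n   (count xs)
        mx  (/ (reduce + xs) (double n))
        my  (/ (reduce + ys) (double n))
        ;; sums of squares and cross-products around the means
        sxx (reduce + (map #(let [d (- % mx)] (* d d)) xs))
        sxy (reduce + (map #(* (- %1 mx) (- %2 my)) xs ys))
        b   (/ sxy sxx)
        a   (- my (* b mx))
        ;; residual sum of squares
        sse (reduce + (map #(let [r (- %2 (+ a (* b %1)))] (* r r)) xs ys))
        ;; standard error of the slope: sqrt(sse / (n - 2) / sxx)
        se  (Math/sqrt (/ sse (- n 2) sxx))]
    {:slope b :intercept a :t (/ b se) :df (- n 2)}))

;; Invented example: x is years since some start, y is a yearly mean in Celsius.
(slope-t [0 1 2 3 4] [1.0 2.0 4.0 5.0 8.0])
```

The returned `:t` would then be compared against a t table at `:df` degrees of freedom to judge whether the slope differs significantly from zero.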

If all this sounds tricky it's because it is.

Your conclusion "Its clear from the official data that the globe isn’t wildly heating up, in fact in some places it is now cooling down." thus is unsupported by your work at this time, sorry.
Fred Beloit
2010-02-20 17:32:51
Lau, congratulations on this enlightening work . The patience and reason you have shown to those who expressed objections is a fine example to the rest of us who discuss issues on blogs. Thank you for your efforts.
Luc VC
2010-02-20 18:03:21
Geoff, interesting remarks. But his claim is that he does not see a hockey stick in the last 80 years of temperature readings from the stations existing since 1929. The risk of the compensations is that opinion creeps in. Under which conditions would your improvements of the methodology result in a hockey stick?
Fred
2010-02-20 22:46:55
Luc VC,

The last 80 years of temperature readings are the blade of the hockey stick. The handle portion is the last 1000 or 2000 years of temperature readings, from proxy data, such as tree ring widths. Those data for the hockey stick handle show very little variation in mean temperature deviations, between about -0.1 C and -0.4 C with one excursion to -0.6 C, by the picture Lau posts above.

In contrast, during the last 100 years, temperature deviation increased from -0.4 to +1.6 C.  This is the hockey stick blade.  Lau's first crack at the data shows mean actual temperature increasing from about 9.5 C to 12.0 C between 1920 and 2020.  This is actually a larger increase than in the hockey stick figure.

Thus Lau's analysis, crude though it may be using the raw data  unfiltered and ungridded, supports the hockey stick.
Geoff
2010-02-20 23:27:06
@Luc VC, #49:

Well, firstly the 'hockey stick' graph only evokes that image because it covers 1,000 years of history, not the 150 years we've had instrumental records over significant parts of the Earth's surface for.  Chop off the first 929 years and you won't be looking at a 'hockey stick' shape any more.  So in terms of making corrections to instrument records, no there most likely won't be a hockey stick in that data (unless you have a current warming trend and tack on the paleoclimatology record afterwards).

Examples that would potentially mask a warming trend include a trend among stations in taking daily measurements at e.g. local noon instead of 3pm (which will result in lower mean readings - a period of measuring at both times will be able to establish what the correction should be at a given site); or if the stations that do show a warming trend are representative of more of the Earth's surface than those that show a cooling trend in their histories; or if they're relocated further away from heat islands to places that might be naturally cooler anyway (due to geographical features etc).
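
The time-of-day case can be sketched as follows, assuming a station recorded at both local noon and 3pm during an overlap period. The readings are invented and `observation-offset` and `correct-noon` are hypothetical names; a real correction would also have to deal with seasonality, which this sketch ignores.

```clojure
;; Invented paired readings (Celsius) from an overlap period when the
;; station measured at both times: [noon-reading 3pm-reading].
(def overlap [[14.0 15.0] [16.5 18.0] [12.0 13.0] [18.5 20.0]])

(defn observation-offset
  "Mean difference (3pm minus noon) over the overlap period."
  [pairs]
  (/ (reduce + (map (fn [[noon pm]] (- pm noon)) pairs))
     (double (count pairs))))

(defn correct-noon
  "Shift noon-only readings onto the 3pm scale using the estimated offset."
  [offset noon-readings]
  (mapv #(+ % offset) noon-readings))

(def offset (observation-offset overlap))
;; offset is 1.25

(correct-noon offset [13.0 17.0])
;; => [14.25 18.25]
```

Without such a shift, a station that moved from 3pm to noon readings would show an artificial drop of about the offset at the changeover.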

You also have to reconcile any trend in the last few decades from surface station data with satellite measurements (available for the last 30 years or so of the 81 since 1929) before reaching any solid conclusions about the true state of affairs.