Using web services

Web services are internet-enabled pieces of code that serve requested data over the well-established HTTP protocol, the same protocol used by web browsers. When you access a website through a browser, a network connection is made and data is transferred using HTTP, which supports basic operations on data such as GET, POST, and DELETE. In this context we'll treat web services as a data retrieval method, so GET is the only operation we'll need to concern ourselves with.

We will access a web service that provides historical climate data, described at http://www.rcc-acis.org/docs_webservices.html.

The website provides documentation on several operations; we'll use two of them in turn to retrieve the desired data and prepare it for further processing.

Weather Station Data

The first web service we access will let us find relevant weather stations, which we will then query for historical weather records. This data is served up in JSON format. The web service URL is http://data.rcc-acis.org/StnMeta. A GET request with the appropriate query parameters returns metadata for the matching stations; query parameters are extra information given in the URL in the form

http://server/path?firstParam=foo&secondParam=bar
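
Rather than pasting parameters together by hand, the query string can also be assembled with the standard library; here's a minimal sketch using urllib.urlencode with the county parameter we'll use below:

import urllib

params = {"county": "41039"}  # FIPS code for Lane County, Oregon
url = "http://data.rcc-acis.org/StnMeta?" + urllib.urlencode(params)
print url  # http://data.rcc-acis.org/StnMeta?county=41039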

In order to find the desired weather station we need a FIPS code, assigned by the Census Bureau; 41039 is the code for Lane County, Oregon, which contains Eugene. A file of these codes is located here. Based on the code, we can query the web service to find weather stations. One nice thing about these GET requests, since they use the same protocol as a web browser, is that you can type the URL for a GET request directly into the browser address bar and see what the resulting response looks like. Try pasting 'http://data.rcc-acis.org/StnMeta?county=41039' into your browser. Then we can use the same URL to retrieve the data in our program for further processing.

In [34]:
# First I need to import some packages
import urllib2
import json
import matplotlib.pyplot as plt
import re
%matplotlib inline
In [10]:
# Fetch station metadata for every station in Lane County (FIPS 41039)
resp = urllib2.urlopen("http://data.rcc-acis.org/StnMeta?county=41039").read()

Now we have a response in JSON format, which we will parse with the json library. The structure of the JSON response document is a single field named "meta" containing an array of station records. We can retrieve the first station to examine its structure.

In [11]:
# Parse the JSON body and extract the array of station records
stationJson = json.loads(resp)["meta"]
stationJson[0]
Out[11]:
{u'elev': 2503.0,
 u'll': [-123.61667, 43.26667],
 u'name': u'BAUGHMAN LOOKOUT',
 u'sids': [u'350547 2'],
 u'state': u'OR',
 u'uid': 15077}

So we can see that each station has data about its name, latitude and longitude, elevation, and station ids, which we will use for retrieving Eugene's historical weather data. There are a few stations in Eugene; we'll just take the first match for the one at the airfield.

In [22]:
# Keep stations whose name starts with 'Eugene Mahlon' (case-insensitive), take the first
eugStation = [s for s in stationJson if re.match("(?i)Eugene MAHLON", s["name"]) is not None][0]
eugStation
Out[22]:
{u'elev': 353.0,
 u'll': [-123.22056, 44.12778],
 u'name': u'EUGENE MAHLON SWEET AP',
 u'sids': [u'24221 1',
  u'352709 2',
  u'EUG 3',
  u'72693 4',
  u'KEUG 5',
  u'USW00024221 6',
  u'EUG 7'],
 u'state': u'OR',
 u'uid': 15218}

Note that I'm using a function called re.match, which uses a regular expression to determine whether the station name begins with 'Eugene Mahlon' (the (?i) flag makes the match case-insensitive; re.match only matches at the start of a string, while re.search would find the pattern anywhere within it). Regular expressions are a very useful construct that has been implemented in many languages; it is worth your time to learn how to use them.
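
As a quick sketch of the difference between the two, using the station name we found above:

import re

name = "EUGENE MAHLON SWEET AP"
print re.match("(?i)eugene", name) is not None   # True: the name starts with 'eugene'
print re.match("(?i)mahlon", name) is not None   # False: re.match anchors at the start
print re.search("(?i)mahlon", name) is not None  # True: re.search looks anywhere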

I'm also using a Python construct called a list comprehension to filter the list of stations and keep only those that match the right name. List comprehensions are an alternative to writing a for loop and are often a more concise way of expressing a transformation of a list of data. This is such a common operation in computation that it warrants taking some time to learn how to use them.
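
For comparison, here is the same station filter written as an explicit for loop and as a list comprehension; both produce the same list:

# Explicit loop version
matches = []
for s in stationJson:
    if re.match("(?i)Eugene MAHLON", s["name"]) is not None:
        matches.append(s)

# List comprehension version: the same filter in one expression
matches = [s for s in stationJson if re.match("(?i)Eugene MAHLON", s["name"]) is not None]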

In [25]:
# Take the first of the station's ids to use in the weather data query
sid = eugStation["sids"][0]
sid
Out[25]:
u'24221 1'

Weather Data

Now we want weather data for this station. The web service is at http://data.rcc-acis.org/StnData, and we will access it using the sid from the Eugene station. We also need to specify the start date with 'sdate', the end date with 'edate', and the elements (data measurements) to return with 'elems'. There are many possible measurements documented; in this case we retrieve the daily min and max temperatures.

The weather data is reported in the 'data' field of the response JSON. Each row in the resulting array will have the date, the min temp and the max temp, in that order.

Note that in the string for the URL the space in the SID needs to be replaced with a '+' character; otherwise it's not a valid URL.
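
The standard library can also handle this encoding for us; a minimal sketch using urllib.quote_plus, which replaces spaces with '+' (and percent-encodes other unsafe characters):

import urllib

print urllib.quote_plus("24221 1")  # 24221+1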

In [38]:
# Build the StnData URL: station id, date range, and the elements to return
url = ("http://data.rcc-acis.org/StnData?sid=%s&sdate=2010-01-01&edate=2016-12-31&elems=mint,maxt" % sid).replace(" ", "+")
resp = urllib2.urlopen(url).read()
# The daily records are in the 'data' field of the response
weatherData = json.loads(resp)["data"]
weatherData[1:5]
Out[38]:
[[u'2010-01-02', u'40', u'51'],
 [u'2010-01-03', u'39', u'49'],
 [u'2010-01-04', u'39', u'56'],
 [u'2010-01-05', u'50', u'57']]

Now we'll produce a simple plot of the data over the last 7 years.

When we parsed the data from the weather service, the temperatures came back as strings, so we need to ask Python to convert them into floating point numbers for plotting.

In [35]:
# Convert the string temperatures to floats
minT = [float(w[1]) for w in weatherData]
maxT = [float(w[2]) for w in weatherData]
# Plot the daily min and max series against their row index
plt.plot(minT)
plt.plot(maxT)
plt.show()
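
As an optional refinement (a sketch, not part of the notebook above), the date strings in the first column can be parsed with datetime.strptime so the x-axis shows calendar dates rather than row indices; matplotlib plots datetime values directly:

from datetime import datetime

# Parse the ISO date strings from column 0 into datetime objects
dates = [datetime.strptime(w[0], "%Y-%m-%d") for w in weatherData]

plt.plot(dates, minT, label="min temp")
plt.plot(dates, maxT, label="max temp")
plt.legend()
plt.show()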