


This assignment has two parts.

This assignment centers on another common element of working with data that is distributed in space and time: correlated and clustered events. In this case the data are occurrences of moderate to strong earthquakes over the time period 1900--2008.

Part 1:


  1. Get the Data File

    This time I have preformatted the data and thrown away extraneous stuff.

      There are six columns of information:

      • Column 1: a technique descriptor, which you do not need
      • Column 2: year of occurrence
      • Columns 3 and 4: latitude and longitude
      • Column 5: depth under the surface; 0 indicates no data
      • Column 6: Richter scale magnitude of the quake
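
    If the file turns out to be whitespace-delimited (an assumption; check the actual file first), a minimal Python reader for the six columns might look like the sketch below. The names `Quake` and `parse_quakes` are made up for illustration.

```python
from typing import Iterable, List, NamedTuple, Optional


class Quake(NamedTuple):
    year: int
    lat: float
    lon: float
    depth: Optional[float]  # None when the file reports 0 (no data)
    mag: float


def parse_quakes(lines: Iterable[str]) -> List[Quake]:
    """Parse rows of six whitespace-separated columns (delimiter is an assumption)."""
    quakes = []
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed rows
        _technique, year, lat, lon, depth, mag = parts
        d = float(depth)
        quakes.append(
            Quake(int(float(year)), float(lat), float(lon),
                  d if d != 0 else None, float(mag))
        )
    return quakes
```

    Passing an open file handle works, since iterating a text file yields its lines.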


  2. Count in Grids

    • a) Break the data up into latitude and longitude grids of 10 square degrees. Record the number of events in these grids and submit a histogram of these counts (note that most boxes will likely contain zero counts).

    • b) Define an earthquake-active grid as any box with 5 or more events over the time period. Determine the average event rate (number of earthquakes per year) for the global set. Now, using Poisson statistics (see for a review ), determine whether the earthquakes in the Pacific Northwest region (defined any way you want) are anomalous in terms of Poisson probability. Also state whether or not you can really apply this kind of statistic to this data.

    • c) For the Western US (and somewhat offshore), do a nearest-neighbor analysis of events as well as a K-means spatial test to demonstrate clustering in this region. Submit those results.
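
    The counting and Poisson steps in a) and b) can be sketched as follows. This is only an illustration under stated assumptions: boxes are taken to be 10 degrees on a side, latitudes lie in [-90, 90) and longitudes in [-180, 180), and the function names are hypothetical. How you define the expected count `mu` for your region is your call.

```python
import math
from collections import Counter


def grid_counts(latlons, cell=10.0):
    """Count events per box, keyed by (lat_index, lon_index)."""
    counts = Counter()
    for lat, lon in latlons:
        counts[(math.floor(lat / cell), math.floor(lon / cell))] += 1
    return counts


def histogram_of_counts(counts, n_cells=18 * 36):
    """Number of boxes holding n events, including the empty boxes."""
    hist = Counter(counts.values())
    hist[0] = n_cells - len(counts)
    return hist


def poisson_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu), summed in log space to avoid overflow."""
    if mu <= 0:
        return 1.0 if k <= 0 else 0.0
    cdf = sum(
        math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1))
        for i in range(k)
    )
    return 1.0 - cdf
```

    For example, if the expected count in a box is `mu` and your Pacific Northwest box holds `k` events, `poisson_tail(k, mu)` is the chance of seeing at least that many under a Poisson model. A very small value would flag the box as anomalous, provided the Poisson assumptions hold.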


  3. Write a clustering detection algorithm, where clusters are events that occur at nearly the same latitude and longitude and at nearly the same time (you can decide what "nearly" means). Provide an output table of those clustered events and represent this clustering in some visual way. Note: this is hard; you will likely make the mistake of finding way too many cluster events. You need to think scientifically about what physically might define a clustered set of earthquakes. Also note that the data file only contains "strong earthquakes" so as to eliminate the flurry of aftershock events that can confuse any clustering approach.
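
    One possible starting point, not the only one, is to link any two events that fall within a chosen distance and time window and then report the connected groups. The sketch below does this with a small union-find. The thresholds `max_km`, `max_years`, and `min_size` are placeholder assumptions that you should replace with physically motivated values, and the function names are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def space_time_clusters(events, max_km=100.0, max_years=1, min_size=3):
    """events: list of (year, lat, lon). Returns lists of event indices."""
    parent = list(range(len(events)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once; O(n^2), fine for a few thousand events.
    for i in range(len(events)):
        yi, lai, loi = events[i]
        for j in range(i + 1, len(events)):
            yj, laj, loj = events[j]
            if abs(yi - yj) <= max_years and haversine_km(lai, loi, laj, loj) <= max_km:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(events)):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) >= min_size]
```

    Because linking is transitive, long chains of loosely related events can merge into one group, which is one way to end up with too many cluster events. Tightening the thresholds or raising `min_size` is worth experimenting with.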

  4. Using any geomap database (I recommend the one in the D3 Gallery), plot the location (latitude and longitude) of all events that occurred within the longitude range -75 to -150.

  5. Plot a timeline of all events of magnitude > 6.8 and attempt to identify whether there is any time clustering of these events.
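
    Alongside the plot, one simple numerical check is the index of dispersion (variance divided by mean) of the yearly event counts. For a Poisson process this ratio is close to 1, and values well above 1 hint at clustering in time. The sketch below is just one option; the function name and the default year range are assumptions taken from the period quoted above.

```python
from collections import Counter
from statistics import mean, pvariance


def yearly_dispersion(years, start=1900, end=2008):
    """Variance-to-mean ratio of events per year over [start, end]."""
    counts = Counter(years)
    per_year = [counts.get(y, 0) for y in range(start, end + 1)]
    m = mean(per_year)
    return pvariance(per_year) / m if m else float("nan")
```

    Pass in the years of the events that survive the magnitude cut, for example `yearly_dispersion([y for y, mag in rows if mag > 6.8])`, where `rows` stands for whatever (year, magnitude) pairs you have loaded.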


Part 2:

  1. Get the Data File (This data is from New Mexico tree rings and runs from the year -136 to 1992. Annual precipitation is given in inches.)

  2. Run your favorite FFT program on this time series. Produce the magnitude vs. frequency plot and comment on why this fails.
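
    If you do not have a favorite FFT program, the magnitude spectrum can be computed with a direct discrete Fourier transform using only the standard library, as in the sketch below. It is slower than a true FFT (in practice something like `numpy.fft.rfft` is the usual choice), but it is fast enough for a series of about two thousand points. The function name is hypothetical.

```python
import cmath
import math


def amplitude_spectrum(series):
    """Return (frequencies in cycles per year, magnitudes) for the mean-removed series."""
    n = len(series)
    avg = sum(series) / n
    x = [v - avg for v in series]
    freqs, mags = [], []
    # k = 0 is skipped because the mean has been removed.
    for k in range(1, n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k / n)
        mags.append(abs(s))
    return freqs, mags
```

    Plot `mags` against `freqs` (or against `1 / f` for the period in years) to get the magnitude vs. frequency plot.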

  3. Now let's do the brute force thing: run box sizes of 10, 20, and 40 years across the time series and count the total amount of positive and negative power in each box, where positive and negative refer to above or below the long-term average of the data. Submit a histogram of those box counts and identify, from the box counts, where there are rainy periods and where there are dry periods.
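
    A minimal sketch of the box counting follows. It assumes non-overlapping boxes and treats "power" as the summed departure from the long-term average; a sliding box or squared departures would be equally defensible readings. The function name is hypothetical.

```python
def box_power(series, box):
    """Return (start_index, positive_sum, negative_sum) for each full box."""
    avg = sum(series) / len(series)
    out = []
    for start in range(0, len(series) - box + 1, box):
        window = [v - avg for v in series[start:start + box]]
        pos = sum(a for a in window if a > 0)  # total above-average amount
        neg = sum(a for a in window if a < 0)  # total below-average amount (<= 0)
        out.append((start, pos, neg))
    return out
```

    Calling `box_power(precip, 10)`, `box_power(precip, 20)`, and `box_power(precip, 40)` gives the three sets of box counts, where `precip` stands for the list of annual values. Add the first year of the record to `start_index` to get calendar years.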

  4. Now use any tactic that you want (hint: Gaussian kernel filtering might work) and a) identify any possible 50-year periods of either drought or excess precipitation and b) attempt to demonstrate that event clustering exists at various places in this time series. Produce all relevant plots to document your assertions.
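
    For the Gaussian kernel hint, a minimal smoothing sketch is given below. It truncates the kernel at three standard deviations and renormalizes near the ends of the series; both are design assumptions, and the function name is hypothetical. The width `sigma` is in years and is yours to choose.

```python
import math


def gaussian_smooth(series, sigma):
    """Smooth a series with a Gaussian kernel of width sigma (in samples)."""
    half = int(3 * sigma)
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    n = len(series)
    out = []
    for i in range(n):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:  # drop kernel weights that fall off either end
                num += w * series[j]
                den += w
        out.append(num / den)
    return out
```

    Comparing the smoothed curve with the long-term average is one way to look for sustained wet or dry stretches; a larger `sigma` emphasizes multi-decade behavior.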