This assignment explores the Global Terrorism Database (GTD), looking for patterns and correlations and attempting to predict future behavior from past behavior. I will try to incorporate an element of Machine Learning, but this might be difficult or stupid - we will see.
I will be putting this together in various pieces, so refresh this page often to pick up the latest step when it becomes available.
Steps:
Note: Exclude the 9/11 event from your analysis.
- Begin by downloading the edited Global Terrorism Database (GTD) here. I have edited a lot out of the original database, which has a gigantic number of columns and is available HERE. To represent different parts of the world, simply use approximate Lat and Long rectangles.
- It does seem like this would be a good Gloo Viz exercise, but alas we are not there. So we will build on some of what you have already done to look for patterns and correlations. Produce two timeline plots: a) event occurrences in North America and b) event occurrences in the rest of the world. Feel free to bin the data any way that you want.
- Determine the number of events per year in North America compared to the Middle East. Identify and report on any years whose event count has less than a 1% probability under the expected Poisson distribution.
- Generate a Lat, Lon plot of all terrorist events (over the entire timeline) and use K-means or some other technique to try to define clusters on a radius scale of 100-300 km. (The best approach is likely to move circles of this radius randomly around and count the events inside each circle.)
- Identify the 3 highest local event density circles over the entire time record and then examine the time behavior of event density within these 3 regions. See if you can fit a time function to that event density. From that function, predict the number of fatalities in the 2020-2025 period in these locations if the average level of terrorism increases by a factor of 3 relative to the 2005-2015 period. Note: this is kind of an ML problem. Your function fit should minimize the residuals between your function and the data, and any function from which you can predict the future is allowed (even a polynomial in this case, because terrorist attacks are NOT a part of nature).
- Determine by any means (even visually) if there is any correlation between event density and amplitude density (amplitude = combined number killed + wounded).
- Are targets randomly chosen? You will want to use a weighted random number generator for this. By that I mean the following. Over the time period there have been X attacks against a certain kind of target (say a business). Suppose there have been 1000 attacks against businesses over the period of record but only 50 attacks against airports. A business attack is therefore 20 times more likely, presumably simply because there are many more businesses than airports. A weighted random number generator just means that you assign 20 times more random numbers to business than to airport.
Construct a histogram of randomly generated targets over the timeline. Use that as your expectation value. Now compare that with the actual data and perform a χ² test to see if the data are consistent with targets being random.
- Determine by any means (even visually) if there is a relation between target type and regional location in the world. Is there a region of the world where private citizens have the highest probability of being targeted?
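For the Poisson step, one possible reading is sketched below: treat the mean number of events per year as the expected rate, then flag any year whose count falls in a tail with probability below 1%. The function names and the yearly counts are made up for illustration and are not taken from the GTD; using the smaller of the two tails is also an assumption about what "less than 1%" should mean.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean is lam (lam > 0)."""
    # Work in log space so large counts do not overflow the factorial.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def tail_probability(k, lam):
    """Smaller of P(X <= k) and P(X >= k) for a Poisson mean of lam."""
    lower = sum(poisson_pmf(i, lam) for i in range(k + 1))
    upper = 1.0 - (lower - poisson_pmf(k, lam))
    return min(lower, upper)

def unusual_years(counts_by_year, threshold=0.01):
    """Years whose count is improbable if every year shared one mean rate."""
    lam = sum(counts_by_year.values()) / len(counts_by_year)
    return [year for year, k in sorted(counts_by_year.items())
            if tail_probability(k, lam) < threshold]

# Made-up counts for illustration only; these are NOT GTD values.
example = {1990: 12, 1991: 9, 1992: 11, 1993: 30, 1994: 10}
print(unusual_years(example))
```

Note that an extreme year also pulls the mean upward, so you may prefer to estimate the expected rate from a median or from a smoothed trend instead.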
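The moving-circle idea from the clustering step could be sketched as follows. This is a rough illustration under stated assumptions, not a prescribed method: trial circles are centered on randomly chosen events rather than on fully random positions (so no trials are wasted over empty ocean), distances use the haversine formula with a mean Earth radius of 6371 km, and the names `haversine_km` and `densest_circles` are hypothetical.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical Earth)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2)
    # Clamp to guard against rounding slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def densest_circles(events, radius_km=200.0, n_trials=500, top=3, seed=0):
    """Return up to `top` non-overlapping (count, lat, lon) circles."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        # Assumption: center each trial circle on a randomly chosen event.
        clat, clon = rng.choice(events)
        count = sum(1 for lat, lon in events
                    if haversine_km(clat, clon, lat, lon) <= radius_km)
        trials.append((count, clat, clon))
    trials.sort(reverse=True)
    chosen = []
    for count, clat, clon in trials:
        # Keep a circle only if it does not overlap one already chosen.
        if all(haversine_km(clat, clon, lat, lon) > 2 * radius_km
               for _, lat, lon in chosen):
            chosen.append((count, clat, clon))
        if len(chosen) == top:
            break
    return chosen
```

Here `events` is a list of `(lat, lon)` tuples. Pure Python is slow for the full database, so in practice you would likely vectorize the distance calculation or work on a subsample first.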
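The weighted random number generator and the χ² comparison from the target step might look like the sketch below. The 1000 business and 50 airport attacks are the numbers from the example in the text; the "observed" counts for a sub-period are invented purely for illustration, and the function names are hypothetical. The standard library has no χ² distribution, so the statistic is compared against the tabulated 5% critical value for one degree of freedom (3.841).

```python
import random
from collections import Counter

def simulate_targets(overall_counts, n_draws, seed=0):
    """Draw random target types, weighted by their overall attack counts."""
    rng = random.Random(seed)
    types = list(overall_counts)
    weights = [overall_counts[t] for t in types]
    return Counter(rng.choices(types, weights=weights, k=n_draws))

def expected_counts(simulated, n_observed):
    """Rescale the simulated histogram to the number of observed attacks."""
    n_sim = sum(simulated.values())
    return {t: c * n_observed / n_sim for t, c in simulated.items()}

def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over target types."""
    return sum((observed.get(t, 0) - e) ** 2 / e
               for t, e in expected.items() if e > 0)

# Overall counts from the example in the text.
overall = {"business": 1000, "airport": 50}
# Invented counts for one sub-period; these are NOT GTD values.
observed = {"business": 180, "airport": 30}

simulated = simulate_targets(overall, n_draws=100_000)
expected = expected_counts(simulated, sum(observed.values()))
stat = chi_square(observed, expected)
dof = len(expected) - 1
print(f"chi-square = {stat:.1f} with {dof} degree(s) of freedom")
print("consistent with random targets" if stat < 3.841 else "not consistent")
```

With many target types the degrees of freedom grow, so the critical value must be looked up for the matching number of degrees of freedom.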