Scientific Programming Stuff

Upload your assignment here:

This is the problem based on what we talked about in class on Friday a bit. There is a complicated wave form that is best fit by adding multiple functions together and blind guessing for the parameters to determine the best fit.

Note that this assignments has two parts so make sure you refresh the page to make sure you get both parts. Part 1 is ready now - part 2 will be ready in a bit.

Part 1: Minimizing χ²

The goal is to produce the best fit to the data via producing a solution that has the lowest value of chi squared. In this case, the functions would be linear + sine wave. Since it is unlikely that a constant period or phase for the sine wave will fit the entire data, you will likely have to make one or more a function of the time step, to produce a best fit. Again, this is why you program instead of using a black box programmed by someone else.

This exercise gives you practice in

coding up equations and adding them together
evaluting model vs data via the chi square techinque
practice with blind parameter searches to achieve the lowest value of chi square in an iterative manner.

Procedure:

Download the two column Excel File - column 1 is time step, column 2 is data value at that time step. There are 1002 time steps.

Smooth the data using any procedure that you want - you need to balance your smooth width with reduced noise vs loss of feature resolution. Your smoothed data set should look something like this:

Write a function fitter: The simplest one is a sine wave + linear equation and sum the two together. Obviously you will need more than one linear equation because of the slope change. Alternatively you might be able to fit a higher order curve to whatever you think the baseline is, but the only way to fit the harmonics is with a sin wave.

You will want to write code which will immediately pass through your parameter changes (line slopes, periods, amplitude, etc) and draw the function through the data so you can iterate manually towards some approximate solutioin

You will need to calculate the residuals (difference between the model and the data) at each time stap to determine χ² .

f_o = data; f_e = model values; don't worry about normalization for this statistic, the above form is good enough to determine a local minimum (best fit)

With your approximate fit now write a blind random search algorithm, as explained in class, using your now best guesses for parameters min and max, and then choose your random step size to add to the min parameter per iteration. You will want to output your χ² values per iteration so you can see if your converging or not.

It is probably a good idea of keeping track of all bad solutions (those which produce a higher value of χ² ) to help with the iteration. Note that you might want to divide χ² by the number of data points,

When you have a best fit, predict the value for x=1200 and its uncertainty and demonstrate how you made the uncertainty prediction in some reasonable statistical manner.

Materials to include in your write up:

A graph of your smoothed data
A graph of your data with your initial model plotted on it (step 4)
A graph of your data with your final model on it (step 6)
Your predicted value for timestep = 1200 and an estimate of the uncertainty of that value
The code that you wrote for the blind random parameter search.

Part 2: Numerical Integration using Cubic Spline and mpirun on ACISS

The reason a cubic spline is used is that integration under a cubic spline can be simplified (this will be discussed in class on Friday May 27). For now use this tutorial

The goal here is to integrate the area under the curve from the data from a cubic spline. Yes I know you can just feed the data to a function fitter for this, but obviously I don't want you to do that. As before write your own cubic spline fitter with blind search and χ² test and submit that code

Now run this code on ACISS using the mpirun facility:

module load mpi-tor/openmpi-1.7_gcc-4.8
module list (check that modules got loaded)
qsub -I (so your not running on the head node)

time mpirun -np 2 code_you_wrote

In this case np 2 means to allocate 2 processes. You should run with 2 processes and 12 processes and see if there is a difference. The time command will return a time that is relevant is the first parameter of the line. Compare the times for np = 2 and np = 12

The nature of the data does not easily lend itself to a good cubic spline fit so its possible to use many iterations in the blind search to find the lowest χ² - try using 10,000 iterations for np = 2 and np =12 and then try 1 million. Produce a table of times for those 4 situations as well as the value of your χ²

Previous Page