In general, UNIX shell scripting is a very powerful and efficient way to sort and rewrite various data files. It is much under-utilized.

Shell scripting uses a command language that runs in the "shell", the command-line interface to a UNIX system (like ACISS). In this sense, UNIX and LINUX are the same.

The following are some basic manuals on shell scripting for different shells. One of us prefers C-shell (csh) and one of us likely prefers the Bash shell.



Example:

Suppose I have this raw data file and it's called master1.txt. What follows is a set of commands that work in csh (but can be generalized to other shells).



In this data set, the only things I care about are labelled Hurricanes. How can I extract just that information?
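
A minimal sketch of this filtering step, assuming the entries of interest carry the literal word HURRICANE. The sample lines below are made up to stand in for master1.txt (the real file layout may differ); only the grep line is the actual step, and it reads the same in csh and Bash.

```shell
# Hypothetical stand-in for master1.txt; the real file layout may differ.
# Assumed fields: storm ID, date, time, type, name, central pressure, wind.
cat > master1.txt <<'EOF'
1001 1995-08-01 1200 TROPICAL-STORM ALLISON 995 60
1002 1995-08-03 0600 HURRICANE ERIN 985 75
1003 1995-08-09 1800 HURRICANE NOT NAMED 973 85
1004 1995-08-12 0000 DEPRESSION NOT NAMED 1008 30
EOF

# Keep only the lines that contain the word HURRICANE
grep HURRICANE master1.txt > new.txt
```

If the label is spelled with mixed case in the real file, `grep -i` makes the match case-insensitive.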



And now my file contains only the relevant entries and it's called new.txt.



Next, I might have a problem with the data field NOT NAMED because of that annoying embedded space, which the names of the actual named storms do not contain. I want to get rid of it because that space means those lines have an extra *field* compared to the other lines.

SED is a very powerful command-line editor.
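
One way this substitution might look. The sample new.txt is hypothetical, and the replacement string NOTNAMED is my own choice (NOT_NAMED would work just as well); the point is only that every line ends up with the same number of fields.

```shell
# Hypothetical stand-in for new.txt (output of the previous filtering step)
cat > new.txt <<'EOF'
1002 1995-08-03 0600 HURRICANE ERIN 985 75
1003 1995-08-09 1800 HURRICANE NOT NAMED 973 85
EOF

# Replace "NOT NAMED" with the single token NOTNAMED on every line
sed 's/NOT NAMED/NOTNAMED/' new.txt > new1.txt
```

After this, each line of the sample has exactly 7 whitespace-separated fields.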



This now yields a new file (called new1.txt).



Next I am only interested in extracting fields 1, 6 and 7 from this file. There are many ways to do this. One could use the "cut" shell command. So I could try this:
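
A sketch of a cut attempt, under the assumption that fields are separated by exactly one space. The sample new1.txt and the output name cut_out.txt are hypothetical.

```shell
# Hypothetical stand-in for new1.txt, with exactly one space between fields
cat > new1.txt <<'EOF'
1002 1995-08-03 0600 HURRICANE ERIN 985 75
1003 1995-08-09 1800 HURRICANE NOTNAMED 973 85
EOF

# -d' ' sets the delimiter to a single space; -f picks fields 1, 6 and 7
cut -d' ' -f1,6,7 new1.txt > cut_out.txt
```

The catch is that cut treats every single delimiter character as a field boundary, so a run of two or more spaces creates empty fields and shifts the numbering.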



But, as it says, I always screw cut up (it's easy to make mistakes).

So let's use the least elegant approach that involves AWK (at this point all real computer scientists now puke).
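
A minimal sketch of the AWK version. The sample new1.txt is hypothetical and deliberately contains some doubled spaces, since AWK splits on any run of whitespace by default and so does not care.

```shell
# Hypothetical stand-in for new1.txt; note the uneven spacing on line 2
cat > new1.txt <<'EOF'
1002 1995-08-03 0600 HURRICANE ERIN 985 75
1003  1995-08-09 1800 HURRICANE NOTNAMED  973 85
EOF

# Print fields 1, 6 and 7, separated by single spaces
awk '{ print $1, $6, $7 }' new1.txt > new2.txt
```

With this sample, new2.txt contains the two lines `1002 985 75` and `1003 973 85`.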



This produces the new file called new2.txt.



Finally I want to sort on some column, in this case column 2, such that for each value of column 1 the lowest value of column 2 is reported.
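
One possible reading of this step, sketched with sort plus a small AWK filter: sort numerically on column 2, then keep only the first line seen for each value of column 1. The sample new2.txt and the output name new3.txt are hypothetical.

```shell
# Hypothetical stand-in for new2.txt: storm ID, central pressure, wind
cat > new2.txt <<'EOF'
1002 985 75
1002 979 80
1003 973 85
1003 990 65
1005 960 100
EOF

# Sort ascending on column 2 (numeric), then keep the first line per storm ID.
# Because of the sort order, that first line holds the lowest column-2 value.
sort -k2,2n new2.txt | awk 'seen[$1]++ == 0' > new3.txt
```

With this sample, new3.txt holds `1005 960 100`, `1003 973 85` and `1002 979 80`, in that order. The AWK test is written as `seen[$1]++ == 0` rather than the more common `!seen[$1]++` because csh applies history expansion to `!` even inside single quotes.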



This gives me what I want for a particular exercise (in this case, I am using a portion of the whole real data). As you can see, the data is now sorted on column 2 (central pressure) and no longer on Storm ID (column 1). This is what I want for one particular aspect of the data analysis.