Looking for advice on writing an IGOR loader/plotter for very large files

Hi,

My employer recently purchased a new piece of equipment, and I have been volunteered to write the IGOR routines to load/plot the data contained in the log files it generates. The machine takes a reading of ~60 different parameters every second. The length of the files can vary: in principle anywhere from several hours of readings (~10,000 lines) up to two weeks, although I think a more typical range will be 1 or 2 days (~160,000 lines). In the sample log file I have, each second gets its own line and each parameter its own column, with the columns separated by tabs. I think it is possible to output different formats for the log file, but I am not positive about this.

I would say I am an intermediate-level IGOR programmer and have written a fair number of procedures for data loading/plotting/calculating on "reasonably" sized data sets. I have never worried much about processing time in my previous work, but with this project I think processing time will have a big impact on how usable my procedures end up being.

From a quick search around the forums, I am thinking I may need to look into using SQL or structures, but I have never delved into either subject before and don't understand their pros and cons. Does anyone have advice on a good place to start for writing code to handle these log files? Generally, I am hoping to have something that can load in a given log file and plot selected parameters as a function of time.

Thanks,

Brandon
Hello Brandon,

At two weeks we have 60 × 14 × 86,400 data points (60 parameters × 14 days × 86,400 seconds/day), which is roughly 72M points. This should not be a problem, but you might want to look at optimizing your storage to the real dynamic range of the variables. A tab-delimited file is not as efficient as a binary file, but it might have an advantage in case of partial corruption of the data.

I like the idea of using SQL in such an application. IGOR ships with the SQL XOP (Igor Pro folder:More Extensions:Utilities). Feel free to contact me directly if you want to discuss this.

A.G.
WaveMetrics, Inc.
I recommend that you write some procedures without regard to the size of the files. You can later decide if some other storage mechanism is worthwhile. I don't recommend going up the SQL learning curve before you have experience in dealing with your data, although it is very worthwhile for its own sake.

Once you have a basic plotter, I think you will be in a better position to decide if it is worthwhile to store the data in a database.

I also don't think you need to use structures, at least initially. In this case, structures would simply be a way to package multiple function parameters into one parameter. This may or may not turn out to be useful in your case.

If you intend to read data from the log while the equipment is writing to it, that is another problem. You will have to devise some way to keep track of what you have already read so you can read only new data.
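One common approach is to remember how many data lines you have already consumed and use LoadWave's /L flag to skip them on the next read. Below is a minimal sketch of that bookkeeping, assuming a headerless tab-delimited log; the function and global variable names are my own invention, and appending the newly loaded rows onto your master waves (e.g., with Concatenate) is left to the caller.

Function LoadNewLogLines(pathName, fileName)
	String pathName, fileName	// symbolic path name and file name

	// Persistent count of data lines consumed so far (hypothetical global)
	NVAR/Z gLinesRead = root:gLinesRead
	if (!NVAR_Exists(gLinesRead))
		Variable/G root:gLinesRead = 0
		NVAR gLinesRead = root:gLinesRead
	endif

	// /L={nameLine, firstLine, numLines, firstCol, numCols}: start at the
	// first unread line; numLines=0 means "read to the end of the file"
	LoadWave/J/D/O/Q/L={0, gLinesRead, 0, 0, 0}/P=$pathName fileName
	if (V_flag > 0)
		WAVE w = $StringFromList(0, S_waveNames)
		gLinesRead += DimSize(w, 0)	// the loaded waves hold only the new rows
	endif
End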

I would start by writing a routine to load the file (LoadWave/G if it is all numeric, LoadWave/J if it contains strings or date/time values). You should load each file into its own data folder.
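As a starting point, a loader along those lines might look like the following sketch; the function name is mine, and it assumes a tab-delimited file whose header line carries the parameter names.

Function LoadLogFile(pathName, fileName)
	String pathName		// name of an Igor symbolic path, e.g. created with NewPath
	String fileName

	DFREF dfSav = GetDataFolderDFR()

	// One data folder per log file, named after the file
	String dfName = CleanupName(ParseFilePath(3, fileName, ":", 0, 0), 0)
	NewDataFolder/O/S root:$dfName

	// /J = delimited text, /D = double precision, /W = take wave names
	// from the header line, /Q = suppress history output
	LoadWave/J/D/W/O/Q/P=$pathName fileName
	Printf "Loaded %d columns from %s\r", V_flag, fileName

	SetDataFolder dfSav
End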

Next come up with a routine to plot columns specified by a string list parameter ("pressure;temp;current") over a range of times specified by two numeric parameters.
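A sketch of such a routine, assuming the log's time column was loaded into a wave I am calling timeStamp in the current data folder:

Function PlotLogColumns(colList, tStart, tEnd)
	String colList			// e.g. "pressure;temp;current"
	Variable tStart, tEnd	// time range for the bottom axis

	WAVE/Z timeW = timeStamp
	if (!WaveExists(timeW))
		Abort "No time wave found in the current data folder"
	endif

	Display/W=(50, 50, 600, 400)
	Variable i
	for (i = 0; i < ItemsInList(colList); i += 1)
		WAVE/Z w = $StringFromList(i, colList)
		if (WaveExists(w))
			AppendToGraph w vs timeW
		endif
	endfor
	SetAxis bottom tStart, tEnd
End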

Next come up with a user interface that allows the user to choose the columns (using a listbox) and range of times through a control panel. For user entry of date/time values in a control panel, my date control snippet might be of use.
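A bare-bones version of such a panel might start like this; the parameter names are placeholders, and in practice you would build the list wave from the waves in the log file's data folder:

Function MakeLogPanel()
	Make/O/T paramNames = {"pressure", "temp", "current"}
	Make/O/B/U/N=(numpnts(paramNames)) paramSel = 0	// listbox selection wave

	NewPanel/N=LogPanel/W=(100, 100, 330, 380)
	ListBox paramList, pos={10, 10}, size={210, 200}, mode=4	// disjoint multi-select
	ListBox paramList, listWave=paramNames, selWave=paramSel
	Button plotBtn, pos={10, 220}, size={210, 25}, title="Plot Selected"
	Button plotBtn, proc=LogPanelButtonProc
End

Function LogPanelButtonProc(ba) : ButtonControl
	STRUCT WMButtonAction &ba
	if (ba.eventCode == 2)		// mouse up
		WAVE/T paramNames
		WAVE paramSel
		String colList = ""
		Variable i
		for (i = 0; i < numpnts(paramSel); i += 1)
			if (paramSel[i] & 1)	// bit 0 is set for selected rows
				colList += paramNames[i] + ";"
			endif
		endfor
		Print colList		// hand this list to the plotting routine
	endif
	return 0
End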

Igor wrote:

At two weeks we have 60 × 14 × 86,400 data points, which is roughly 72M points.


Assuming each point is a double (8 bytes), that equates to roughly 580 MB per log file.
hrodstein wrote:

Next come up with a user interface that allows the user to choose the columns (using a listbox) and range of times through a control panel. For user entry of date/time values in a control panel, my date control snippet might be of use.


hrodstein - Thanks for the snippet and the advice. I think I will get some basic plotting stuff written up and see where that takes me.

Igor wrote:

At two weeks we have 60 × 14 × 86,400 data points, which is roughly 72M points. This should not be a problem, but you might want to look at optimizing your storage to the real dynamic range of the variables.


A.G. - Thanks for letting me know that the number of points won't be a problem; I was a bit concerned about the large file size. I will do a bit more reading on SQL and may contact you if I decide it is something I need.

Into IGOR I go!

-Brandon
andyfaff wrote:
Igor wrote:

At two weeks we have 60 × 14 × 86,400 data points, which is roughly 72M points.


Assuming each point is a double (8 bytes), that equates to roughly 580 MB per log file.


While large, that is still within the limits of a 32-bit application.

I don't know what data are collected in this case, but many common measurements do not require 64 bits of representation, so in this application it would make sense to store the data in separate waves, with each wave's type chosen to fit the dynamic range of the corresponding parameter. The same argument applies when storing the data in a database.
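As a sketch of that idea (the wave names here are invented): storing a channel as single precision halves the ~580 MB double-precision footprint, and a 16-bit integer channel quarters it.

Function ShrinkLogWaves()
	WAVE status, temperature
	Redimension/W status		// 16-bit signed integer, 2 bytes/point
	Redimension/S temperature	// single-precision float, 4 bytes/point
End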

As far as I can see, the only difficulty here is in managing and accessing more than one data set at a time. If that is a requirement for the OP, then SQL is a good choice.

A.G.
WaveMetrics, Inc.