Loading DPC files in Igor

Dear Igorians,

I have been trying recently to load a DPC file (Dynamic photon counting file) on Igor. However I have been having some difficulty with this.

The DPC file format according to the manual of the acquisition program is something like the following:

Header:
Bytes Content
0-1 Characters IM
2-3 Comment length in bytes ( ComLen)
4-5 Width of the image in pixels (iDX)
6-7 Height of the image in lines (iDY)
8-9 X-Offset (iX)
10-11 Y-Offset (iY)
12-13 File type: 2=16 Bit
14-64 Reserved
64-nnn Comment area can contain any information. It is used by the program to store the status string and the Calibration tables (if any)
nnn+1-End Data Area (Starts at address 64+ ComLen)

The Data Area looks like n times the following diagram, where there is one set of such data
for every frame. The first entry is the time stamp relative to the origin in ms, followed by a
the coordinates of all photons counted within this frame. The end of the data for the first
frame is indicated by a delimiter (0xFFFFFFFF). This is repeated for all frames recorded.

Byte Coordinates Remark
00-03 Timestamp Time in ms from first image (long int)
04-05 X0 Photon 0, X-Coord.
06-07 Y0 Photon 0, Y-Coord.
08-09 X1 Photon 1, X-Coord.
10-11 Y1 Photon 1, Y-Coord.
Etc. Etc. Etc.
Etc. Etc. Etc.
20-24 0xFFFFFFFF Delimiter (Long int, all bits set to 1)

I have been trying to load one of these files in with the in built Igor functions but it has not been possible.

Any advice on which load procedure(or command) would better suit this file format? I am really not interested in loading the header, I just want to load the information from the data area in a 2 column wave. Any help would be greatly appreciated.

Alexis

You need to use the Open operation to open the file, the FBinRead operation to read it, and the Close operation to close it.

You can read the header, excluding the comment, into and Igor structure. This will allow you to determine where the comment ends and where the data begins.

This is fairly difficult programming and will require at least intermediate Igor programming skills.

Here is an example that illustrates these techniques:
http://www.igorexchange.com/node/5328

I can't tell, from your description, what the layout of the data for a frame is. Specifically, I can't tell how to determine how many XY photon pairs there are (how to determine what Etc. Etc. Etc. means, precisely).

Also, you don't say if the data is big-endian or little-endian.

It would be good if you can find documentation for the file format and post a link to it.

Also, attach a sample file and include an explanation of what you want to get out of it, such as how many waves and what dimensionality.
hrodstein wrote:

Also, you don't say if the data is big-endian or little-endian.


My guess is that you read that from the first two bytes, which look similar to the TIFF standard (where the first two bytes contain either 'II' or 'MM').

Igor wrote:
My guess is that you read that from the first two bytes, which look similar to the TIFF standard (where the first two bytes contain either 'II' or 'MM').

And my guess is that you meant "either 'IM' or 'MI'".

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
Many thanks to everyone for their replies especially Hdrod, your comments really helped. I actually managed to get this working shortly after I posted the question here.
The code I used (the part that does the actual reading of the photon variables anyway, the full reading function is attached) looks something like this:
for(ii=0;jj<numofframes;ii+=1)  // Initialize variables;continue test
                Fbinread/U/F=2 refnum, bytereadX
                Fbinread/U/F=2 refnum, bytereadY                            ////======read photons
                Fstatus refnum             
                    if(bytereadX==65535 && bytereadY==65535)     // if you read a delimeter value
                        Fsetpos refnum, V_filepos-4                     // reset the file position
                        if(jj!=numofframes-1)
                            Fbinread/F=5 refnum, delimeter      // read out the delimeter and move on  except if it is the last frame
                            fpoint[jj]=ii
                            jj+=1
                            ii-=1
                        else
                        print/D "End of frames reached breaking: Fileposition=",V_filepos,V_logEoF//,jj,ii
                            break
                        endif
                    else
                   
                    X0[ii]=bytereadX;Y0[ii]=bytereadY
                   
                    endif
                bytereadY=0;bytereadX=0
            endfor 

I am trying to speed this up though as the files are of the order of 10MB and even more sometimes and reading them into igor really takes a long time.
Is there some way to parallelize the reading from the file so that I can speed up the loading, or any other tips that may speed up the function? Typically 5X10e4< numofframes < 2X10e5 and ii goes up to 10e^6 and more.
Also here is a link to one of the files if anyone is interested: https://www.dropbox.com/s/34n5d3qtkftqvzo/TR4_GAIN63_I3d.dpt?dl=0
Cheers
AA
PholtonLoading.ipf
The .ipf file you posted makes several calls to functions that are not defined in the file (eg. LoadPhotonCountingDif, BinInSinglecolumn, etc.). It looks like those functions aren't required to actually load the data, so I've commented out the entire block containing these missing functions (starting with the "///test to see if file loaded ok and print relevant values " comment).

I first tried running your function with Igor 7 on Macintosh and it seemed like it was taking forever. In order to see what was being so slow, I did the following:
1. Added #include <FunctionProfiling> to the top of the procedure file and then recompiled.
2. Added a BeginFunctionProfiling() call to your code just before the Make/O/N=2 IM line
3. Added a EndFunctionProfiling() call to your code just after your Close refNum line.
4. Because my initial attempt to load the file seemed to never finish, I added the following code at the bottom of your for loop, just before the endfor:
if (ii > 200)
    break
endif

5. Added return 0 after the call to EndFunctionProfiling()
I then executed your function. It executed quickly (because of the early exit) and then the function profiling report was displayed. The report indicated that about 99% of the execution time was spent on the Fstatus refnum line within the for loop.

Igor 7 has a new operation FGetPos that can be used in place of Fstatus when you only need to have the V_filepos variable set. In this particular call, that seems to be all you're using. Replacing Fstatus refnum with FGetPos refnum makes your code much faster. After removing the changes I made above (keeping only the FGetPos instead of Fstatus change), I can now load the test file you made available in about 5 seconds.

I suspect that you could still improve the performance of your code even further but maybe this change will make it fast enough for your purposes.
aclight wrote:
The .ipf file you posted makes several calls to functions that are not defined in the file (eg. LoadPhotonCountingDif, BinInSinglecolumn, etc.). It looks like those functions aren't required to actually load the data, so I've commented out the entire block containing these missing functions (starting with the "///test to see if file loaded ok and print relevant values " comment).

I first tried running your function with Igor 7 on Macintosh and it seemed like it was taking forever. In order to see what was being so slow, I did the following:
1. Added #include <FunctionProfiling> to the top of the procedure file and then recompiled.
2. Added a BeginFunctionProfiling() call to your code just before the Make/O/N=2 IM line
3. Added a EndFunctionProfiling() call to your code just after your Close refNum line.
4. Because my initial attempt to load the file seemed to never finish, I added the following code at the bottom of your for loop, just before the endfor:
if (ii > 200)
    break
endif

5. Added return 0 after the call to EndFunctionProfiling()
I then executed your function. It executed quickly (because of the early exit) and then the function profiling report was displayed. The report indicated that about 99% of the execution time was spent on the Fstatus refnum line within the for loop.

Igor 7 has a new operation FGetPos that can be used in place of Fstatus when you only need to have the V_filepos variable set. In this particular call, that seems to be all you're using. Replacing Fstatus refnum with FGetPos refnum makes your code much faster. After removing the changes I made above (keeping only the FGetPos instead of Fstatus change), I can now load the test file you made available in about 5 seconds.

I suspect that you could still improve the performance of your code even further but maybe this change will make it fast enough for your purposes.


Many thanks for looking into this. Your suggestions did help a lot. Initially the file you said was taking forever was taking about 40s to load now it is down to 2-3s. Yeah I forgot to comment out a couple of the references to other functions, I realized just after uploading the file.
The profiling functions really helped as I wasn't aware of their existence thanks for pointing them out (they will come in handy on the analysis/corellation of the data which is also quite slow atm). Your first straightforward trick to replace the Fstatus with Fgetpos sped up things substantially. I also added some more multithreading references where I could and made some minor changes (reading both X and Y at the same time) and now at least it is workable for the larger files so a 10MB file takes about 30s which is more or less ok (there are a couple of 100MB files that would take about 5min to load but still much better than the previous hour or so that I had to wait).
With the latest alterations now 80% of the time is spent actually reading out from the file ie in the Fbinread command. However there is still a 10% of the time that is used to parse the read out values to the waves that store them in Igor. Is there a way to actually read out directly into some predefined position of a wave/array?
I tried something like
Fbinread/U/F=2 refnum, Y[ii] but Igor wouldn't have it. Is there any work around for this.

AA

Looking at the code you posted, I have to admit, I am a bit confused. You use a for loop that mixes both ii and jj. You test inside that for loop for a break condition. You have a variable fpoint that stores jj according to ii. Finally, your end loop resets variables that are then reset again by FBinRead at the top of the loop.

Are you mixing the commands to read one frame with those to advance frame-by-frame. If so, I wonder if this approach is compounding the overhead. What would happen if you make a loop just to read a single frame, code that in to a threadsafe function call, and then call it within a loop to read frame-by-frame?

As to the request to speed up the storage ... could this be done by reading one frame into a structure instead of a wave? Or can the system be tricked to read each frame as an XY image instead of reading in to X and Y waves?

--
J. J. Weimer
Chemistry / Chemical & Materials Engineering, UAH
jjweimer wrote:
Looking at the code you posted, I have to admit, I am a bit confused. You use a for loop that mixes both ii and jj. You test inside that for loop for a break condition. You have a variable fpoint that stores jj according to ii. Finally, your end loop resets variables that are then reset again by FBinRead at the top of the loop.

Are you mixing the commands to read one frame with those to advance frame-by-frame. If so, I wonder if this approach is compounding the overhead. What would happen if you make a loop just to read a single frame, code that in to a threadsafe function call, and then call it within a loop to read frame-by-frame?


The data that are read out, that I need for the analysis are the X & Y coordinates and and how many of them are in each shot/frame. So I use two waves for X and Y and one that stores progressively how many points were read before the frame ends. The catch 22, is that you don't know a-priori how many coordinates are in each frame, it varies from frame to frame (with a poissonian distribution) that's why I have to test every time if the frame/shot has ended so that I can then store the value of where this happened as well. So I think it wouldn't be possible to use threadsafe for this type of data readout?!
I tried reading in a [1,2] matrix the XY values instead of single X,Y variables but that didn't really speed up the process.

jjweimer wrote:
As to the request to speed up the storage ... could this be done by reading one frame into a structure instead of a wave? Or can the system be tricked to read each frame as an XY image instead of reading in to X and Y waves?

Since you don't know the number of the points in the frame before you read it out, I don't see how this could be done. There is a process to extract the individual frames into images. However that really doesn't help. The reason the data are stored like this is for compression reasons. You only store the locations of a 2d array where you record a count in each shot. typically this is pretty low so you save a huge amount of space. I once tried to extract the frames into images with a standard tiff compression and a ~1mb dpc file ended up in something like ~2GB of images.



I still have the impression that what you wrote might be more efficient when coded as a function just to read a frame that is subsequently imbedded in a function to loop through the entire set of frames. In this case, you might even consider to "preview" the file first, find the positions of each frame, and then ship out the frame-by-frame reading to a threadsafe function using the pre-stored starting points of each frame. So, rather than reading frames in a single, sequential thread, you read them "in parallel threads".

I also wonder if reading the X, Y data in to a string list and then dumping that to a wave after the frame is read would be faster? I think MatrixOP or something? IOW, I wonder if the overhead to sequence the data in to the wave on a point-by-point basis could be reduced by doing the read to a faster storage method and then reprocessing back to a wave _en mass_.

In any case, it sounds as though you have a workable answer. Everything now is just mucking about on the details. :-)

--
J. J. Weimer
Chemistry / Chemical & Materials Engineering, UAH
It seems to me that once you know the position at which the data area starts, you could read from that point to the end of the file all at once into a wave (created with Make/W/U). Then you'd go through that wave a point at a time to find each frame and store the photon coordinates in a separate wave that's structured like you need it to be. This approach would have the benefit that you'd only need to do a few file based operations and then from that point on you're working with all of the data in memory.

Since I was curious to see how much faster this would be, I implemented it (correctly even, I think).

Function LoadPhotonCounting2()
    String Setname ="LL"
    Prompt SETNAME, "Name of set"
    DoPrompt "Input name for Set", setname
    if(V_flag)
        print "Cancelled by user"
        return -1
    endif
    NewDataFolder/O/S root:$"PS_"+setname
    variable dpt=150000,numofFrames,refNum,pos,dptdif=0,ComLen,Nframes
    String pathName,str,fullPath,fileFilters = "Data Files (*.dpt,*.dpc):.dpt,.dpc;"
    String/G fileName,expdetails=""
    Variable/G photonN, timestamp
           
    Open/F=filefilters/R/Z=2 refNum //as fileName   // Store results from Open in a safe place.
    fullPath = S_fileName
    filename=S_filename
           
    if (V_flag == -1)
        Print "DemoOpen cancelled by user."
        return -1
    endif
    if (V_flag != 0)
        DoAlert 0, "Error in DemoOpen"
        return V_flag
    endif      
    Variable start = StopMSTimer(-2)
    Make/O/N=2 IM                   //Store 2 first Characters
    Make/O/N=31  header=0           //Make Wave to store header values
    Fbinread/U/F=1 refnum, IM                       //First 2 characters should be IM (73 77)
    Fbinread/F=2 refnum, header               //read the first 62 bytes of the header
    variable ii,jj,BytereadX,BytereadY,numofphotons,delimeter
    Make/O/T HContent
    Hcontent= {"comment length","width of image","height of image","x offset","y offset","file type 2=16bit","reserved"}    // explanation of the header values read for future reference
           
    Fstatus refnum
    /// read the first 11 lines (the comment section) into a string
    FReadLine refNum, str       // Read first line into string variable
    expdetails=str
    for(ii=0;ii<10;ii+=1)        //read following 10 lines and append to expdetails
        FReadLine refNum, str
        expdetails+=str
    endfor
               
    //now get  number of frames/exposures from the expdetails
    Nframes=str2num(expdetails[strsearch(expdetails,"NrExposure=",0,2)+11,strsearch(expdetails,",",strsearch(expdetails,"NrExposure=",0,2),2)-1])
    variable/G threshold=str2num(expdetails[strsearch(expdetails,"threshold=",0,2)+10,strsearch(expdetails,"[",strsearch(expdetails,"threshold=",0,2),2)-1])
    numofFrames=Nframes
    Variable/G frames=numofFrames
    Fsetpos refnum, 64+header[0]    // go to the start of the data
    Fstatus refnum
           
    // Determine the number of bytes from the start of the data to the end of the file
    Variable numDataBytes = V_logEOF-V_filepos
           
    // Each frame has one timestamp (32-bit int = 4 bytes) and one delimiter (32-bit int = 4 bytes) for a total of 8 bytes plus the photon coordinate data
    // If we assume that the NrExposure value in the comments is accurate, then we can calculate the expected number
    // of photons following this formula:
    // numPhotons = (numDataBytes - (# bytes for all non-photon frame data)) / 4
    // since each photon coordinate is 2 16-bit (2 byte) integers = 4 bytes
    Variable numPhotons = (numDataBytes - (8*numofFrames))/4   
    Variable intWavePoints =    numDataBytes / 4    // 4 bytes per integer
    Make/O/N=(intWavePoints)/I/U intWave   
    Make/O/N=(numPhotons) X0=0,Y0=0
    Make/O/N=(numofFrames)/I/U Fpoint
    Make/O/N=(numofFrames)/I/U timestamps
    // Read in all of the data into an unsigned 32-bit integer wave.
    Fbinread refnum, intWave
           
    // Now go through intWave and extract the timestamps and photon coordinates.
    Variable intCounter = 0
    Variable frameCounter = 0
    Variable needTimestamp = 1  // Whether we expect the next integer read from the wave to be a timestamp.
    Variable photonCounter = 0
    int thisInt, photonXPos, photonYPos
    For (intCounter = 0; intCounter < intWavePoints; intCounter += 1)
        if (intWave[intCounter] == 0xFFFFFFFF)  // This is a frame delimiter
            frameCounter += 1
            needTimestamp = 1
            continue
        endif
               
        if (needTimestamp)
            timestamps[frameCounter] = intWave[intCounter]
            needTimestamp = 0
            Fpoint[frameCounter] = intCounter
        else
            // Extract photon coordinates.
            thisInt = intWave[intCounter]
            // This logic assumes that the data read from the file is in
            // little Endian byte order. That assumption is made elsewhere,
            // so it seems appropriate to make it here as well.
            photonXPos = thisInt & 0xFFFF
            photonYPos = thisInt >> 16
            X0[photonCounter] = photonXPos
            Y0[photonCounter] = photonYPos
            photonCounter += 1
        endif
    EndFor
    Close refNum                //close file
           
    printf "File read took %g seconds.\r", (StopMSTimer(-2) - start) / 1e6
           
    fpoint[jj]=ii
    Duplicate/O fpoint PhPFr
    PhPFr[1,*]=Fpoint[p]-Fpoint[p-1]
           
    wavestats/Q PhPFr
    photonN=V_sum
    if(numpnts(X0)>photonN)
        redimension/N=(photonN) X0,Y0
    endif
    //Y0-=1
    wavestats/Q Y0
    variable y1=V_max-V_min+1
    wavestats/Q X0
    variable X1=V_max-V_min+1
    //      print x1,y1
    //=======================Make 2D image if more than 1 line/streak is read
    if(x1>1)
        Make/O/D/N=(X1,Y1)/D  XYZimageCountRate
        Make/O/D/N=(X1,Y1)/D  XYZimagecount
        Duplicate/O X0 ZZ
        wave Im2d=XYZimagecountRate
        wave Im2Dc=XYZimagecount
        IM2d=0;IM2Dc=0;zz=1   ////zz=1 1 photon for every set of coordinates
        ImageFromXYZ /AS {x0,y0,zz}, IM2D, IM2dC
        IM2d/=numofframes
    endif
    //=========================
    KillWaves/Z Fpoint,ZZ//,XYZimagecount
           
           
           
#if 0
    ///test to see if file loaded ok and print relevant values         
    if(IM[0]==73 && IM [1]==77 && jj==numofframes-1 && V_filepos==V_logEoF)  ////add other controls as well
        printf "File appears to have loaded fine\r  Number of frames read=%g // Photon Coordinates read=%g \r",numofFrames,ii
        if(ii!=PhotonN)
            printf "read %g more photons than anticipated will try to read the file again \r" -(ii-photonN)
            //      LoadPhotonCountingDif(-(ii-PhotonN),filename)
        else
            print/D "Average photons per frame=",ii/numofFrames
            print/D "Average photons per line (only valid for DPT files) =",(ii/numofFrames)/(X1)
        endif
    endif
           
    KillWaves/Z IM
    Y0-=1  //get rid of zero pixel that is never occupied and causes nan values
           
    string filetype=filename[strlen(filename)-3,strlen(filename)]
           
    if(cmpstr("dpt",filetype)==0 && X1>1)
        print "DPT filetype detected, proceeding to bin data in single line"
        //  BinInSinglecolumn() //do binning  debugging
        place0Framesatend()
    else
        Duplicate/O PhPFr PHline
        MakeYavg()
        place0Framesatend()
    endif
    // and place the null frames at the end (speeds up the G calculation algorithm by moving the null values at the end and breaking when the first zero is detected)
#endif
End


This version took about 0.7 seconds to load your test file versus about 2.6 seconds using your original function.

While playing around with this, I think I found an error in your code. You have this:
photonN=(V_logEOF-V_filepos-numofframes*5)/4 +1    ///// photons = (filebytes-headerbytes-bytes for end of frame X frames- file end bytes )  / (bytes per photon coord ) + 1


but I don't think that's right. That formula gives more photons than seem to actually be in the file. If I understand the file format correctly, I think my formula for number of photons is correct.

For what it's worth, the X0 and Y0 waves generated by your version and my version seem to be identical except that your version has (-5, -5) for quite a few rows at the end. I think this is because you're calculating more photons than are actually present in the file.

By the way, you'll need Igor Pro 7 to run my version of the function. Hopefully you're already using IP7.
aclight ,
I have tried the function you posted and I must say that it worked like a charm. I had to spend some time to figure out what the bitwise operations were actually doing as I haven't used any bitwise functions in the past. There was only a small issue with the values stored in fpoint that was giving always 2 counts more than the actual coordinates in the individual frames plus 9 more in the first frame. So I just replaced that part and then everything worked perfectly.
        Variable numPhotons = (numDataBytes - (8*Frames))/4
    Variable intWavePoints =    numDataBytes / 4    // 4 bytes per integer
    Make/O/N=(intWavePoints)/I/U intWave   
    Make/O/N=(numPhotons) X0=0,Y0=0
    //Make/O/N=(Frames)/I/U Fpoint
    Make/O/N=(Frames)/I/U timestamps
    Make/O/N=(Frames) PhPfr    //photons per frame
    // Read in all of the data into an unsigned 32-bit integer wave.
    Fbinread refnum, intWave
    Close refNum                //close file
    // Now go through intWave and extract the timestamps and photon coordinates.
        Variable intCounter = 0
    Variable frameCounter = 0
    Variable needTimestamp = 1  // Whether we expect the next integer read from the wave to be a timestamp.
    Variable photonCounter = 0
    variable framephotonCounter=0
    int thisInt, photonXPos, photonYPos
    For (intCounter = 0; intCounter < intWavePoints; intCounter += 1)
        if (intWave[intCounter] == 0xFFFFFFFF)  // This is a frame delimiter
            PhPfr[frameCounter]=framephotonCounter
            framephotonCounter=0
            frameCounter += 1
            needTimestamp = 1
            continue
        endif
 
        if (needTimestamp)
            timestamps[frameCounter] = intWave[intCounter]
            needTimestamp = 0
        else
            // Extract photon coordinates.
            thisInt = intWave[intCounter]
            // This logic assumes that the data read from the file is in
            // little Endian byte order. That assumption is made elsewhere,
            // so it seems appropriate to make it here as well.
            photonXPos = thisInt & 0xFFFF
            photonYPos = thisInt >> 16
            X0[photonCounter] = photonXPos
            Y0[photonCounter] = photonYPos
            photonCounter += 1
            framephotonCounter+=1
        endif
    EndFor

The speed that it loads the files is also unbelievably fast. With your code the biggest file that I have that is about 370MB took less than 1 min which is amazing and I doubt I will have to import files much bigger than that.
Regarding the photon number I had there, you are right it wasn't correct. From the file format listed above I was making out that the delimeter is 5 bytes but obviously that is not correct (?!) and the delimeter is 4 bytes. Also I had mistakenly assumed that there is a single time stamp in the beginning of the data which apparently isn't correct and there is a timestamp in every frame. Due to these mistakes my original photonN estimation was wrong so I had just set it to something I was sure was bigger than the actual number and just resized the coordinate array afterwards, which of course is not good coding and also computationally more expensive.
The -5 values were left over from some debugging initialization I had previously used.

jjweimer ,
I think that the multi-threading the way you suggest it would probably work but since aclight's version works now so well I think it might just be an overkill. The second method that you mention is similar I guess to the implementation of aclight and that seems to work very good. Maybe for files on the GB order the multi-threading would make sense, although as I said this is not on the horizon for the near future, although I know of other users (in other scientific groups) of the instrument that outputs these files and they are integrating over multiple days that would result in several tens of millions of frames since the frame acquisition rate is 100Hz.
Thank you both for your input.
I will upload the loading function when I have finished mucking up the details. :) and then what is left is to time optimize the analysis functions.