Reading headers in matrix data loading

Hi,

I am having trouble reading header info from some Atomic Force Microscopy images. The data are opened in "Gwyddion", where I convert the images to txt files that include some header information relating to the x and y scales of the image. I have written a procedure for opening and presenting these image files, but it has required manually entering the x/y scale for the SetScale operation. I would like to be able to use the header information during the data load to set the scale automatically. The data are of the form:

# Channel: Height
# Width: 2.000 µm
# Height: 2.000 µm
# Value units: m
1.02e-007 1.04e-007 1.02e-007..... matrix data

I obviously then need to be able to read out the header info for height and width. I have looked at the FReadLine operation and the Load File Demo experiment but can't figure out how to read only those points (e.g. Line 2, col 10 - 14 & line 3, col 10 - 14). Would it also be possible to get the units this way? i.e. here it says 2.000 µm but some images are on the nanometer scale so would need to be scaled accordingly.

Many thanks.
You could read in the entire line with FReadLine and divide the string using the character positions that you specify:

...
FReadLine refNum, sTemp
sWidth = sTemp[8,13]   // pick out numeric portion, i.e., 2.000 (with its leading space)
sWidthUnit = sTemp[15,16]  // pick out units, i.e., µm
...


This will only work if the character positions are fixed and don't vary from file to file.

If their positions in the string do vary, then try using strsearch to find the ":" as a marker. Pull out the last two characters as the units string and the remainder is the numeric portion.
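
A minimal sketch of the strsearch idea, for cases where the character positions vary. ParseHeaderValue is a hypothetical name, not a built-in function. Rather than counting back two characters from the end (the string returned by FReadLine normally still carries the line terminator), this sketch hands everything after the colon to sscanf, which is a substitution for the "last two characters" step described above. It assumes the line has the form "# Name: number units".

```igor
// Hypothetical helper: split a header line such as "# Width: 2.000 µm"
// at the colon, then parse the number and the units that follow it.
Function ParseHeaderValue(line, number, units)
	String line			// One header line, as returned by FReadLine
	Variable &number	// Output: numeric portion
	String &units		// Output: units text, e.g. "nm"

	Variable colonPos = strsearch(line, ":", 0)
	if (colonPos < 0)
		return -1		// No colon found
	endif

	// Everything after the colon, e.g. " 2.000 µm" plus the terminator
	String rest = line[colonPos+1, strlen(line)-1]

	Variable localNumber
	String localUnits
	sscanf rest, "%g %s", localNumber, localUnits
	if (V_flag != 2)
		return -1		// Did not get both a number and units
	endif

	number = localNumber
	units = localUnits
	return 0			// Success
End
```

Because %s stops at whitespace, the units string comes back without the trailing line terminator.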
Thomas.Dane wrote:

I obviously then need to be able to read out the header info for height and width. I have looked at the FReadLine operation and the Load File Demo experiment but can't figure out how to read only those points (e.g. Line 2, col 10 - 14 & line 3, col 10 - 14). Would it also be possible to get the units this way? i.e. here it says 2.000 µm but some images are on the nanometer scale so would need to be scaled accordingly.


To only read the second line just call FReadLine twice, throwing away the first result.

Also, for that kind of parsing my first choice is usually sscanf. Alternatively you could consider SplitString but I prefer to go with sscanf if I can.
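
As a rough sketch of those two suggestions together: call FReadLine twice, discard the first result, and let sscanf pick the number and units out of the second line. ReadSecondLineValue is a hypothetical name, and the sketch assumes that refNum comes from an earlier Open/R call and that the width really is on line 2, as in the posted example.

```igor
// Hypothetical helper: read the width from line 2 of an already-open file.
Function ReadSecondLineValue(refNum, number, units)
	Variable refNum		// File reference number from Open/R
	Variable &number	// Output: numeric portion of the width line
	String &units		// Output: units text

	String line
	FReadLine refNum, line	// Line 1 ("# Channel: ..."), thrown away
	FReadLine refNum, line	// Line 2 ("# Width: ...")

	Variable localNumber
	String localUnits
	sscanf line, "# Width: %g %s", localNumber, localUnits
	if (V_flag != 2)
		return -1		// Line 2 did not look like "# Width: number units"
	endif

	number = localNumber
	units = localUnits
	return 0			// Success
End
```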
Here is a solution. But first:

If you are not familiar with Igor symbolic paths, execute this:
DisplayHelpTopic "Symbolic Paths"


Also, the units in the example you sent use a micro character. Micro is coded differently on Windows and Macintosh so, after pasting the code below into a procedure window, make sure that the micro character (see comment in two places below) is in fact a micro character.

This solution is tailored to the text you posted exactly. If the actual file is any different you will need to adjust.

Function ReadHeaderInfo(pathName, filePath, width, height)
    String pathName     // Name of symbolic path. Ignored if filePath is full path.
    String filePath         // File name, relative path or full path
    Variable& width         // Output: Width in meters
    Variable& height        // Output: Height in meters
   
    Variable refNum
    Open/R/P=$pathName refNum as filePath
    if (refNum == 0)            // File not opened - bad pathName or fileName
        return -1           // Failure
    endif
   
    Variable gotWidth=0, gotHeight=0
    Variable number
    String units
    Variable lineNumber = 0
    String line
    do
        FReadLine refNum, line
        if (strlen(line) == 0)
            Print "File ended without finding width and height"
            Close refNum
            return -1   // Failure
        endif
       
        String temp = line[0,7]
        if (CmpStr(temp,"# Width:") == 0)
            sscanf line, "# Width: %g %s", number, units
            if (V_flag != 2)
                // Did not get two items
                Print "ReadHeaderInfo: sscanf failed for width"
                Close refNum
                return -1   // Failure
            else
                width = number
                strswitch(units)
                    case "µm":     // Micro
                        width *= 1E-6
                        break
                    case "nm":
                        width *= 1E-9
                        break
                    default:
                        Print "ReadHeaderInfo: Unknown units for width"
                        Close refNum
                        return -1   // Failure
                endswitch
                gotWidth = 1
            endif
        endif
       
        temp = line[0,8]
        if (CmpStr(temp,"# Height:") == 0)
            sscanf line, "# Height: %g %s", number, units
            if (V_flag != 2)
                // Did not get two items
                Print "ReadHeaderInfo: sscanf failed for height"
                Close refNum
                return -1   // Failure
            else
                height = number
                strswitch(units)
                    case "µm":         // Micro
                        height *= 1E-6
                        break
                    case "nm":
                        height *= 1E-9
                        break
                    default:
                        Print "ReadHeaderInfo: Unknown units for height"
                        Close refNum
                        return -1   // Failure
                endswitch
                gotHeight = 1
            endif
        endif
   
        if (gotWidth && gotHeight)
            break
        endif
       
        lineNumber += 1
        if (lineNumber > 100)
            Print "Searched 100 lines without finding the header width and height information."
            Close refNum
            return -1   // Failure
        endif
    while(1)
   
    Close refNum
    return 0                // Success
End

Function Demo(pathName, filePath)
    String pathName     // Name of symbolic path. Ignored if filePath is full path.
    String filePath         // File name, relative path or full path
   
    Variable width, height
    Variable result
    result = ReadHeaderInfo(pathName, filePath, width, height)
    if (result == 0)
        Printf "Width=%g, height=%g\r", width, height
    else
        Print "Error in ReadHeaderInfo"
    endif
End

Thank you so much for that code, it works perfectly, except for one small issue: it originally couldn't read the file if the units were in µm. I figured it out by printing the string read by FReadLine to see what was going on. It turned out that Igor reads 20.00 µm as 20.00 Âµm, despite the fact that there is no extra character in the text file and copying the text into an Igor window gives 20.00 µm. So I changed all the switches relating to units to include Âµm and it works fine. I don't know if this is a known issue? I then managed to incorporate this header reading into loading a text image and set the x and y scale using this info. Thanks also to the others for useful pointers.
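
For anyone following along, the scaling step could look something like the sketch below. ApplyHeaderScale is a hypothetical name, and the sketch assumes the image has already been loaded into a 2D wave whose rows run along x, with width and height in meters as returned by ReadHeaderInfo. Using /I (inclusive scaling, so the last point sits exactly at width or height) is an assumption about how the pixels should map onto the scan size.

```igor
// Hypothetical helper: apply the header dimensions to a loaded image wave.
Function ApplyHeaderScale(w, width, height)
	Wave w				// 2D image wave, already loaded
	Variable width		// Image width in meters, from the header
	Variable height		// Image height in meters, from the header

	SetScale/I x 0, width, "m", w	// First row at 0, last row at width
	SetScale/I y 0, height, "m", w	// First column at 0, last column at height
End
```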
Thomas.Dane wrote:
I figured it out by printing the string read from FReadLine to see what was going on and it turned out that Igor reads 20.00 µm as 20.00 Âµm


I believe this is because your data file uses UTF-8 text encoding. This is an encoding format of Unicode in which all of the original 128 ASCII characters are represented by themselves and all of the other Unicode characters are represented by 2-, 3- or 4-byte sequences.

Igor Pro 6 does not understand UTF-8 so the two-byte UTF-8 mu character appears in Igor (on Windows) as "Âµ".

Your solution to this issue is the correct one. But you might want to leave in the original case statement, giving two consecutive case statements, so that it will work if you happen to have a file that uses Windows Latin1 text encoding.
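
A sketch of what the two consecutive case statements might look like, pulled out into a hypothetical helper named UnitsToMeters. It assumes Igor Pro 6 on Windows, where the UTF-8 micro shows up as the two characters "Âµ". As noted earlier in the thread, check after pasting that the micro characters in both case labels survived.

```igor
// Hypothetical helper: return the factor that converts the given units to meters.
Function UnitsToMeters(units)
	String units		// Units text parsed from the header line

	strswitch(units)
		case "Âµm":		// UTF-8 micro as it appears in Igor Pro 6 on Windows
		case "µm":		// Windows Latin1 micro (single byte)
			return 1E-6
		case "nm":
			return 1E-9
	endswitch

	return NaN			// Unknown units
End
```

With this in place, each branch of ReadHeaderInfo could multiply the parsed number by UnitsToMeters(units) and treat a NaN result as a failure.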

Thomas.Dane wrote:
despite the fact that there is no extra character in the text file and copying the text into an igor window gives 20.00 µm


I suspect there is an extra byte in the text file. You would see it if you dumped it in hex. You should also see it if you open it as an Igor notebook. You won't see it if you open it in a UTF-8-aware program.
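
If you want to check this from within Igor, something like the following sketch prints each byte of a string in hex. DumpStringAsHex is a hypothetical name. The adjustment for negative values is a precaution in case char2num reports bytes above 127 as signed values.

```igor
// Hypothetical helper: print the bytes of a string as two-digit hex values.
Function DumpStringAsHex(str)
	String str			// For example, a line returned by FReadLine

	Variable i
	Variable n = strlen(str)
	Variable code
	for (i = 0; i < n; i += 1)
		code = char2num(str[i])
		if (code < 0)
			code += 256		// Map a signed byte value onto 0..255
		endif
		Printf "%02X ", code
	endfor
	Printf "\r"
End
```

In UTF-8 the micro sign is the two bytes C2 B5, whereas in Windows Latin1 it is the single byte B5, so the extra byte should show up as a C2 just before the B5.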

When you copy it to the clipboard, the system creates a "system text encoding" version of the text in which the micro is represented in Windows Latin1 text encoding as a single byte.