Checking for same sizes on images before loading?

I would like to confirm that all image files in a folder are exactly the same size *before I load the image files*. For example, I want to raise a flag when I select a folder to open that contains 10 images at 2 MB each and one image at 1.5 MB. The intent is to prevent loading a folder into a stack when the images in it are mismatched in size.

Is there an efficient way to do this, even if it involves breaking out to the shell or DOS level with ExecuteScriptText?

How about using Open and FStatus?  It was fast enough (<0.1 seconds) for ~80 files.

FUNCTION Get_File_Sizes(String strFolder_Path)
   
    //Get the path in case it's empty
    IF(strlen(strFolder_Path)==0)
   
        NewPath/Q/O/M="Select folder" pTemp_Path            //Gets the path to a FOLDER (not a file)
        PathInfo pTemp_path
           
        IF(strlen(S_Path)==0)
            Return -1
        ELSE
            strFolder_Path=S_Path
        ENDIF
       
        KillPath/Z pTemp_path  
    ENDIF
   
    Variable vStart=StartMSTimer
   
    //Get the files in the folder
    NewPath/O/Q/Z pFolder_Path, strFolder_Path
   
    String strFile_Names_List=IndexedFile(pFolder_Path, -1, "????")
    Variable vNum_Files=ItemsInList(strFile_Names_List)
   
    IF(vNum_Files>=2)
   
        Make/O/T/N=(vNum_Files) File_Names=StringFromList(p, strFile_Names_List)
        Sort/A File_Names, File_Names
       
        Make/O/L/U/N=(vNum_Files) File_Sizes=0
       
        //Get the file sizes
        Close/A
       
        Variable iFileDex, vRefNum
        FOR(iFileDex=0;iFileDex<vNum_Files;iFileDex+=1)
            Open/R/Z/P=pFolder_Path vRefNum as File_Names[iFileDex]
           
            IF(V_Flag==0)
                FStatus vRefNum
                File_Sizes[iFileDex]=V_logEOF
                Close vRefNum        //Close only if Open succeeded
            ENDIF
        ENDFOR
       
        Close/A
       
        Variable vStop=StopMSTimer(vStart)
        Print vStop/1e6
       
        //See if there are any files with a different size
        FindDuplicates/FREE/RN=Unique_File_Sizes File_Sizes
       
        IF(numpnts(Unique_File_Sizes)==1)
            Print "File sizes all match!"
       
        ELSE        //For each unique size, make a wave with that size and the name
            Variable iSizeDex
            FOR(iSizeDex=0;iSizeDex<numpnts(Unique_File_Sizes);iSizeDex+=1)
                Extract/O File_Sizes, $"Index_"+num2istr(iSizeDex)+"_Sizes", File_Sizes==Unique_File_Sizes[iSizeDex]        //Probably could just stuff the size into the wave note of the names wave
                Extract/O/T File_Names, $"Index_"+num2istr(iSizeDex)+"_Names", File_Sizes==Unique_File_Sizes[iSizeDex]
            ENDFOR
        ENDIF
       
    ELSE
        Print "There are fewer than two files in the folder."
    ENDIF
   
    KillPath/Z pFolder_Path
       
END

Edit: Changed the file size wave from a double to a long unsigned integer.  I don't think the size will ever be negative or a non-integer.
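
As a side note on the FindDuplicates step above: if all you need is a yes/no answer, a plain loop that compares every entry against the first one also works. This is only a minimal sketch. The function name AllSizesMatch is hypothetical, and it assumes the sizes are held in a numeric wave in which a failed read may have left a NaN.

```igor
// Hypothetical helper (not part of the function above).
// Returns 1 if every point in the wave is a normal number equal to
// the first point, otherwise 0. An empty wave counts as a mismatch.
Function AllSizesMatch(Wave sizes)

	Variable i, n = numpnts(sizes)

	if (n == 0)
		return 0
	endif

	for (i = 0; i < n; i += 1)
		// numtype is 0 for a normal number, nonzero for NaN or Inf
		if (numtype(sizes[i]) != 0)
			return 0
		endif
		if (sizes[i] != sizes[0])
			return 0
		endif
	endfor

	return 1
End
```

Called as AllSizesMatch(File_Sizes), this treats a file that could not be read as a mismatch rather than silently skipping it.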

Instead of Open you could also try:

GetFileFolderInfo/Q/Z/P=pFolder_Path File_Names[iFileDex]
File_Sizes[iFileDex]=V_logEOF

 

The help for GetFileFolderInfo says that V_logEOF is the number of bytes in the data fork, while V_logEOF from FStatus is the total number of bytes in the file, which had always made me think that those values would be different.  However, when I checked several file types (.h5, .png), the size was the same from both methods.  Maybe one of the WaveMetrics folks can chime in.

 

However, using GetFileFolderInfo in the loop is several times slower than using Open (~0.12 s versus ~0.04 s for 80 files).  
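
For anyone who prefers the GetFileFolderInfo route despite the speed difference, it can be wrapped in a small helper along these lines. This is a sketch only. The function name GetSizeOfFile is hypothetical, and it assumes pathName holds the name of an existing symbolic path (such as pFolder_Path above).

```igor
// Hypothetical helper (not from the posts above).
// Returns the size in bytes of fileName in the symbolic path named by
// pathName, or NaN if the item is missing or is not a file.
Function GetSizeOfFile(String pathName, String fileName)

	// /Q suppresses history output; /Z suppresses the error if the file is missing
	GetFileFolderInfo/Q/Z/P=$pathName fileName

	if (V_Flag != 0)		// not found, or the lookup failed
		return NaN
	endif
	if (!V_isFile)			// found, but it is a folder or something else
		return NaN
	endif

	return V_logEOF
End
```

In the loop this would read File_Sizes[iFileDex]=GetSizeOfFile("pFolder_Path", File_Names[iFileDex]). Since a missing file yields NaN, the destination wave would need to be floating point rather than an integer wave.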

I didn't understand that part either, but either value is probably fine for detecting file sizes that differ substantially. It makes sense that GetFileFolderInfo is slower, since it gathers more information. Better to use Open, then.

The help for GetFileFolderInfo says that V_logEOF is the number of bytes in the data fork, while V_logEOF from FStatus is the total number of bytes in the file

They are both the number of bytes in the data fork.

The FStatus documentation would be more precise if it said "The number of bytes in the opened fork" which is always the data fork.

The Open operation has always opened the data fork only.

Apple dropped support for resource forks a long time ago so the distinction between data fork and resource fork is moot at this point. 

In case anyone needs it, here is a version that checks three things. In my applications, the number of files must be four or more; otherwise, the stack becomes an RGB image. I do not allow mixtures of file types to create a stack. Finally, I check for the same file size.

// input file name list (unparsed) in path imgPath
// return 1 if valid, 0 if invalid
Static Function f_IsValidateforStack(string fList, variable sizecheck)

    variable nf, nt, vRefNum, ic
    string tlist, plist, jlist
    string theFile, fName
   
    // check number of files (stacks must be 4+ images)
    nt = ItemsInList(flist)
    if (nt < 4)
        return 0
    endif
   
    // check file names (no stacks from combinations of image types)
    tlist = ListMatch(fList,"*.tif")
    tlist += ListMatch(fList,"*.tiff")
    nf = ItemsInList(tlist,";")
    nt = nf != 0 ? 1 : 0
   
    plist = ListMatch(fList,"*.png")
    nf = ItemsInList(plist,";")
    nt = nf != 0 ? nt + 1 : nt
   
    jlist = ListMatch(fList,"*.jpg")
    jlist += ListMatch(fList,"*.jpeg")
    nf = ItemsInList(jlist,";")
    nt = nf != 0 ? nt + 1 : nt

    if (nt > 1)
        return 0
    endif
   
    // check file sizes (images must be same sizes)
    if (sizecheck)
        nt = ItemsInList(flist)
        Make/D/N=(nt)/FREE File_Sizes = NaN
        for (ic=0;ic<nt;ic+=1)
            theFile = StringFromList(ic,fList)
            fName = ParseFilePath(0,theFile,":",1,0)
            Open/R/Z/P=imgPath vRefNum as fName
            if (v_flag==0)
                FStatus vRefNum
                File_Sizes[ic]=V_logEOF
                Close vRefNum    // close only if Open succeeded
            endif
        endfor     
        Close/A    
        FindDuplicates/FREE/RN=Unique_File_Sizes File_Sizes    
        if (numpnts(Unique_File_Sizes) != 1)
            return 0
        endif
    endif
   
    return 1
end

 

Jeff, I see you allow TIFF, PNG, and JPG. If no compression is applied, then the number of bytes in the file will track the number of bytes in the final image. But if any compression is done, then images of the same dimensions may wind up with different file sizes. With JPG especially, the file size will depend on the quality setting and the amount of high-frequency detail in the image. In TIFF and PNG images, I would imagine that large patches of zeroes would compress almost to nothing.

Thanks for the heads up, John. I've implemented a restriction that limits the creation of stacks to TIFF images only. As for the possibility of missing true size differences in compressed TIFFs, only one case will fail in my revised approach: the compressed TIFF files on the drive are all **exactly** the same size, yet at least one of the loaded images (out of a minimum of four) has a different uncompressed size from the others. I'll leave this as an edge case for someone with greater motivation to tackle.

// input file name list (unparsed) in path imgPath
// return 1 if valid, 0 if invalid
Static Function f_IsValidateforStack(string fList, variable sizecheck)

    variable nt, vRefNum, ic
    string tlist, theFile, fName
   
    // check number of files (stacks must be 4+ images)
    nt = ItemsInList(flist)
    if (nt < 4)
        return 0
    endif
   
    // check file names (stacks only allowed from tiff)
    tlist = ListMatch(fList,"*.png")
    tlist += ListMatch(fList,"*.jpg")
    tlist += ListMatch(fList,"*.jpeg")
    nt = ItemsInList(tlist,";")
    if (nt > 0)
        return 0
    endif
   
    // check file sizes (images must be same sizes)
    if (sizecheck)
        nt = ItemsInList(flist)
        Make/D/N=(nt)/FREE File_Sizes = NaN
        for (ic=0;ic<nt;ic+=1)
            theFile = StringFromList(ic,fList)
            fName = ParseFilePath(0,theFile,":",1,0)
            Open/R/Z/P=imgPath vRefNum as fName
            if (v_flag==0)
                FStatus vRefNum
                File_Sizes[ic]=V_logEOF
                Close vRefNum    // close only if Open succeeded
            endif
        endfor     
        Close/A    
        FindDuplicates/FREE/RN=Unique_File_Sizes File_Sizes    
        if (numpnts(Unique_File_Sizes) != 1)
            return 0
        endif
    endif
   
    return 1
end

 

If I read this correctly, the only passing cases will be sets of TIFFs with zero compression, the false positive edge case that you mention, or a set of TIFFs where the compression fortuitously gives the same file size. Why not check the file header (actually, the Image File Directory/Directories) for the image width(s) and height(s)? That's what you're really trying to check, right?

function GetHeightAndWidth(string strPath)
   
    variable refNum
    if (strlen(strPath))
        Open/R refNum as strPath
    else
        string fileFilters = "TIFF Files (*.tif,*.tiff:.tif,.tiff;)"
        Open/R/F=fileFilters refNum
        Print s_filename
    endif

    if (strlen(s_filename) == 0)
        return 0
    endif
    string strByteOrder = "00"
    int nextDirectory, theAnswer, byteOrder, numEntries, width, height
    int i, j
    int imax = 256 // maximum number of images to look for in one file
    int iTag, iType, iCount, iValue, iJunk
   
    FBinRead refNum, strByteOrder
    strswitch (strByteOrder)
        case "II" :
            byteOrder = 3
            break
        case "MM" :
            byteOrder = 2
            break
    endswitch

    FBinRead/U/F=2/B=(byteOrder) refNum, theAnswer // should be 42
    if (theAnswer != 42)
        Close refNum
        DoAlert 0, "could not read file"
        return 0
    endif
   
    // read the Image File Directory/Directories
    for (i=0;i<imax;i++)
        FBinRead/U/F=3/B=(byteOrder) refNum, nextDirectory
        if (!nextDirectory)
            break
        endif
        FSetPos refNum, nextDirectory
        FBinRead/U/F=2/B=(byteOrder) refNum, numEntries
        width = 0
        height = 0
        // loop through Image File Directory
        for (j=0;j<numEntries;j++)
            FBinRead/U/F=2/B=(byteOrder) refNum, iTag
            FBinRead/U/F=2/B=(byteOrder) refNum, iType
            FBinRead/U/F=3/B=(byteOrder) refNum, iCount

            // for the values we're chasing, iType is 3 or 4 (two or four byte integer)
            // the value should always be found in the IFD, no need to interpret a pointer

            if (iType == 3)
                FBinRead/U/F=2/B=(byteOrder) refNum, iValue
                FBinRead/U/F=2/B=(byteOrder) refNum, iJunk
            else
                FBinRead/U/F=3/B=(byteOrder) refNum, iValue
            endif

            if (iTag == 256)
                width = iValue
            elseif (iTag == 257)
                height = iValue
            endif
        endfor
        Print "width", width, "height", height
    endfor
   
    Close refnum   
end
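
To use the same idea as a pre-load check, the header read can return the dimensions instead of printing them, and a second function can compare them across a file list. The following is a rough sketch under stated assumptions rather than a tested implementation. The names ReadFirstTIFFSize and AllTIFFsSameDimensions are hypothetical, only the first Image File Directory of each file is read, files are assumed to be classic TIFF (magic number 42), and pathName is assumed to name an existing symbolic path.

```igor
// Hypothetical helper: read width and height from the first IFD of a TIFF file.
// Returns 0 on success, -1 on failure; width and height are passed by reference.
Function ReadFirstTIFFSize(String pathName, String fileName, Variable &width, Variable &height)

	Variable refNum, byteOrder, magic, ifdOffset, numEntries, j
	Variable iTag, iType, iCount, iValue, iJunk
	String strByteOrder = "00"

	width = NaN
	height = NaN

	Open/R/Z/P=$pathName refNum as fileName
	if (V_flag != 0)
		return -1
	endif

	FBinRead refNum, strByteOrder		// "II" = little-endian, "MM" = big-endian
	strswitch (strByteOrder)
		case "II":
			byteOrder = 3
			break
		case "MM":
			byteOrder = 2
			break
		default:						// not a TIFF byte-order mark
			Close refNum
			return -1
	endswitch

	FBinRead/U/F=2/B=(byteOrder) refNum, magic
	if (magic != 42)
		Close refNum
		return -1
	endif

	FBinRead/U/F=3/B=(byteOrder) refNum, ifdOffset	// offset of the first IFD
	FSetPos refNum, ifdOffset
	FBinRead/U/F=2/B=(byteOrder) refNum, numEntries

	for (j = 0; j < numEntries; j += 1)
		FBinRead/U/F=2/B=(byteOrder) refNum, iTag
		FBinRead/U/F=2/B=(byteOrder) refNum, iType
		FBinRead/U/F=3/B=(byteOrder) refNum, iCount
		if (iType == 3)		// 2-byte value followed by 2 unused bytes
			FBinRead/U/F=2/B=(byteOrder) refNum, iValue
			FBinRead/U/F=2/B=(byteOrder) refNum, iJunk
		else				// everything else is read as a 4-byte field
			FBinRead/U/F=3/B=(byteOrder) refNum, iValue
		endif
		if (iTag == 256)
			width = iValue
		elseif (iTag == 257)
			height = iValue
		endif
	endfor

	Close refNum

	if (numtype(width) != 0)		// width tag was never found
		return -1
	endif
	if (numtype(height) != 0)		// height tag was never found
		return -1
	endif
	return 0
End

// Hypothetical usage: returns 1 if every file in fList (semicolon-separated
// file names in the symbolic path) has the same width and height, otherwise 0.
Function AllTIFFsSameDimensions(String pathName, String fList)

	Variable i, n = ItemsInList(fList)
	Variable w0, h0, w, h

	if (n == 0)
		return 0
	endif
	if (ReadFirstTIFFSize(pathName, StringFromList(0, fList), w0, h0) != 0)
		return 0
	endif

	for (i = 1; i < n; i += 1)
		if (ReadFirstTIFFSize(pathName, StringFromList(i, fList), w, h) != 0)
			return 0
		endif
		if (w != w0 || h != h0)
			return 0
		endif
	endfor

	return 1
End
```

Because this compares pixel dimensions rather than bytes on disk, it should not be affected by compression. It does not check bit depth or samples per pixel, which could also differ between files.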

 

FBinRead/U/F=2/B=(byteOrder) refNum, theAnswer // should be 42
if (theAnswer != 42)
...

LOL, I wonder if this is some deliberate joke by the creators of TIFF.

Thanks, Tony. I had thought that I might eventually do a read-only ImageLoad operation to capture the tags. Your approach may be less cumbersome.

The TIFF format is described here

@chozo, the documentation describes it as "an arbitrary but carefully chosen number"

@jjweimer, I edited the snippet to properly handle the case where height and width are encoded as 2-byte integers. I doubt that there are actually any files where this is the case.