read in data (txt file) with special time format

zek
Posts: 2
Joined: 2017-07-10
Location: Switzerland

Dear all,
I try to read in a file like this

Datum Zeit Tmp_ak WGms WR DatumZeit
17.09.2016 09.50.00.02 15.8 1.4 248 17.09.2016 09.50.00.02
17.09.2016 09.50.00.12 15.7 1.3 253.6 17.09.2016 09.50.00.12
17.09.2016 09.50.00.22 15.7 1.3 254.8 17.09.2016 09.50.00.22
17.09.2016 09.50.00.32 15.7 1.4 255.1 17.09.2016 09.50.00.32
17.09.2016 09.50.00.42 15.7 1.3 254.9 17.09.2016 09.50.00.42
17.09.2016 09.50.00.52 15.7 1.3 253.2 17.09.2016 09.50.00.52
17.09.2016 09.50.00.62 15.7 1.3 253.5 17.09.2016 09.50.00.62
17.09.2016 09.50.00.72 15.7 1.3 250.9 17.09.2016 09.50.00.72
17.09.2016 09.50.00.82 15.7 1.3 250.4 17.09.2016 09.50.00.82
17.09.2016 09.50.00.92 15.7 1.2 250.7 17.09.2016 09.50.00.92

First line is the column name; first and second column gives date and time, which I want to merge in one column. The column time has a very special time format. The other columns are temperature, wind speed and wind direction and DateTime.
I tried to load this file with the Data-Loader from Igor Pro, but I has problems with this special time format. Does anyone has a solution or has somebody already some code, which is useful for the read-in of such type of files?

Best regards
Kerstin


Posts: 1693
Joined: 2007-06-21
Location: United States

Here is a solution to the issue.

I recommend saving this as an Igor Procedure file so that it will be automatically loaded when you launch Igor. Execute this for details:
DisplayHelpTopic "Special Folders"

If this does not work with your file then it is probably due to formatting subtleties. In that case, attach a zip archive of your file so I can see exactly what the file contains.

// This loads a file containing data like this:
// 	dd.mm.yyyy<space>hh.mm.ss.ff<space> ...
// where hh.mm.ss.ff is a time of day with fractional seconds. The procedure changes the
// dots in the time to colons so that it works with LoadWave/J, writes a temporary file containing
// the massaged text, and loads the data from the temporary file.
// There is a DeleteFile call at the end of the procedure to delete the temporary file. If you want to
// inspect the temporary file, comment out the DeleteFile call.
 
// In addition to the issue of the formatting of the time, the file has another peculiarity as posted
// at <a href="http://www.igorexchange.com/node/7831" title="http://www.igorexchange.com/node/7831" rel="nofollow">http://www.igorexchange.com/node/7831</a>. It looks like this:
// Datum Zeit Tmp_ak WGms WR DatumZeit
// 17.09.2016 09.50.00.02 15.8 1.4 248 17.09.2016 09.50.00.02
// The column names for the first date/time have a space between them but the column name for the last
// date/time has no space. To sidestep this issue, I am ignoring the column names altogether and
// using LoadWave /B to set the column names.
 
// There is another problem. The use of space to separate the date and time as well as to separate one column
// from the next creates problems with LoadWave. Therefore the procedure replaces the space used to separate
// one column from the next with a tab.
 
Menu "Load Waves"
	"Load Kerstin File...", LoadKerstinFile("", "")
End
 
static Function/S FixText(textIn)
	String textIn
 
	String textOut = ""
 
	Variable numBytes = strlen(textIn)
	Variable bytesLeft = numBytes
	Variable offset = 0
	for(offset=0; offset<numBytes; )
		String ch = textIn[offset]
		Variable isSpace = CmpStr(ch," ")==0
		if (isSpace && bytesLeft>=12)				// Space and sufficient remaining bytes to be hh.mm.ss.ff?
			String section = textIn[offset+1,offset+11]
			String regExp = "[[:digit:]]{2}\.[[:digit:]]{2}\.[[:digit:]]{2}\.[[:digit:]]{2}"	// hh.mm.ss.ff
			if (GrepString(section,regExp))
				section = ReplaceString(".", section, ":")
				textOut += ch + section
				Variable sectionLength = strlen(section)
				offset += 1 + sectionLength					// 1 for ch (space)
				bytesLeft -= 1 + sectionLength
				continue
			endif
		endif
		if (isSpace)
			ch = "\t"								// Replace space separating columns with tab
		endif
		textOut += ch
		offset += 1
		bytesLeft -= 1
	endfor
 
	return textOut
End
 
//	LoadKerstinFile(pathName, fileName)
//	A data file has unwanted line breaks for lines longer than 80 characters
//	This routine creates a temporary version of the file without the bad line breaks
//	and loads data from the temporary file.
Function LoadKerstinFile(pathName, fileName)
	String pathName		// Name of an Igor symbolic path or "".
	String fileName			// Name of file or full path to file.
 
	Variable refNum
 
	// First get a valid reference to a file.
	if ((strlen(pathName)==0) || (strlen(fileName)==0))
		// Display dialog looking for file.
		// Replace /T="????" with, for example, /T=".dat" if your files are .dat files.
		Open /D /R /P=$pathName /T="????" refNum as fileName
		fileName = S_fileName			// S_fileName is set by Open/D
		if (strlen(fileName) == 0)		// User cancelled?
			return -1
		endif
	endif
 
	// Open source file and read the raw text from it into a string variable
	Open/Z=1/R/P=$pathName refNum as fileName
	if (V_flag != 0)
		return -1						// Error of some kind
	endif
	FStatus refNum						// Sets V_logEOF
	Variable numBytesInFile = V_logEOF
	String text = PadString("", numBytesInFile, 0x20)
	FBinRead refNum, text				// Read entire file into variable.
	Close refNum
 
	// Fix the text
	text = FixText(text)					// Remove bad line breaks
 
	// Write the fixed text to a temporary file
	String tempFileName = fileName + ".noindex"	// Use of .noindex prevents Spotlight from indexing the file. Otherwise we get an error when we try to delete the file because Spotlight has it open.
	Open refNum as tempFileName
	FBinWrite refNum, text
	Close refNum
 
	String columnInfoStr = ""
	columnInfoStr += "N=DatumZeitA,F=8;"		// <date><space><time>
	columnInfoStr += "N=TmpAk,F=0;"			// <number>
	columnInfoStr += "N=WGms,F=0;"			// <number>
	columnInfoStr += "N=WR,F=0;"				// <number>
	columnInfoStr += "N=DatumZeitB,F=8;"		// <date><space><time>
 
	// Load the temporary file
	// The /L flag causes LoadWave to load the data starting from line 1 (zero-based), skipping the name line which is problematic.
	// The /B flag specifies the column names and formats.
	// The /R flag specifies the date format as dd.mm.yy.
	LoadWave/J/D/P=$pathName/E=1/L={0,1,0,0,0}/B=columnInfoStr/K=0/R={English,2,2,2,2,"DayOfMonth.Month.Year",40} tempFileName
	if (V_flag == 0)
		Printf "An error occurred while loading data from \"%s\"\r", S_fileName
	else
		Printf "Loaded data from \"%s\"\r", S_fileName
	endif
 
	// Make table bigger
	MoveWindow 10, 50, 800, 400
 
	// Set date format to show fractional seconds
	ModifyTable showFracSeconds[1]=1,digits[1]=2, width[1]=175
	ModifyTable showFracSeconds[5]=1,digits[5]=2, width[5]=175
 
	// Delete the temporary file	
	DeleteFile /P=$pathName tempFileName
 
	return 0
End


zek
Posts: 2
Joined: 2017-07-10
Location: Switzerland

Thanks for your help!
The procedure works, :-).
Not very fast, probably of the big data file.
I attached one of my data file.

Best regards
Kerstin

AttachmentSize
WindTestFile.zip7.46 MB

[ last edited July 14, 2017 - 01:32 ]
_sk
Posts: 115
Joined: 2014-10-28
Location: Switzerland

zek wrote:
Dear all,
I try to read in a file like this

Datum Zeit Tmp_ak WGms WR DatumZeit
17.09.2016 09.50.00.02 15.8 1.4 248 17.09.2016 09.50.00.02
17.09.2016 09.50.00.12 15.7 1.3 253.6 17.09.2016 09.50.00.12
...

Why do you have duplicate columns in the data file? (Datum, Zeit) in the beginning, (Datum, Zeit) at the end? Each line begins and ends with this tuple. For the file that you posted if you remove the last two columns, the reduction is substantial: 70173015 bytes > 42524260 bytes (~67MB > ~41MB).

Under a linux or cygwin environment you'd do something like:
cat wind_test_file.txt | awk '{print $1,$2,$3,$4,$5}' > wind_test_file_out

Another recommendation is to change the delimiter to a comma. Then, use the Data > Load Waves > Load delimited text... menu, to load the data keeping the time as a text. Parse it after loading.

Under linux or cygwin in vim you'd issue the following to substitute space for comma:
:1,$s/ /,/g

Another recommendation is to combine the date and time into one variable ala julian date, unix time, etc. This will provide further savings on the file size, but will increase processing upon file loading.

edit: As a matter of fact, you don't need the date. It's a waste of bytes. Tell the person generating these files to name the filenames with the corresponding date, or put it in the beginning of each file as a header or something, and dump data only for this particular day in this particular file. I would also use fractional time. Even though a delta t would be enough (which btw is not precisely 100ms, just a tiny bit less, ~99.99(93)ms)

best,
_sk


[ last edited July 14, 2017 - 05:24 ]

Back to top