\(\newcommand{\AA}{\unicode{x212B}}\)

10. Reading and Writing Data

Larch has several built-in functions for reading and writing scientific data, and the types and varieties of supported files are expected to grow. In addition, because standard Python modules can be used from Larch, many other standard data and image formats can be read by importing the appropriate Python module. This chapter describes the Larch functions for data handling.

10.1. Simple Plaintext (ASCII or UTF-8) Column Files

A simple way to store small amounts of numerical data, and one that is widely used in the XAFS community, is to store data in plaintext data files, with numbers separated by whitespace and laid out as a table with a fixed number of columns, and with rows separated by newlines. By “plaintext”, we mean files that are not binary. Many of these will contain only “ASCII” characters (for basic English text without accents or non-Latin characters), but they can also contain characters from other languages, as represented by the “Latin-1” or “UTF-8” encodings. Typically a comment character such as “#” is used to mark header lines. For instance:

# room temperature FeO
# data from 20-BM, 2001, as part of NXS school
# powder on tape, 4 layers
# 2001-08-10T11:10:00
# Si(111), d_spacing: 3.13553
#------------------------
#  energy       xmu             i0
  6911.7671   -0.35992590E-01  280101.00
  6916.8730   -0.39081634E-01  278863.00
  6921.7030   -0.42193483E-01  278149.00
  6926.8344   -0.45165576E-01  277292.00
  6931.7399   -0.47365589E-01  265707.00

This file has some lines of text that give human-readable information about the data collected, followed by data for different arrays or channels arranged in columns, with each line or row representing a new data point. While this is not a very specific description of a data file (see XDI below), such files are very common in the XAFS community. Such files can usually be read with the builtin read_ascii() function, which will turn each column into an array, usually named by the column heading. This often means that as you read these files in, you also need to know which arrays hold which quantities and tell the program how to use them.
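
The mapping from a plaintext column file to named arrays can be sketched in plain Python. This is a minimal illustration under simplifying assumptions (the last header line holds one label per column, and every data row is numeric); it is not the implementation of read_ascii(), and the function name parse_column_text is hypothetical.

```python
def parse_column_text(text, comment="#"):
    """Split plaintext column data into header lines and named columns.

    Hypothetical helper for illustration: labels come from the last
    header line, falling back to 'col1', 'col2', ... when that line
    does not contain exactly one word per column.
    """
    header, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(comment):
            header.append(line)
        else:
            rows.append([float(word) for word in line.split()])

    ncols = len(rows[0]) if rows else 0
    labels = header[-1].lstrip(comment).split() if header else []
    if len(labels) != ncols:
        labels = [f"col{i + 1}" for i in range(ncols)]

    # Transpose rows into one list per column, keyed by label.
    columns = {label: [row[i] for row in rows] for i, label in enumerate(labels)}
    return header, columns


sample = """# room temperature FeO
#  energy       xmu             i0
  6911.7671   -0.35992590E-01  280101.00
  6916.8730   -0.39081634E-01  278863.00
"""
header, columns = parse_column_text(sample)
print(list(columns))       # ['energy', 'xmu', 'i0']
print(columns["energy"])   # [6911.7671, 6916.873]
```

A real reader also has to cope with rows of unequal length, non-numeric tokens, and beamline-specific header conventions, which is why read_ascii() tries several ways of guessing labels (see note 1).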

larch.io.read_ascii(filename, labels=None, simple_labels=False, sort=False, sort_column=0)

read a plaintext column file, returning a group containing the data extracted from the file.

Parameters:
  • filename (str) – name of file to read

  • labels (list or None) – list of labels to use for array labels [None]

  • simple_labels (bool) – whether to force simple column labels (note 1) [False]

  • sort (bool) – whether to sort row data (note 2) [False]

  • sort_column (int) – column to use for sorting (note 2) [0]

Returns:

Group

A data group containing data read from file, with several attributes:

filename : text name of the file.
array_labels : array labels, names of 1-D arrays.
data : 2-dimensional data (ncolumns, nrows) with all data.
header : array of text lines of the header.
footer : array of text lines of the footer (text after the numerical data)
attrs : group of attributes parsed from header lines.

Notes

  1. array labels. If labels is None (the default), column labels and names of 1d arrays will be guessed from the file header. This often means parsing the final header line, but tagged column files from several XAFS beamlines will be tried and used if matching. Column labels may be like ‘col1’, ‘col2’, etc if suitable column labels cannot be guessed. These labels will be used as names for the 1-d arrays from each column. If simple_labels is True, the names ‘col1’, ‘col2’ etc will be used regardless of the column labels found in the file.

  2. sorting. Data can be sorted to be in increasing order of any column, by giving the column index (starting from 0).

  3. header parsing. If header lines are of the form

    KEY : VAL
    KEY = VAL

    these will be parsed into ‘attrs’, the group of header attributes in the returned group.
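
Notes 2 and 3 can be illustrated with a short sketch: a regular expression that picks out KEY : VAL and KEY = VAL header lines, and a row sort keyed on one column index. The helper names and the pattern (which requires keys to start with a letter or underscore, so that a timestamp such as 2001-08-10T11:10:00 is not mistaken for a key) are assumptions of this sketch, not Larch's parsing rules.

```python
import re

# Hypothetical pattern: KEY : VAL or KEY = VAL, where keys must start
# with a letter or underscore (so timestamps are not parsed as keys).
ATTR_LINE = re.compile(r"^\s*([A-Za-z_][\w.\- ]*?)\s*[:=]\s*(.*?)\s*$")


def parse_header_attrs(header_lines, comment="#"):
    """Return a dict of KEY -> VAL strings found in header lines."""
    attrs = {}
    for line in header_lines:
        match = ATTR_LINE.match(line.lstrip(comment))
        if match:
            attrs[match.group(1)] = match.group(2)
    return attrs


def sort_rows(rows, sort_column=0):
    """Return rows ordered by increasing value in column sort_column."""
    return sorted(rows, key=lambda row: row[sort_column])


print(parse_header_attrs(["# Element.symbol: Fe", "# mono = Si(111)",
                          "# 2001-08-10T11:10:00", "# powder on tape"]))
# {'Element.symbol': 'Fe', 'mono': 'Si(111)'}
print(sort_rows([[6921.7, 0.3], [6911.7, 0.1], [6916.8, 0.2]]))
# [[6911.7, 0.1], [6916.8, 0.2], [6921.7, 0.3]]
```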

Examples

>>> feo_data = read_ascii('feo_rt1.dat')
>>> show(feo_data)
== Group ascii_file feo_rt1.dat: 0 methods, 8 attributes ==
array_labels: ['energy', 'xmu', 'i0']
attrs: <Group header attributes from feo_rt1.dat>
data: array<shape=(3, 412), type=dtype('float64')>
energy: array<shape=(412,), type=dtype('float64')>
filename: 'feo_rt1.dat'
header: ['# room temperature FeO', '# data from 20-BM, 2001, as part of NXS school', ... ]
i0: array<shape=(412,), type=dtype('float64')>
xmu: array<shape=(412,), type=dtype('float64')>

See also

read_xdi, write_ascii

larch.io.write_ascii(filename, *args, commentchar='#', label=None, header=None)

write a list of items to an ASCII column file

Parameters:
  • args – items to write, typically 1-D arrays, each written as a column.

  • commentchar (str) – character for comment (‘#’)

  • label (str or None) – array label line (autogenerated)

  • header (list of strings) – array of strings for header

Returns:

None

Examples

>>> write_ascii('myfile', group.energy, group.norm, header=['comment1', 'comment2'])

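
The kind of column layout that write_ascii() produces can be approximated with a few lines of Python. The hypothetical format_columns() helper below builds commented header lines, a commented label line, and one row per point, truncating to the shortest column; the spacing, number format, and truncation rule are assumptions of this sketch, not write_ascii()'s exact output.

```python
def format_columns(columns, labels, header=(), commentchar="#"):
    """Return column text: commented header, commented labels, data rows.

    Hypothetical helper; the layout only approximates write_ascii().
    """
    lines = [f"{commentchar} {text}" for text in header]
    lines.append(f"{commentchar} " + "  ".join(labels))
    npts = min(len(col) for col in columns)  # truncate to the shortest column
    for i in range(npts):
        lines.append("  ".join(f"{col[i]:.8g}" for col in columns))
    return "\n".join(lines) + "\n"


text = format_columns([[7100.0, 7110.5], [0.01, 0.98]], ["energy", "norm"],
                      header=["comment1", "comment2"])
print(text.splitlines())
# ['# comment1', '# comment2', '# energy  norm', '7100  0.01', '7110.5  0.98']
```

Writing the result to disk is then a plain `with open('myfile', 'w') as fh: fh.write(text)`.
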
larch.io.write_group(filename, group, scalars=None, arrays=None, arrays_like=None, commentchar='#')

(deprecated) write components of a group to an ASCII column file

Warning

This is pretty minimal and may work poorly for large groups of complex data. Use save_session instead.

10.2. Reading XAFS Data Interchange (XDI) Files

The XAS Data Interchange (XDI) format has been developed as part of an effort to standardize the format of plaintext XAFS data files: see XDI. This removes some of the challenges of plaintext files and allows consistent naming of arrays. XDI files often use the .xdi extension, but are also identified by having a first line that includes XDI. These files should be considered “the normal way” to read X-ray Absorption Spectroscopy data into Larch. To read an XDI file with Larch, use read_xdi(). This will create a Group with several consistently named arrays and values that are useful for processing XAS data. A more detailed description is given at XDI metadata dictionary. The most important components read from an XDI file are given in the Table of XDI Attributes below.
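
Checking the first line is a cheap way to tell an XDI file from a generic column file. The sketch below assumes the first-line convention used in the XDI specification's examples, a comment line such as `# XDI/1.0` followed by optional application tokens. The helper names are hypothetical, and this is only a heuristic, not a validator.

```python
import re

XDI_VERSION = re.compile(r"XDI/(\S+)")


def looks_like_xdi(first_line):
    """Heuristic: the first line is a comment that mentions XDI."""
    return first_line.lstrip().startswith("#") and "XDI" in first_line


def xdi_version(first_line):
    """Return the version string after 'XDI/', or None if absent."""
    match = XDI_VERSION.search(first_line)
    return match.group(1) if match else None


def file_looks_like_xdi(path):
    """Apply the heuristic to the first line of a file (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return looks_like_xdi(fh.readline())


print(looks_like_xdi("# XDI/1.0 GSE/1.0"))       # True
print(xdi_version("# XDI/1.0 GSE/1.0"))          # 1.0
print(looks_like_xdi("# room temperature FeO"))  # False
```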

larch.io.read_xdi(filename, labels=None)

read an XDI File into a Group

Parameters:
  • filename (str) – name of file to read

  • labels (str or None) – string to use for setting array names [None]

Returns:

Group

A data group containing data read from file, with XDI attributes and conventions.

Notes

  1. See https://github.com/XraySpectroscopy/XAS-Data-Interchange

  2. if labels is None (default), the names of the output arrays will be determined from the XDI column label in the XDI header. To override these array names, use a string with space or comma separating names for the arrays.
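
Note 2 describes the labels string as names separated by spaces or commas. Below is a minimal sketch of that splitting, plus a hypothetical rename_columns() that pairs the names with data columns. Raising an error on a count mismatch is this sketch's choice, not documented read_xdi() behavior.

```python
def split_labels(labels):
    """Split 'e, x, y' or 'e x y' into ['e', 'x', 'y']."""
    return labels.replace(",", " ").split()


def rename_columns(columns, labels):
    """Pair label names with columns (hypothetical helper).

    A count mismatch is treated as an error in this sketch.
    """
    names = split_labels(labels)
    if len(names) != len(columns):
        raise ValueError(f"{len(names)} labels for {len(columns)} columns")
    return dict(zip(names, columns))


print(split_labels("e, x, y"))   # ['e', 'x', 'y']
print(split_labels("e x,y"))     # ['e', 'x', 'y']
print(rename_columns([[1.0], [2.0], [3.0]], "e, x, y"))
# {'e': [1.0], 'x': [2.0], 'y': [3.0]}
```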

Example

>>> from larch.io import read_xdi
>>> fe3_data = read_xdi('FeXAFS_Fe2O3.001')
>>> print(fe3_data.array_labels)
['energy', 'mutrans', 'i0']
>>> fec3 = read_xdi('fe3c_rt.xdi', labels='e, x, y')
>>> print(fec3.array_labels)
['e', 'x', 'y']

See also

read_ascii

Table of XDI Attributes: these are the standard names and meanings for arrays and scalar values taken from XDI files.

============  =========  ============================================================
name          type       meaning
============  =========  ============================================================
energy        1-D array  X-ray energy in eV
angle         1-D array  rotation angle (degrees) for double crystal monochromator
i0            1-D array  \(I_0\): measurement (arbitrary units) of incident flux
mutrans       1-D array  \(\mu\) (unitless) for Transmission XAS data
itrans        1-D array  \(I_1\): measurement (arbitrary units) of transmitted flux
mufluor       1-D array  \(\mu\) (unitless) for Fluorescence XAS data
ifluor        1-D array  \(I_f\): measurement (arbitrary units) of fluoresced flux
data          2-D array  raw data from data columns, shape=(narrays, npts)
narrays       integer    number of 1-D arrays
npts          integer    number of points in each 1-D array
dspacing      float      \(d\) spacing (in \(\AA\)) of monochromator
element       string     atomic symbol for absorbing element
edge          string     symbol for absorbing edge
array_labels  list       list of strings holding labels for the narrays arrays
array_units   list       list of strings describing units for the narrays arrays
attrs         dict       dictionary of metadata, using XDI namespaces to nest values
comments      string     additional user-supplied comments from XDI file
xdi_version   string     XDI version
============  =========  ============================================================

10.3. Athena Project Files

The popular Athena program for XAFS Analysis uses an “Athena Project File” to store many XAFS spectra and some processing parameters as used in Athena. Larch can read and extract the \(\mu(E)\) data from these project files; it does not read \(\chi(k)\) from them. Larch can also write \(\mu(E)\) data to Athena Project Files.

Larch does not read or support Artemis projects.

10.3.1. Reading Athena Project Files

larch.io.read_athena(filename, match=None, do_preedge=True, do_bkg=True, do_fft=True, use_hashkey=False)

open and read an Athena Project File, returning a group of groups, each subgroup corresponding to an Athena Group from the project file.

Parameters:
  • filename – name of Athena Project file

  • match – string pattern used to limit the imported groups (see Note)

  • do_preedge – bool, whether to do pre-edge subtraction

  • do_bkg – bool, whether to do XAFS background subtraction

  • do_fft – bool, whether to do XAFS Fast Fourier transform

  • use_hashkey – bool, whether to use Athena’s hash key as the group name, instead of the Athena label.

Returns:

group of groups.

Notes:
  1. To limit the imported groups, use the pattern in match, using ‘*’ to match ‘all’, ‘?’ to match any single character, or [sequence] to match any of a sequence of letters. The match will always be insensitive to case.

  2. do_preedge, do_bkg, and do_fft will attempt to reproduce the pre-edge, background subtraction, and FFT from Athena by using the parameters saved in the project file.

  3. use_hashkey=True will name groups from the internal 5 character string used by Athena, instead of the group label.
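
The wildcard rules in note 1 follow the same shell-style conventions as Python's standard fnmatch module, so case-insensitive group selection can be sketched by lower-casing both the names and the pattern. This select_groups() helper is hypothetical and only illustrates the matching; it does not read a project file.

```python
from fnmatch import fnmatchcase


def select_groups(names, match=None):
    """Return names matching a shell-style pattern, ignoring case.

    Hypothetical helper: match=None keeps every name.
    """
    if match is None:
        return list(names)
    pattern = match.lower()
    return [name for name in names if fnmatchcase(name.lower(), pattern)]


groups = ["HgO", "HgS_black", "HgS_red"]
print(select_groups(groups, "hgs*"))    # ['HgS_black', 'HgS_red']
print(select_groups(groups, "Hg?"))     # ['HgO']
print(select_groups(groups, "*_[b]*"))  # ['HgS_black']
```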

A simple example of reading an Athena Project file:

larch> hg_prj = read_athena('Hg.prj')
larch> show(hg_prj)
== Group 0x11b001e50: 0 methods, 5 attributes ==
  HgO: <Group 0x1c2e6f48d0>
  HgS_black: <Group 0x1c2e6f49d0>
  HgS_red: <Group 0x1c2e6f4ad0>
  _athena_header: u'# Athena project file -- Demeter version 0.9.26\n# This file created at 2018-06-24T21:55:31\n# Using Demeter 0.9.26 with perl 5.026001 and using Larch X.xx on darwin'
  _athena_journal: [u'Hg 15nM in 50 mM Na Cacodylate (As-containing buffer) ', u'100 mM NaClO4, pH 6.10', u'Hg 15nM in 50 mM Na Cacodylate (As-containing buffer) ', u'100 mM NaClO4, pH 6.10']

larch.io.extract_athenagroup(datagroup)

extracts a group out of an Athena Project File, allowing the file to be closed.

Parameters:

datagroup – group from athena project

Returns:

group with copy of data, allowing safe closing of project file

An example using this function to allow extracting 1 group from an Athena Project would be:

larch> hg_prj = read_athena('Hg.prj')
larch> hgo = extract_athenagroup(hg_prj.HgO)
larch> del hg_prj

10.3.2. Creating and Writing to Athena Project Files

You can create an Athena Project File with create_athena() and then add a group of XAFS data to that project with the add_group() method of the returned project. The group is expected to have arrays named energy and i0, and one of mu, mutrans, or mufluor.

larch.io.create_athena(filename)

Open a new or existing Athena Project File, returning an AthenaProject object. That is, a new project file will be created if it does not exist, or an existing project will be opened for reading and writing.

Parameters:

filename – name of Athena Project file

class larch.io.AthenaProject(filename)

A representation of an Athena Project File

AthenaProject.add_group(group, signal=None)

add a group of XAFS data to an Athena Project

Parameters:
  • group – group to be added. See note

  • signal – string or None name of array to use as main signal

If signal is not specified, it will be chosen as mu, mutrans, or mufluor (in that order of preference).
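
The signal-selection rule can be written as a small function. This choose_signal() helper is hypothetical: it works on a collection of array names, and raising ValueError when energy, i0, or a usable signal is missing is an assumption about error handling, not documented add_group() behavior.

```python
SIGNAL_PREFERENCE = ("mu", "mutrans", "mufluor")


def choose_signal(array_names, signal=None):
    """Pick the main signal array name (hypothetical helper).

    The ValueError checks are assumptions of this sketch.
    """
    names = set(array_names)
    missing = {"energy", "i0"} - names
    if missing:
        raise ValueError(f"group lacks required arrays: {sorted(missing)}")
    if signal is not None:
        if signal not in names:
            raise ValueError(f"no array named {signal!r}")
        return signal
    for candidate in SIGNAL_PREFERENCE:  # first match in preference order
        if candidate in names:
            return candidate
    raise ValueError("no mu, mutrans, or mufluor array found")


print(choose_signal(["energy", "i0", "mutrans", "mufluor"]))   # mutrans
print(choose_signal(["energy", "i0", "mu", "mutrans"]))        # mu
print(choose_signal(["energy", "i0", "mu", "mufluor"], signal="mufluor"))  # mufluor
```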

AthenaProject.save(use_gzip=True)

save project to file

Parameters:

use_gzip – bool, whether to use gzip compression for file.

AthenaProject.read(filename=None, match=None, do_preedge=True, do_bkg=True, do_fft=True, use_hashkey=False)

read from project.

Parameters:
  • filename – name of Athena Project file

  • match – string pattern used to limit the imported groups (see Note)

  • do_preedge – bool, whether to do pre-edge subtraction

  • do_bkg – bool, whether to do XAFS background subtraction

  • do_fft – bool, whether to do XAFS Fast Fourier transform

  • use_hashkey – bool, whether to use Athena’s hash key as the group name, instead of the Athena label.

The function read_athena() above is a wrapper around this method, and the notes there apply here as well. An important difference is that for this method the data is retained in the groups attribute which is a Python list of groups for each group in the Athena Project.

AthenaProject.as_group()

Return the Athena Project groups attribute (as read by read()) as a Larch Group of groups.

As an example creating and saving an Athena Project file:

larch> feo = read_ascii('feo_rt1.dat', labels=['energy', 'mu', 'i0'])
larch> autobk(feo, rbkg=1.0, kweight=1)
larch> fe2o3 = read_ascii('fe2o3_rt1.xmu')
larch> autobk(fe2o3, rbkg=1.0, kweight=1)
larch> fe_project = create_athena('FeOxides.prj')
larch> fe_project.add_group(feo)
larch> fe_project.add_group(fe2o3)
larch> fe_project.save()

10.3.3. Converting Athena Project Files to HDF5

An Athena Project File (.prj) can be easily converted to HDF5 (.h5) with athena_to_hdf5().

larch.io.athena_to_hdf5(filename, fileout=None, overwrite=False, match=None, do_preedge=True, do_bkg=True, do_fft=True, use_hashkey=False)

read an Athena Project File and convert it to HDF5

Parameters:
  • filename – name of Athena Project file

  • fileout – name of the HDF5 file [None -> filename_root.h5]

  • overwrite – bool, whether to overwrite an existing output file

  • match – string pattern used to limit the imported groups (see Note)

  • do_preedge – bool, whether to do pre-edge subtraction

  • do_bkg – bool, whether to do XAFS background subtraction

  • do_fft – bool, whether to do XAFS Fast Fourier transform

  • use_hashkey – bool, whether to use Athena’s hash key as the group name, instead of the Athena label.

Returns:

None
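
The fileout default, [None -> filename_root.h5], and the overwrite flag can be sketched with os.path. The helper below is hypothetical; raising FileExistsError when the output exists and overwrite is False is an assumption about how a refusal might be reported, not documented athena_to_hdf5() behavior.

```python
import os


def resolve_h5_output(filename, fileout=None, overwrite=False):
    """Return the HDF5 output path for an Athena project file.

    Hypothetical helper; the FileExistsError is an assumption.
    """
    if fileout is None:
        root, _ext = os.path.splitext(filename)
        fileout = root + ".h5"  # e.g. 'Hg.prj' -> 'Hg.h5'
    if os.path.exists(fileout) and not overwrite:
        raise FileExistsError(f"{fileout} exists; use overwrite=True")
    return fileout


print(resolve_h5_output(os.path.join("no_such_dir", "Hg.prj")))
```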

10.4. Reading HDF5 Files

HDF5 is an increasingly popular data format for scientific data, as it can efficiently hold very large arrays in a hierarchical format that also holds “metadata” about the data, and can be explored with a variety of tools. The interface used in Larch is based on h5py, which should be consulted for further documentation.

larch.io.h5_group(filename)

opens and maps an HDF5 file to a Larch Group, with HDF5 Groups mapped to Larch Groups. Note that the full set of data is not read and copied; instead, the HDF5 file is kept open and data are read from the file as needed.

An example using h5_group() shows that one can browse through the data hierarchy of the HDF5 file and pick out the needed data:

larch> g = h5_group('test.h5')
larch> show(g)
== Group test.h5: 3 symbols ==
  attrs: {u'Collection Time': ': Sat Feb 4 13:29:00 2012', u'Version': '1.0.0',
          u'Beamline': 'GSECARS, 13-IDC / APS', u'Title': 'Epics Scan Data'}
  data: <Group test.h5/data>
  h5_file: <HDF5 file "test.h5" (mode r)>
larch> show(g.data)
== Group test.h5/data: 5 symbols ==
  attrs: {u'scan_prefix': '13IDC:', u'start_time': ': Sat Feb 4 13:29:00 2012',
        u'correct_deadtime': 'True', u'dimension': 2,
        u'stop_time': ': Sat Feb 4 13:44:52 2009'}
  environ: <Group test.h5/data/environ>
  full_xrf: <Group test.h5/data/full_xrf>
  merged_xrf: <Group test.h5/data/merged_xrf>
  scan: <Group test.h5/data/scan>


larch> g.data.scan.sums
<HDF5 dataset "det": shape (15, 26, 26), type "<f8">

larch> imshow(g.data.scan.sums[8:,:,:])

This interface is general-purpose but somewhat low-level. As HDF5 formats and schemas become standardized, better interfaces can easily be made on top of this approach.

10.5. Reading NetCDF Files

NetCDF4 is an older and less flexible file format than HDF5, but is efficient for storing array data and still in wide use.

larch.io.netcdf_group(filename)

returns a group with data from a NetCDF4 file.

larch.io.netcdf_file(filename, mode='r')

opens and returns a netcdf file.

10.6. Reading TIFF Images

TIFF is a popular image format used by many cameras and detectors. The interface used in Larch is based on code from Christoph Gohlke.

larch.io.read_tiff(fname)

reads a TIFF image from a TIFF file. This returns just the image data as an array, and does not return any metadata.

larch.io.tiff_object(fname)

opens and returns a TIFF file. This is useful for extracting metadata and multiple series.

10.7. Working with Epics Channel Access

Many synchrotron facilities use the Epics control system. If the Epics Channel Access layer (which requires network access and configuration discussed elsewhere) is set up correctly, then Larch can read and write data from Epics Process Variables (PVs). The interface used in Larch is based on pyepics, which should be consulted for further documentation. The access is encapsulated into three functions:

larch.io.caget(PV_name, as_string=False)

get the value of the Process Variable. The optional as_string argument ensures the returned value is the string representation for the variable.

larch.io.caput(PV_name, value, wait=False)

set the value of the Process Variable. If the optional wait is True, the function will not return until the put “completes”. For some types of data, this may wait for some process (moving a motor, triggering a detector) to finish before returning.

larch.io.PV(PV_name)

create and return an Epics PV object for a Process Variable. This will have get() and put() methods, and allows you to add callback functions which will be run with the new value every time the PV value changes.

10.8. Reading Scan Data from APS Beamlines

This list is minimal, but can easily be expanded to accommodate more facilities and beamlines.

larch.io.read_mda(filename, maxdim=4)

read a binary MDA (Multi-Dimensional Array) file from the Epics SScan Record, and return a group based on the scans it contains. This is not very well tested, so use it with caution!

larch.io.read_gsescan(filename)

read a (old-style) GSECARS Escan data file into a group.

larch.io.read_stepscan(filename)

read a GSECARS StepScan data file into a group.

10.9. Reading Spec/BLISS files via silx.io.open

Spec ASCII files (see spec) and BLISS HDF5 files (see bliss) are read via the silx.io.open module (see silx).

larch.io.read_specfile(filename, scan=None)

Get a Larch group for a given scan number. If scan=None the first scan is returned.

10.10. Reading FDMNES output files

ASCII output files from FDMNES (http://fdmnes.neel.cnrs.fr/) are read with

larch.io.read_fdmnes(filename)

Return a Larch group

This function is a simple wrapper on top of read_ascii, parsing the header in order to shift the energy scale to absolute values, according to the E_edge variable. The parsed variables are stored in the group.header_dict dictionary.

10.11. Saving and Loading Larch Session Files: .larix Files

A Larch Session File, with a .larix extension, contains all of the user-generated data within a Larch session. All of the data (input data arrays, processed arrays, dictionaries, Journals, etc) from all Groups, and all processing parameters, analysis results, and fit histories will be included. The Session File will also include a list of all Larch commands executed in the current Larch session (GUI or Command-Line Application), as well as configuration information about the session (including versions of Larch and Python, operating system, and so on). Session Files effectively allow you to save your session as a “Project” and share it with someone else or come back to it later, picking up the analysis where you left off. Session Files are meant to be completely portable across different computers and versions.

For portability, a Larch Session File is simply gzip-compressed plaintext, with JSON used to serialize all of the data, including complex and nested Python data structures. While all the data are stored using portable formats and well-supported libraries, it would not necessarily be easy to open and use these files without the Python code in Larch that reads them.
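
The storage idea, JSON text compressed with gzip, can be demonstrated with the standard library alone. The sketch below round-trips a nested dictionary that includes a complex number. The {"__complex__": [real, imag]} tagging is a hypothetical encoding chosen for illustration, and this is not the actual .larix layout.

```python
import gzip
import json


def _encode(obj):
    # JSON has no complex type, so tag it (hypothetical scheme).
    if isinstance(obj, complex):
        return {"__complex__": [obj.real, obj.imag]}
    raise TypeError(f"cannot serialize {type(obj).__name__}")


def _decode(obj):
    # Called for every decoded JSON object; rebuild tagged complex values.
    if "__complex__" in obj:
        real, imag = obj["__complex__"]
        return complex(real, imag)
    return obj


def to_gzipped_json(data):
    """Serialize data to JSON text, then gzip-compress it to bytes."""
    return gzip.compress(json.dumps(data, default=_encode).encode("utf-8"))


def from_gzipped_json(blob):
    """Decompress gzip bytes and decode the JSON, restoring complex values."""
    return json.loads(gzip.decompress(blob).decode("utf-8"), object_hook=_decode)


session = {"feo": {"e0": 7112.0, "labels": ["energy", "mu"], "chi0": 1 + 2j}}
blob = to_gzipped_json(session)
print(blob[:2])                             # b'\x1f\x8b' (gzip magic number)
print(from_gzipped_json(blob) == session)   # True
```

Arrays, Groups, and other Larch objects need their own encodings on top of this, which is why the Python code in Larch is needed to read Session Files back in practice.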

The save_session() function saves all of the data in the current session. The load_session() function restores data from a Session File into the current session. In contrast, read_session() reads the data but does not install it into the analysis session; instead, it returns the data so that you can extract the groups and arrays of interest.

larch.io.save_session(fname=None, _larch=None)

save all groups and data into a Larch Session File (.larix), a portable compressed JSON file that can be loaded with read_session()

Parameters:

fname (str) – name of output save file.

See also

read_session, load_session, clear_session

larch.io.read_session(fname)

read a Larch Session File, returning its contents without installing them into the current session

Parameters:

fname (str) – name of save file

Returns:

Tuple: a tuple with entries:

configuration - a dict of configuration for the saved session.
command_history - a list of commands in the saved session.
symbols - a dict of Larch/Python symbols, groups, etc

See also

load_session

larch.io.load_session(fname, ignore_groups=None, include_xasgroups=None, _larch=None, verbose=False)

load all data from a Larch Session File into current larch session, merging into existing groups as appropriate (see Notes below)

Parameters:
  • fname (str) – name of session file

  • ignore_groups (list of strings) – list of symbols to not import

  • include_xasgroups (list of strings) – list of symbols to import as XAS spectra, even if not explicitly set in _xasgroups

  • verbose (bool) – whether to print warnings for overwrites [False]

Returns:

None

Notes

  1. data in the following groups will be merged into existing session groups:

     _feffpaths : dict of “current feff paths”
     _feffcache : dict with cached feff paths and feff runs
     _xasgroups : dict mapping “File Name” and “Group Name”, used in XAS Viewer

  2. to avoid name clashes, group and file names in the _xasgroups dictionary may be modified on loading
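
One common way to avoid such clashes is to add a numeric suffix to an incoming name until it is unique. The sketch below does that for a mapping of file names to group names. The _N suffix scheme and the helper names are hypothetical and are not necessarily how load_session() renames entries.

```python
def unique_name(name, existing):
    """Return name, or name_1, name_2, ... if it is already taken."""
    if name not in existing:
        return name
    i = 1
    while f"{name}_{i}" in existing:
        i += 1
    return f"{name}_{i}"


def merge_name_map(current, incoming):
    """Merge file->group mappings, renaming incoming entries that clash.

    Hypothetical helper; the suffix scheme is an assumption.
    """
    merged = dict(current)
    for fname, gname in incoming.items():
        new_fname = unique_name(fname, merged)
        new_gname = unique_name(gname, set(merged.values()))
        merged[new_fname] = new_gname
    return merged


print(merge_name_map({"feo.dat": "feo"}, {"feo.dat": "feo", "fe2o3.dat": "fe2o3"}))
# {'feo.dat': 'feo', 'feo.dat_1': 'feo_1', 'fe2o3.dat': 'fe2o3'}
```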

larch.io.clear_session(_larch=None)

clear user-defined data in a session

Example

>>> save_session('foo.larix')
>>> clear_session()

will effectively save and then reset the existing session.