XAS schema for NeXuS based on XDI
As discussed in Multi-Spectra Data Formats, using HDF5+NeXuS as a format for sharing multiple XAS spectra is worth exploring. According to the criteria given there, this format should map well to XDI, because the required metadata tags and array names in XDI were the result of considerable discussion by the working groups and the XAS community.
As it turns out, there is an existing XAS schema defined for NeXuS (see https://github.com/nexusformat/definitions/blob/main/applications/NXsas.nxdl.xml), though as far as I can tell it was not written by someone in the XAS community and is not used.
A candidate revised schema is included in the pull request at https://github.com/nexusformat/definitions/pull/1293. This proposed schema is deliberately based on XDI and shares many concepts and names with it. An annotated description of the schema for each XAS group in the NeXus file is given in the table below.
Table of fields and meaning in the proposed NeXuS NXxas definition:
Subgroups are all labeled as Groups, and all other addresses hold datasets. Several of the datasets under data are links to other datasets and are indicated with ‘->’. Datatypes described as Arrays are expected to be 1-D arrays unless otherwise noted (here, scan/data and data/rawdata). This list need not be complete, and additional datasets can be added to any Group without loss of existing information. In addition, each dataset can have attributes, for example giving units or additional metadata.
| address | description |
| --- | --- |
| definition | string, ‘NXxas’ identifying Group type |
| start_time | time for start of scan (ISO 8601) |
| title | string for title of data |
| data | Group containing primary data |
| data/column_labels | -> /scan/column_labels |
| data/edge | -> /scan/xrayedge/edge |
| data/element | -> /scan/xrayedge/element |
| data/energy | -> /instrument/monochromator/energy |
| data/i0 | -> /instrument/i0/data |
| data/itrans | -> /instrument/itrans/data |
| data/ifluor | -> /instrument/ifluor/data |
| data/irefer | -> /instrument/irefer/data |
| data/rawdata | -> /scan/data |
| data/mode | string measurement mode (‘Transmission’) |
| data/mutrans | \(\mu(E)\) for Transmission |
| data/mufluor | \(\mu(E)\) for Fluorescence |
| data/murefer | \(\mu(E)\) for Reference |
| instrument | Group for instrument data |
| instrument/monochromator | Group for monochromator |
| instrument/monochromator/energy | Array of energy values |
| instrument/monochromator/angle | Array of angle values |
| instrument/monochromator/crystal | Group for mono crystal |
| instrument/monochromator/crystal/chemical_formula | string for mono crystal (eg, ‘Si’) |
| instrument/monochromator/crystal/d_spacing | d-spacing (in Ang) for reflection |
| instrument/monochromator/crystal/reflection | string crystal reflection (eg, ‘1,1,1’) |
| instrument/i0 | Group for i0 detector |
| instrument/i0/data | Array of i0 values |
| instrument/itrans | Group for transmission detector |
| instrument/itrans/data | Array of transmission values |
| instrument/fluor | Group for fluorescence detector |
| instrument/fluor/data | Array of fluorescence values |
| instrument/refer | Group for reference detector |
| instrument/refer/data | Array of reference values |
| instrument/source | Group for Source (facility, beamline) |
| instrument/source/beamline_name | string name of beamline |
| instrument/source/facility_name | string name of facility |
| instrument/source/probe | string for source probe (‘X-ray’) |
| sample | Group for Sample information |
| sample/name | string name of sample |
| sample/prep | string description of sample prep |
| scan | Group for Scan information |
| scan/xrayedge | Group for Element and Edge of Scan |
| scan/xrayedge/element | string of element symbol for scan |
| scan/xrayedge/edge | string of edge probed |
| scan/edge_energy | nominal edge energy |
| scan/data | 2D (nCol x nP) raw scan data table |
| scan/nCol | integer number of columns in scan/data |
| scan/nP | integer number of rows in scan/data |
| scan/column_labels | array of column labels for scan/data |
| scan/scan_mode | string describing scan mode |
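To illustrate how the ‘->’ entries are meant to behave, here is a minimal sketch that models one XAS group as a plain Python dictionary and follows links until a real dataset is reached. It does not read or write HDF5; the `Link` class, the `resolve` function, and the numerical values are hypothetical stand-ins, and in an actual file these entries would be HDF5 links rather than Python objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical stand-in for a '->' entry in the table."""
    target: str  # absolute address within the group, e.g. "/scan/data"


def resolve(entry: dict, address: str, max_hops: int = 8):
    """Follow links from `address` until a non-link value is found."""
    key = address.strip("/")
    for _ in range(max_hops):
        value = entry[key]
        if not isinstance(value, Link):
            return value
        key = value.target.strip("/")
    raise ValueError(f"too many link hops starting from {address!r}")


# Made-up values, for illustration only; addresses follow the table.
entry = {
    "definition": "NXxas",
    "instrument/monochromator/energy": [7100.0, 7110.0, 7120.0],
    "instrument/i0/data": [1000.0, 1010.0, 990.0],
    "scan/xrayedge/element": "Fe",
    "scan/xrayedge/edge": "K",
    "data/energy": Link("/instrument/monochromator/energy"),
    "data/i0": Link("/instrument/i0/data"),
    "data/element": Link("/scan/xrayedge/element"),
    "data/edge": Link("/scan/xrayedge/edge"),
}

energy = resolve(entry, "data/energy")
```

The point of the links is that a reader only needs to look under data to find the primary arrays, while each value is stored once under instrument or scan.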
Note that many of the arrays and datasets listed in the table above are optional. That is, it is not required to have an irefer or ifluor or even itrans array. However, it is expected that at least one of ifluor or itrans is given, and if a reference channel is included, it should be called irefer. Arrays for mutrans, mufluor, and/or murefer are recommended if the corresponding intensity arrays are included, but are not required.
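As a rough sketch of the rule above, the following function checks that at least one of itrans or ifluor is present and derives the matching \(\mu(E)\) arrays. It assumes the common conventions mutrans = ln(i0/itrans) and mufluor = ifluor/i0, and it assumes an i0 array is available; the function name `derive_mu` and the numbers are hypothetical, and murefer is deliberately left out.

```python
import math


def derive_mu(arrays: dict) -> dict:
    """Return mu(E) arrays for whichever intensity arrays are present.

    Assumed conventions (not taken from the schema itself):
    mutrans = ln(i0 / itrans), mufluor = ifluor / i0.
    """
    if "itrans" not in arrays and "ifluor" not in arrays:
        raise ValueError("expected at least one of 'itrans' or 'ifluor'")
    i0 = arrays["i0"]
    mu = {}
    if "itrans" in arrays:
        mu["mutrans"] = [math.log(a / b) for a, b in zip(i0, arrays["itrans"])]
    if "ifluor" in arrays:
        mu["mufluor"] = [f / a for f, a in zip(arrays["ifluor"], i0)]
    return mu


# Made-up intensities, for illustration only.
example = {"i0": [1000.0, 1000.0], "itrans": [500.0, 250.0]}
mu = derive_mu(example)
```

Only the arrays that can be computed are returned, so a transmission-only measurement yields mutrans and nothing else.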
Communicating the monochromator energy in eV (or keV) is recommended. Giving the monochromator position as angle (in degrees) is acceptable.
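When only the angle is stored, the energy can be recovered from the crystal d_spacing with Bragg's law, \(E = hc / (2 d \sin\theta)\). The sketch below shows that conversion in both directions for the first-order reflection; the function names are hypothetical, and the Si(111) d-spacing is an approximate value used only for illustration.

```python
import math

HC_EV_ANGSTROM = 12398.419843  # h*c in eV*Angstrom


def angle_to_energy(angle_deg: float, d_spacing: float) -> float:
    """Energy in eV for a Bragg angle in degrees and d-spacing in Angstrom."""
    return HC_EV_ANGSTROM / (2.0 * d_spacing * math.sin(math.radians(angle_deg)))


def energy_to_angle(energy_ev: float, d_spacing: float) -> float:
    """Bragg angle in degrees for an energy in eV and d-spacing in Angstrom."""
    return math.degrees(math.asin(HC_EV_ANGSTROM / (2.0 * d_spacing * energy_ev)))


SI_111_D = 3.1356  # approximate Si(111) d-spacing in Angstrom (illustrative)

# Angle near the Fe K edge (about 7112 eV), then back to energy.
theta = energy_to_angle(7112.0, SI_111_D)
energy = angle_to_energy(theta, SI_111_D)
```

This is why the d_spacing dataset matters whenever angle is the recorded quantity: without it, the angle array cannot be turned into an energy axis.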
Worked example of XDI and NeXuS formatted XAS data
Examples of files using the proposed schema are given at https://millenia.cars.aps.anl.gov/nxxas/. This includes a number of the example XDI files from https://github.com/XraySpectroscopy/XAS-Data-Interchange/tree/master/data. The raw sources for these files and the Python script for creating them are at https://millenia.cars.aps.anl.gov/nxxas/ASCII_Sources/.
The individual single spectrum files from the XDI sources are at https://millenia.cars.aps.anl.gov/nxxas/SingleSpectrumFiles/.
An example of a multi-spectrum file with 26 spectra is given at https://millenia.cars.aps.anl.gov/nxxas/MultiSpectrumFiles/. For comparison, a plain Zip file of the original as-measured files is included, as well as an Athena project file and a Larch session file for the same datasets.
Discussion
The example datasets linked to above are meant to give a proof-of-concept and be the start of discussion for how to format XAS data with NeXuS and HDF5. It is also perfectly reasonable to revisit both the other formats described in Multi-Spectra Data Formats and the criteria outlined there. I think that “Zip file of XDI files” is definitely worth considering.
No matter what format(s) are adopted for databases, supplemental material for journals, and downloadable archives of data from facilities' FAIR efforts, it seems to me that “raw” or “as measured” data, even from my own beamline, probably needs some conversion step to be useful to others. For the “as measured” Fe K-edge datasets at https://millenia.cars.aps.anl.gov/nxxas/ASCII_Sources/, it is simply not clear from the files themselves that FeFoil.001 contains data that should be used as transmission XAS, while all the other Fe_*.001 and Fe_*.002 files contain fluorescence data, for which the array labeled Sum_Fe_Ka_counts should be used as the fluorescence signal; that array is already the sum of 7 deadtime-corrected channels.
Using the NeXuS format as an exchange format has the advantage of requiring the formatting step.
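The knowledge needed for that formatting step can be written down explicitly. As a minimal sketch, the rule table below encodes only what is stated above about the Fe K-edge example files; the `RULES` list, the `describe` function, the ‘Fluorescence’ mode string, and the test filenames are hypothetical and are not taken from the conversion script linked earlier.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: filename pattern -> how the file should be interpreted.
RULES = [
    ("FeFoil.001", {"mode": "Transmission"}),
    ("Fe_*.00[12]", {"mode": "Fluorescence",
                     "ifluor_column": "Sum_Fe_Ka_counts"}),
]


def describe(filename: str) -> dict:
    """Return the measurement mode (and fluorescence column) for a file."""
    for pattern, info in RULES:
        if fnmatchcase(filename, pattern):
            return dict(info)  # copy so callers cannot modify RULES
    raise LookupError(f"no conversion rule for {filename!r}")


foil = describe("FeFoil.001")
sample = describe("Fe_example.002")  # made-up filename matching Fe_*.002
```

Once such rules are applied during conversion, the resulting file carries the mode and the correct fluorescence array itself, so a reader no longer needs to know the beamline's naming habits.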