XAS schema for NeXuS based on XDI

As discussed in Multi-Spectra Data Formats, using HDF5+NeXuS as a format for sharing multiple XAS spectra is worth exploring. According to the criteria given there, this format should map well to XDI, as the conclusions for required metadata tags and array names were the result of considerable discussion by the working groups and XAS community.

As it turns out, there is an existing XAS schema defined for NeXuS (see https://github.com/nexusformat/definitions/blob/main/applications/NXxas.nxdl.xml), though as far as I can tell it was not written by someone in the XAS community and is not used.

A candidate revised schema is included in the Pull Request at https://github.com/nexusformat/definitions/pull/1293. This proposed schema is deliberately based on XDI and shares many concepts and names with it. An annotated description of the schema for each XAS group in the NeXus file is given in the table below.

Table of fields and meaning in the proposed NeXuS NXxas definition:

Subgroups are all labeled as Groups, and all other addresses hold datasets. Several of the datasets under data are links to other datasets, and are indicated with ‘->’. Datatypes described as Arrays are expected to be 1-D arrays unless otherwise noted (here, scan/data and data/rawdata). This list need not be complete, and additional datasets can be added to any Group without loss of existing information. In addition, each dataset can have attributes, for example giving units or additional metadata.

address                                            description
definition                                         string, ‘NXxas’ identifying Group type
start_time                                         time for start of scan (ISO 8601)
title                                              string for title of data
data                                               Group containing primary data
data/column_labels                                 -> /scan/column_labels
data/edge                                          -> /scan/xrayedge/edge
data/element                                       -> /scan/xrayedge/element
data/energy                                        -> /instrument/monochromator/energy
data/i0                                            -> /instrument/i0/data
data/itrans                                        -> /instrument/itrans/data
data/ifluor                                        -> /instrument/ifluor/data
data/irefer                                        -> /instrument/irefer/data
data/rawdata                                       -> /scan/data
data/mode                                          string for measurement mode (eg, ‘Transmission’)
data/mutrans                                       \(\mu(E)\) for Transmission
data/mufluor                                       \(\mu(E)\) for Fluorescence
data/murefer                                       \(\mu(E)\) for Reference
instrument                                         Group for instrument data
instrument/monochromator                           Group for monochromator
instrument/monochromator/energy                    Array of energy values
instrument/monochromator/angle                     Array of angle values
instrument/monochromator/crystal                   Group for mono crystal
instrument/monochromator/crystal/chemical_formula  string for mono crystal (eg, ‘Si’)
instrument/monochromator/crystal/d_spacing         d-spacing (in Ang) for reflection
instrument/monochromator/crystal/reflection        string for crystal reflection (eg, ‘1,1,1’)
instrument/i0                                      Group for i0 detector
instrument/i0/data                                 Array of i0 values
instrument/itrans                                  Group for transmission detector
instrument/itrans/data                             Array of transmission values
instrument/ifluor                                  Group for fluorescence detector
instrument/ifluor/data                             Array of fluorescence values
instrument/irefer                                  Group for reference detector
instrument/irefer/data                             Array of reference values
instrument/source                                  Group for Source (facility, beamline)
instrument/source/beamline_name                    string name of beamline
instrument/source/facility_name                    string name of facility
instrument/source/probe                            string for source probe (‘X-ray’)
sample                                             Group for Sample information
sample/name                                        string name of sample
sample/prep                                        string description of sample prep
scan                                               Group for Scan information
scan/xrayedge                                      Group for Element and Edge of Scan
scan/xrayedge/element                              string of element symbol for scan
scan/xrayedge/edge                                 string of edge probed
scan/edge_energy                                   nominal edge energy
scan/data                                          2D (nCol x nP) raw scan data table
scan/nCol                                          integer number of columns in scan/data
scan/nP                                            integer number of rows in scan/data
scan/column_labels                                 array of column labels for scan/data
scan/scan_mode                                     string describing scan mode
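
To make the layout in the table more concrete, the following is a minimal sketch of writing one XAS group with h5py, assuming h5py is installed. It only illustrates a few of the addresses listed above and is not taken from the proposed definition: the function name, the group name spectrum_001, the title string and the numerical values are made up for illustration, hard links are assumed as one possible way to realize the ‘->’ entries, and NX_class attributes and most metadata are omitted.

```python
import h5py


def write_minimal_xas_group(h5file, name, energy, i0, itrans):
    """Write one XAS group using a few addresses from the table (sketch only)."""
    grp = h5file.create_group(name)
    grp.create_dataset("definition", data="NXxas")
    grp.create_dataset("title", data="example spectrum")  # hypothetical title

    # Measured arrays live under instrument/; intermediate groups are
    # created automatically by h5py.
    mono = grp.create_group("instrument/monochromator")
    en = mono.create_dataset("energy", data=energy)
    en.attrs["units"] = "eV"  # attribute carrying the units
    grp.create_dataset("instrument/i0/data", data=i0)
    grp.create_dataset("instrument/itrans/data", data=itrans)

    # data/ holds links to those arrays. Hard links are an assumption here:
    # the same dataset becomes reachable from both addresses.
    data = grp.create_group("data")
    data.create_dataset("mode", data="Transmission")
    data["energy"] = mono["energy"]
    data["i0"] = grp["instrument/i0/data"]
    data["itrans"] = grp["instrument/itrans/data"]
    return grp


# In-memory HDF5 file: nothing is written to disk.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    g = write_minimal_xas_group(
        f,
        "spectrum_001",  # hypothetical group name
        energy=[7100.0, 7110.0, 7120.0],
        i0=[1000.0, 1001.0, 999.0],
        itrans=[500.0, 300.0, 310.0],
    )
    linked_energy = g["data/energy"][()].tolist()
    energy_units = g["data/energy"].attrs["units"]
```

Reading data/energy returns the values stored at instrument/monochromator/energy, including the units attribute, which is the behavior the ‘->’ entries in the table describe.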

Note that many of the arrays and datasets listed in the table above are optional. That is, it is not required to have an irefer or ifluor or even itrans array. However, it is expected that at least one of ifluor or itrans is given, and if a reference channel is included, it should be called irefer. Arrays for mutrans, mufluor, and/or murefer are recommended if the corresponding intensity arrays are included, but are not required.
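
The expectations in the paragraph above can be written as a small check. This is only a sketch of those rules as stated here, not a validator for the proposed definition; the function name check_data_arrays and the wording of the messages are hypothetical.

```python
# Intensity arrays and the mu(E) array recommended alongside each of them.
MU_FOR_INTENSITY = {
    "itrans": "mutrans",
    "ifluor": "mufluor",
    "irefer": "murefer",
}


def check_data_arrays(names):
    """Return (errors, warnings) for the array names found under data/."""
    present = set(names)
    errors, warnings = [], []

    # At least one of ifluor or itrans is expected.
    if not present & {"itrans", "ifluor"}:
        errors.append("expected at least one of 'itrans' or 'ifluor'")

    # mu(E) arrays are recommended, not required.
    for intensity, mu in MU_FOR_INTENSITY.items():
        if intensity in present and mu not in present:
            warnings.append(f"'{mu}' is recommended when '{intensity}' is given")

    return errors, warnings
```

For example, a group holding only energy, i0 and itrans gives no errors and one warning about mutrans, while a group holding only energy, i0 and irefer gives one error.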

Communicating the monochromator energy in eV (or keV) is recommended. Giving the monochromator position as an angle (in degrees) is acceptable.
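
When only the angle is stored, a reader can recover the energy from Bragg's law, E = hc / (2 d sin θ), provided the crystal d-spacing is available (the table lists it at instrument/monochromator/crystal/d_spacing, in Ang). A minimal sketch, assuming first-order reflection and angles in degrees; the function names are hypothetical and the Si(111) d-spacing used in the example is an approximate value:

```python
import math

# h*c in eV * Angstrom
HC_EV_ANGSTROM = 12398.419843320026


def angle_to_energy(theta_deg, d_spacing):
    """Energy in eV for a Bragg angle in degrees and d-spacing in Angstrom."""
    return HC_EV_ANGSTROM / (2.0 * d_spacing * math.sin(math.radians(theta_deg)))


def energy_to_angle(energy_ev, d_spacing):
    """Bragg angle in degrees for an energy in eV and d-spacing in Angstrom."""
    return math.degrees(math.asin(HC_EV_ANGSTROM / (2.0 * d_spacing * energy_ev)))


# Approximate d-spacing for Si(111), in Angstrom.
SI_111_D_SPACING = 3.1356

theta = energy_to_angle(7112.0, SI_111_D_SPACING)  # near the Fe K edge
energy = angle_to_energy(theta, SI_111_D_SPACING)  # round trip back to eV
```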

Worked example of XDI and NeXuS formatted XAS data

Examples of files using the proposed schema are given at https://millenia.cars.aps.anl.gov/nxxas/. These include a number of the example XDI files from https://github.com/XraySpectroscopy/XAS-Data-Interchange/tree/master/data. The raw sources for these files and the Python script used to create them are at https://millenia.cars.aps.anl.gov/nxxas/ASCII_Sources/.

The individual single spectrum files from the XDI sources are at https://millenia.cars.aps.anl.gov/nxxas/SingleSpectrumFiles/.

An example of a multi-spectrum file with 26 spectra is given at https://millenia.cars.aps.anl.gov/nxxas/MultiSpectrumFiles/. For comparison, a plain Zip file of the original as-measured files is included, as well as an Athena project file and a Larch session file for the same datasets.

Discussion

The example datasets linked to above are meant to give a proof-of-concept and be the start of discussion for how to format XAS data with NeXuS and HDF5. It is also perfectly reasonable to revisit both the other formats described in Multi-Spectra Data Formats and the criteria outlined there. I think that “Zip file of XDI files” is definitely worth considering.

No matter what format(s) are adopted for databases, supplemental material for journals, and downloadable archives of data from facilities' FAIR efforts, it seems to me that “Raw” or “as measured” data (even from my own beamline) probably needs some conversion step to be useful to others. For the “as measured” Fe K-edge datasets at https://millenia.cars.aps.anl.gov/nxxas/ASCII_Sources/, it is just not clear from the files themselves that FeFoil.001 contains data that should be used as transmission XAS, while all the other Fe_*.001 and Fe_*.002 files contain fluorescence data that should use the array labeled Sum_Fe_Ka_counts, which is already the sum of 7 deadtime-corrected channels.
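
The kind of conversion step meant here can be sketched as follows: someone who knows the measurement states the mode and which column carries the signal, and the output uses the array names from the schema. This is an illustration only, not the script used for the example files. The function name and the column labels energy and i0 are hypothetical (only Sum_Fe_Ka_counts comes from the files discussed above), and the formulas \(\mu = \ln(i0/itrans)\) and \(\mu = ifluor/i0\) are the common conventions, assumed here rather than taken from the schema.

```python
import math


def to_named_arrays(columns, mode, signal_label,
                    energy_label="energy", i0_label="i0"):
    """Map as-measured columns onto schema array names and compute mu(E)."""
    energy = list(columns[energy_label])
    i0 = list(columns[i0_label])
    signal = list(columns[signal_label])
    out = {"mode": mode, "energy": energy, "i0": i0}

    if mode == "Transmission":
        out["itrans"] = signal
        out["mutrans"] = [math.log(a / b) for a, b in zip(i0, signal)]
    elif mode == "Fluorescence":
        out["ifluor"] = signal
        out["mufluor"] = [s / a for s, a in zip(signal, i0)]
    else:
        raise ValueError(f"unsupported mode: {mode!r}")
    return out


# Made-up numbers standing in for one as-measured fluorescence file.
raw = {
    "energy": [7100.0, 7120.0],
    "i0": [100.0, 200.0],
    "Sum_Fe_Ka_counts": [10.0, 50.0],
}
fluor = to_named_arrays(raw, "Fluorescence", "Sum_Fe_Ka_counts")
```

The choice of mode and signal column is exactly the information that is missing from the as-measured files and has to be supplied by a person at this step.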

Using the NeXuS format as an exchange format has the advantage of requiring the formatting step.