# Data Input

# Data Structure Types

Before import, the spectral data must be exported from the device manufacturer’s software in one of the supported formats:

  • *.spc data files with single or multiple Raman spectra (tested with WITec and Renishaw devices). The date may fail to be parsed automatically, in which case it should be provided in the metadata.

  • *.txt files with single spectra. The files should contain wavenumbers in the first column and the spectral intensities in the second column. The columns should be separated with a tab character.

  • *.txt files with multiple spectra (time series and scans). One of the following structures can be used:

    • Generic format (e.g. WITec Table export) with spectra in columns (preferred) or rows. The first column/row should contain the wavenumber axis. No additional columns/rows or column/row names are allowed.
    • Format compatible with files exported from Renishaw. Spectrum indexes, timestamps, or coordinates should be given in columns for each wavenumber index of each spectrum; wavenumbers and the respective spectral intensities should be given in two columns named “#Wave” and “#Intensity”. The columns should be separated with a tab character.
    • Spectra in rows, with the spectrum index or timestamp as row names and wavenumbers as column names. The columns should be separated with a tab character.

  • *.lpe data files (native rapID BPE format) with multiple spectra.

  • *.csv files with multiple spectra in rows, without row names. The wavenumbers have to be specified as the respective column names. The file must also contain a mandatory column DateTime with a timestamp (YYYY-MM-DD hh:mm:ss).
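
The *.csv layout described above can be checked before import with a few lines of Python. The following is a minimal sketch, not part of RAMANMETRIX: the function name `check_spectra_csv`, the comma delimiter, and the rule that every column other than DateTime must be a wavenumber are assumptions made for this example.

```python
import csv
import io
from datetime import datetime


def check_spectra_csv(text, delimiter=","):
    """Return a list of problems found in a multi-spectrum CSV (empty if none)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    if "DateTime" not in header:
        return ["mandatory column 'DateTime' is missing"]
    dt_idx = header.index("DateTime")
    problems = []
    # Assumption: every other column name is a wavenumber, i.e. a number.
    for i, name in enumerate(header):
        if i == dt_idx:
            continue
        try:
            float(name)
        except ValueError:
            problems.append(f"column name {name!r} is not a wavenumber")
    # Each data row is one spectrum with a YYYY-MM-DD hh:mm:ss timestamp.
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} fields")
            continue
        try:
            datetime.strptime(row[dt_idx], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            problems.append(f"line {line_no}: bad DateTime {row[dt_idx]!r}")
    return problems
```

An empty list means the header and timestamps follow the layout; the sketch does not validate the intensity values themselves.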

# Supported Devices

General instruction:

  1. Export data using the software coupled to your Raman device.
  2. Structure/restructure your data to simplify the metadata file construction.
  3. Create a metadata file.
  4. Compress data and metadata to one *.zip file.
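
Step 4 can also be scripted. The sketch below uses Python's standard `zipfile` module; the function name `pack_dataset` and the assumed layout (one root folder holding the device folders and the metadata file) are illustrative choices, not requirements stated by the software.

```python
import zipfile
from pathlib import Path


def pack_dataset(root_dir, zip_path):
    """Pack every file under root_dir into zip_path, keeping relative paths."""
    root = Path(root_dir)
    target = Path(zip_path).resolve()
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip folders and the archive itself (if it lies inside root_dir).
            if not path.is_file() or path.resolve() == target:
                continue
            # Paths are stored relative to root_dir, so the device folders
            # end up at the top level of the archive.
            arcname = path.relative_to(root).as_posix()
            zf.write(path, arcname)
            names.append(arcname)
    return names
```

The returned list shows the paths as they appear inside the archive, which are the same paths that the “file” column of a metadata table refers to.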

WARNING

! The x-axis units must be cm⁻¹.

! When dates are not identified automatically by the software, the metadata file must contain the “date” column (see Providing Metadata).

Below is a list of Raman devices that are supported by the software or used by active users. We describe here how to export data from the Raman device manufacturer’s software so that it is readable by RAMANMETRIX.

# BPE (Bio Particle Explorer from rap.ID)

File format: *.lpe files - spectra in text format are compressed along with metadata and images.

Each spectrum: a .txt file in which the data are saved as a table with two columns: wavenumber and spectral intensity. The timestamp is included in the filename (SpRaw_YYMMDD_hhmmss.txt), so the dates are identified automatically by the software.

Extract files using a program for unzipping archives or R script OpenLPE.R (contact Julian Hniopek).

# WITec (Project Four software)

Extract files using Spc or Table export options in the Project Four software.

“Spc” and “Table” export options work for single spectrum, time series and scans.

  • Export --> SPC

Format: Single spectrum data, time series and scans data structure types can be exported as *.spc file.

  • Export --> Table

Format: table with columns: first column – wavenumber, next columns – intensity (each column contains a single spectrum). It should be exported with the default Filter options:

  • Column Delimiter: “Tab”
  • Decimal Separator: “.”

  • Export --> Matlab, JCAMP-DX, Graph ASCII - are not supported.
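
To check that a Table export has the expected shape (tab-delimited, decimal point, wavenumber axis in the first column, no header), the file can be parsed with a short script. This is a minimal sketch; `read_table_export` is a hypothetical helper and does not reproduce the parser used by the software.

```python
def read_table_export(text):
    """Parse a tab-separated table with the wavenumber axis in the first column.

    Returns (wavenumbers, spectra); spectra[i] is the i-th intensity column.
    """
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    if not rows:
        raise ValueError("file is empty")
    n_cols = len(rows[0])
    if n_cols < 2:
        raise ValueError("expected a wavenumber column and at least one spectrum")
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all rows must have the same number of columns")
    # float() raises ValueError for a header row or a decimal comma.
    values = [[float(cell) for cell in row] for row in rows]
    wavenumbers = [row[0] for row in values]
    spectra = [[row[j] for row in values] for j in range(1, n_cols)]
    return wavenumbers, spectra
```

A `ValueError` here usually points to a wrong delimiter, a decimal comma, or column names left in the export.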

# Renishaw (Wire software)

Extract files using the Spc or Txt export options in Renishaw’s Wire software.

  • Export --> SPC

Format: Single spectrum data, time series, and scans data structure types can be exported as *.spc file.

  • Export --> As txt

Format:

Single spectrum: table with columns: first column – wavenumber, second – intensity

Time series: table with three columns of data including their names (#Time, #Wave, #Intensity)

Scan: four columns including their names (#X, #Y, #Wave, #Intensity)

Please check that the txt file is exported with:

  • Column Delimiter “Tab”

  • Decimal Separator “.”
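
The time series and scan layouts above are "long" tables, with one row per wavenumber of each spectrum. The following sketch groups such a table into individual spectra. It assumes a clean tab-separated file whose first line holds the column names exactly as listed; `read_renishaw_txt` is a hypothetical name, not a RAMANMETRIX function.

```python
import csv
import io


def read_renishaw_txt(text):
    """Group a Renishaw-style long table into one spectrum per key.

    The key is the tuple of all columns other than #Wave and #Intensity,
    e.g. (#Time,) for a time series or (#X, #Y) for a scan.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    fields = reader.fieldnames or []
    if "#Wave" not in fields or "#Intensity" not in fields:
        raise ValueError("columns '#Wave' and '#Intensity' are required")
    key_cols = [c for c in fields if c not in ("#Wave", "#Intensity")]
    spectra = {}
    for row in reader:
        key = tuple(row[c] for c in key_cols)
        point = (float(row["#Wave"]), float(row["#Intensity"]))
        spectra.setdefault(key, []).append(point)
    return spectra
```

Each dictionary value is a list of (wavenumber, intensity) pairs in file order.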

# Horiba (Labspec software)

  • Export --> SPC

Format: Single spectrum data, time series, and scans data structure types can be exported as *.spc file.

  • Table format

Format: table with spectra in rows (without row names): First row (or column names) should contain the wavenumber data

# Kaiser Raman (Holograms software)

Data should be exported as *.spc files.

# HT Raman (IPHT, Iwan Schie)

Data are saved as separate files containing the Y-axis only; there is no X-axis file. Files have a .csv extension, with columns separated by a “Tab”.

This data format is not readable by the software and should be converted into one of the supported formats prior to import into RAMANMETRIX. Please contact the developer team for more details.

# Providing Metadata

There are two ways to provide metadata: folder structure and a metadata table. Both can be used simultaneously, but values from the metadata table overwrite the labels obtained from the folder structure.

The metadata are needed for calibration, cross-validation, predictive modeling, and data visualization. For example, standard spectra are linked to the sample spectra both by the device name and by the dates from the timestamps.

Intensity calibration and dark background subtraction can only be performed if respective measurements are specified in the metadata table.

Prior to importing the data, the files with spectral data and, optionally, the metadata files should be packed into a ZIP archive.

# Metadata Table

The metadata should be provided in the form of a table, saved as a CSV, XLS, or XLSX file within the ZIP file with the data. Multiple metadata files are allowed (e.g. one per replicate). The filename of the metadata file should always contain the word “metadata”.

The default columns (case-sensitive names) are:

  • file: starting pattern of the file path within the ZIP file. Either the full path to the file or a starting pattern of the full path should be specified. If ambiguous patterns are specified, the longer pattern has priority (e.g. values in the row “folder/subfolder/” will overwrite metadata for the spectra in the subfolder, even if different metadata are set for “folder/”). If no metadata is provided, the “file” column will be identical to the filename and the metadata are generated from the folder structure.
  • standard: logical (True or False values). A “True” value indicates the files with the wavenumber calibration standard. All other spectra should be labeled as “False”.
  • standard_intensity: logical. Indicates the files with intensity standard spectra. The column may be skipped if no intensity standard was measured.
  • dark_bg: logical. Indicates the files with dark background spectra. The column may be skipped if no dark background was measured.
  • reference_sample: logical. Indicates the files that should only be used as the reference for EMSC and for quality filters based on correlation. If the column is skipped, the mean spectrum over the dataset is used as the reference.
  • date: dates of measurements. Used at the wavenumber and (optionally) intensity calibration steps, as data from each date is calibrated separately. Should be specified if the dates cannot be detected correctly from the files (e.g. *.lpe, *.csv, or some *.spc file formats) and the folder structure (timestamps YYMMDD_hhmmss in file/folder names).
  • batch: default data segments for batch-out cross-validation (sets of independent measurements).
  • type: default class labels (the grouping that has most relevance).
  • device: measurement device name, id, or label. Used at the calibration steps, as data from each device is calibrated separately. Should be specified if it cannot be detected correctly from the folder structure (the top folder name within the imported *.zip file).

Other additional columns can be used as responses for the regression, as alternative groupings for the classification and cross-validation, or to define how the spectra should be aggregated.
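
The priority rule for the “file” column (the longer matching pattern wins) can be illustrated as follows. This is a sketch of the rule as described above, not the software's implementation: `match_metadata` is a hypothetical helper, and it selects one whole row, whereas the software may combine values differently.

```python
def match_metadata(file_path, rows):
    """Return the row whose 'file' pattern is the longest prefix of file_path.

    rows is a list of dicts, one per metadata table row; None means no match.
    """
    best = None
    for row in rows:
        pattern = row["file"]
        if file_path.startswith(pattern):
            if best is None or len(pattern) > len(best["file"]):
                best = row
    return best
```

For example, with rows for “folder/” and “folder/subfolder/”, a spectrum at “folder/subfolder/s1.txt” receives the labels of the “folder/subfolder/” row.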

# Metadata for the calibration

The “date” column should contain the measurement date. If the column is missing, the timestamps from the folder structure or the dates embedded into the data object (*.lpe, *.csv, or some *.spc file formats) are used.

Both “date” and “device” labels are needed to perform spectral calibration.

Columns “standard”, “standard_intensity”, and “dark_bg” should contain Boolean values (true or false). The true values in the respective columns indicate the measurements of standard materials for wavenumber calibration, for intensity calibration, and dark background subtraction, respectively.
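
Because calibration needs a standard measurement for each device and date, it can help to check a metadata table for groups that lack one. The sketch below assumes that rows are dictionaries keyed by column name and that the “date” values of a sample and its standard are written identically; `groups_without_standard` is a hypothetical helper.

```python
from collections import defaultdict


def groups_without_standard(rows):
    """Return sorted (device, date) pairs that have no wavenumber standard."""
    has_standard = defaultdict(bool)
    for row in rows:
        key = (row["device"], row["date"])
        # Accept TRUE/True/true as written by spreadsheet software.
        is_standard = str(row["standard"]).strip().lower() == "true"
        has_standard[key] = has_standard[key] or is_standard
    return sorted(key for key, found in has_standard.items() if not found)
```

Any pair returned by this function marks measurements for which no wavenumber standard is listed in the table.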

# Step-by-step instruction

  1. Open MS Excel and name the columns “file”, “standard”, “device”, “date”, “type”, and “batch”.

    TIPS

    a. You may add other columns with important information about the samples and use the order of the columns that is convenient for you.

    b. If the intensity standard or dark current were measured, then the columns “standard_intensity” and “dark_bg” (respectively) should be added to the table and used in a way similar to the “standard” column.

  2. Copy the path to the data files within the ZIP file to the metadata table and fill out the other columns of the metadata table. For the sample data, set the value in the “standard” column to FALSE.

  3. Copy the path to the standard data, set the value in the “standard” column to TRUE, and specify the device and the date of measurement in the respective columns.

  4. Repeat the process for each folder and save the file with the word “metadata” in the filename.

  5. Put the metadata file inside the ZIP file (e.g. using “Drag and drop” method).
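
The same table can be produced without a spreadsheet program. The sketch below writes the six columns from step 1 as CSV text using Python's `csv` module. The function name, the example paths, and the date notation used in the usage note are illustrative assumptions.

```python
import csv
import io

COLUMNS = ["file", "standard", "device", "date", "type", "batch"]


def build_metadata_csv(rows):
    """Serialize metadata rows (dicts) to CSV text with the default columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        record = dict(row)
        # Logical values are written as TRUE/FALSE, as in the steps above.
        record["standard"] = "TRUE" if row["standard"] else "FALSE"
        writer.writerow(record)
    return buffer.getvalue()
```

For instance, `build_metadata_csv([{"file": "dev1/4AAP/", "standard": True, "device": "dev1", "date": "2019-05-30", "type": "standard", "batch": "b1"}])` returns a header line followed by one row. The text should be saved under a filename containing the word “metadata” and placed inside the ZIP file.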

# Defined from the folder structure

Besides providing labels as a separate metadata file, the files might be organized into a specific folder structure that represents labels:

  • Files with single spectra have to be placed in the folders of the following structure: “$device/.../$type/$batch/$SingleSpectrumFile.txt”

  • Files with multiple spectra are considered as batches, so no additional folder for batches should be created. Instead, they should be placed in: “$device/.../$type/”

  • Standard spectra (or folders with standard spectra) should be placed in subfolders whose name equals “AAP” or contains “4-AAP” or “4AAP”. These subfolders may be placed anywhere inside the “$device” folder.

The device name is specified by the top folder within the ZIP file. The timestamps are embedded in the *.spc, *.lpe, and *.csv files.
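
The folder conventions above can be summarized in a short sketch that derives labels from a path inside the ZIP file. The function `labels_from_path` is hypothetical, and using the file name (without extension) as the batch label for multi-spectrum files is an assumption of this example; the software's actual rules may differ in detail.

```python
from pathlib import PurePosixPath

STANDARD_MARKERS = ("4-AAP", "4AAP")


def labels_from_path(path, single_spectrum=True):
    """Derive device, type, batch and standard labels from a path in the ZIP."""
    parts = PurePosixPath(path).parts
    folders, filename = parts[:-1], parts[-1]
    if not folders:
        raise ValueError("file must be inside a device folder")
    # A subfolder named "AAP", or containing "4-AAP" or "4AAP", marks standards.
    is_standard = any(
        f == "AAP" or any(m in f for m in STANDARD_MARKERS) for f in folders[1:]
    )
    if single_spectrum:
        # $device/.../$type/$batch/$SingleSpectrumFile.txt
        batch = folders[-1] if len(folders) >= 3 else None
        type_ = folders[-2] if len(folders) >= 3 else None
    else:
        # $device/.../$type/ with the file itself treated as the batch.
        batch = PurePosixPath(filename).stem
        type_ = folders[-1] if len(folders) >= 2 else None
    return {"device": folders[0], "type": type_, "batch": batch,
            "standard": is_standard}
```

For paths that do not follow the structure (too few folder levels), the sketch returns `None` for the missing labels, which is the situation in which a metadata table is needed.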

The pattern YYMMDD_hhmmss in file and folder names is interpreted as a timestamp. For example, the timestamp “190530_121519” will be taken from the file with the path “data_0/e12300123_102030/s19161010_151515/May30batch2/Sp_20190530_121519.txt”.
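
A regular expression is one way to reproduce this behaviour. The sketch below returns the rightmost valid YYMMDD_hhmmss match in a path, which agrees with the example above; that the software always prefers the rightmost match is an assumption, and `timestamp_from_path` is a hypothetical name.

```python
import re
from datetime import datetime

TIMESTAMP = re.compile(r"\d{6}_\d{6}")


def timestamp_from_path(path):
    """Return the last valid YYMMDD_hhmmss timestamp in path, or None."""
    for text in reversed(TIMESTAMP.findall(path)):
        try:
            return datetime.strptime(text, "%y%m%d_%H%M%S")
        except ValueError:
            continue  # digits fit the pattern but are not a real date/time
    return None
```

Note that folder names such as “s19161010_151515” also contain digit runs that fit the pattern, so unrelated numeric identifiers in paths can be misread as timestamps; specifying the “date” column avoids this ambiguity.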

If no information about the measurement date is detected, the batch name is used to link the standard data to the sample data. In this case, the standard spectra should be placed in folders named according to the batch/replicate labels inside the folders with standard data (for example in “data_0/AAP/May30batch2/”).

If the dates cannot be detected correctly from folder structure or file content, the date should be specified in the metadata file.

When the standard spectra are not available or cannot be linked to the data correctly using the date and device name, the data are approximated onto the new wavenumber axis without calibration.

TIP

It is preferred to provide the metadata table.