r11 - 29 Sep 2008 - 12:27:58 - BrunoRino

MEx: FITS keyword mapping (documentation)

Rationale

To search for and find data of interest, the data must be described and catalogued in a homogeneous way. The MEx utility supports this task for astronomy data products, such as images and spectra, that are stored in FITS format.

MEx extracts and transforms keywords and thereby removes the instrument and observatory signature. This is achieved by converting values to physical units and mapping them to standard vocabularies (UCD) and concepts (utype). Users may supply their own mapping and data models. Special-purpose software modules can be hooked in where simple mapping expressions are not sufficient to compute the desired values. This makes MEx flexible and versatile.

Architecture

To achieve the proposed goal in a generic way, usable by any data centre, MEx is split into two components, executed in sequence: keyword mapping and persistence.

The keyword mapping component processes the FITS files, applies the mapping definition to them, and produces an in-memory list of all files and normalised keyword values.

The persistence component persists this list in whatever format/database/etc a specific application might require; a particular data centre will customise this component to meet its data model. Some possible uses are:

  • persisting to a VOTable
  • persisting to an existing database structure
  • persisting into the original FITS files

[Figure: mapp_arch.gif]

Key concepts

package
Set of FITS files to be ingested in one go.
FITS file datatype
Describes the kind of data present in the file. This is defined by the target model, and reflects the need for different metadata to describe different kinds of data (e.g., Spectral Resolution won't apply to image data).
model item
Some concept that exists on the target data model. A model item is a 'bucket' that is filled with the values extracted from the FITS files by MEx. Different sets of model items will be required for different datatypes.
mapping rule
Definition of how a single value for a given model item is extracted from the FITS files.
target model
The (data) model, as defined by the data centre, into which the mapped values are ingested.

Keyword mapping

Package

For a given run of MEx, a package must be provided.

A package is made of two components:

  • the package file list (sample); for each entry:
    • path to file
    • file datatype
    • association
  • the FITS files

The package file list contains a list of the FITS files, with additional metadata that makes it possible to extract values from the FITS headers.

The file datatype relates to the target data model. It is expected that the target data model will differentiate types of data, and use different metadata to describe them. As such, this datatype is used to determine the required model items.

The association field defines things that go together. For instance, an image present in a FITS file might be accompanied by another file containing a weight map.
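As an illustration of the three fields, the following minimal sketch reads a file list and groups the files by association. The on-disk layout used here (whitespace-separated columns, '#' for comments) and the names FileEntry, parse_file_list and group_by_association are assumptions made for the example; the real layout is the one shown in the sample file list.

```python
from collections import namedtuple

# Hypothetical layout: one entry per line, "path datatype association",
# separated by whitespace; lines starting with '#' are comments.
FileEntry = namedtuple("FileEntry", ["path", "datatype", "association"])


def parse_file_list(text):
    """Turn the text of a file list into a list of FileEntry tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        path, datatype, association = line.split()
        entries.append(FileEntry(path, datatype, association))
    return entries


def group_by_association(entries):
    """Collect the paths that 'go together' under each association label."""
    groups = {}
    for entry in entries:
        groups.setdefault(entry.association, []).append(entry.path)
    return groups


sample = """\
# path                 datatype  association
field1.fits            image     field1
field1.weight.fits     weightmap field1
"""
entries = parse_file_list(sample)
groups = group_by_association(entries)
```

Here the image and its weight map end up in the same group because they share the association label.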

Mapping definition

A mapping definition (sample) is a list of mapping rules (see below for a discussion of the syntax) that map FITS keywords to model items. The mapping rules are tied to the file datatype: they must provide a rule for every required model item. A distinct mapping definition file must be defined for each FITS file structure, i.e., whenever different FITS keywords are needed to retrieve the metadata required by the target model.

Note: Depending on a data centre's requirements, the mapping rules might remain constant (if the structure of the FITS files is constant). The general case does not require this, but mapping rules may nonetheless be reused across packages.

Configuration

MEx also needs some configuration that remains constant across every run of the tool, as it reflects a data centre's requirements. It consists of the model items definition.

The model items definition (sample) lists all existing model items. The model items are characterised in terms of:

  • model item name (utype)
  • ucd
  • unit
  • description

The characterisation also includes "selector" attributes, that define when the model item applies:

  • data type (specifies to which FITS file datatype the model item applies)
  • required (indicates if the model item is required)

The required attribute is used to validate that the mapping definition files contain enough rules to meet the data centre requirements. If a model item is required but there is no mapping rule to extract it, the mapping run will throw an error. Non-required (optional) keywords are extra metadata that the data centre supports, but doesn't require.
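The validation described above can be sketched as follows. This is not the MEx code: the function name missing_required_items, the dictionary layout of the model item definitions, and the item name Spectral.Resolution are assumptions made for the example.

```python
def missing_required_items(item_defs, mapping_rules, datatype):
    """Return required model items of `datatype` that have no mapping rule."""
    missing = []
    for name, definition in item_defs.items():
        applies = definition["datatype"] == datatype
        if applies and definition["required"] and name not in mapping_rules:
            missing.append(name)
    return sorted(missing)


# Illustrative model item definitions, reduced to the two selector attributes.
item_defs = {
    "Target.Name": {"datatype": "image", "required": True},
    "Instrument.Filter": {"datatype": "image", "required": True},
    "Spatial.SeeingLimit": {"datatype": "image", "required": False},
    "Spectral.Resolution": {"datatype": "spectrum", "required": True},
}
# Illustrative mapping definition: model item -> rule expression.
mapping_rules = {"Target.Name": "OBJECT"}

missing = missing_required_items(item_defs, mapping_rules, "image")
if missing:
    message = "no mapping rule for required model items: " + ", ".join(missing)
```

The optional item Spatial.SeeingLimit and the spectrum-only item are not reported; only the required image item without a rule is.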

Persistence

Persistence of the mapped values is out of the scope of MEx itself. Persistence mappers must be developed by data centres to meet their requirements.

Users' Guide

The mappings

Keyword/value mapping

What: The value is the content of a known FITS keyword.
How: Specify FITS keyword
Example: TWCS.Frame.Equinox = EQUINOX

Constant

What: A value is fixed across all the data to ingest
How: Specify constant value
Example: Curation.MimeType = "application/fits"
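The two simplest rule kinds, keyword lookup and constant, can be pictured with a small evaluator. This is a sketch under assumptions: the function names are hypothetical, the header is a plain dictionary standing in for a FITS header, and the real MEx parser is certainly more complete.

```python
def evaluate_simple_rule(expression, header):
    """Handle only the two simplest rule kinds: constant and keyword lookup."""
    expression = expression.strip()
    if len(expression) >= 2 and expression[0] == '"' and expression[-1] == '"':
        return expression[1:-1]          # constant: strip the quotes
    return header[expression]            # keyword: read it from the header


def apply_rules(rule_lines, header):
    """Evaluate 'model item = expression' lines against one header."""
    values = {}
    for line in rule_lines:
        model_item, expression = line.split("=", 1)
        values[model_item.strip()] = evaluate_simple_rule(expression, header)
    return values


header = {"EQUINOX": 2000.0}   # stand-in for a FITS header
rules = [
    'TWCS.Frame.Equinox = EQUINOX',
    'Curation.MimeType = "application/fits"',
]
values = apply_rules(rules, header)
```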

Conversion

What: A value is present, but in a different unit or format than the one expected
How: Add unit/formatting info along with the value; a list of supported conversions must be defined (see below)
Example: Coverage.Temporal.StartTime = {MJD-OBS,MJD}
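To make the idea of a conversion concrete, the sketch below turns a Modified Julian Date into a calendar timestamp. MEx itself delegates conversions to the CDS library; this stand-alone function, its name mjd_to_iso, and the choice of an ISO 8601 string as the target format are assumptions made for the example.

```python
from datetime import datetime, timedelta

MJD_EPOCH = datetime(1858, 11, 17)  # MJD 0 corresponds to 1858-11-17 00:00 UTC


def mjd_to_iso(mjd):
    """Convert a Modified Julian Date to an ISO 8601 timestamp string."""
    return (MJD_EPOCH + timedelta(days=mjd)).isoformat()


header = {"MJD-OBS": 51544.5}          # illustrative header value
start_time = mjd_to_iso(header["MJD-OBS"])
```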

Arithmetic expressions, string concatenation

What: A value must be calculated
How: Evaluate expressions with simple arithmetic operators (+, -, /, *), and string concatenation (&)
Example 1: Coverage.Region.Spectral.Max = CRVAL1,Angstroem - (CRPIX1,pix-NAXIS1,pix)*{CDELT1,Angstroem/pix}
Example 2: Curation.SoftwareRef = "ESO UVES pipeline" & HIERARCH ESO PRO REC1 PIPE ID
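Worked through with plain numbers, the two examples amount to the following. The header values are made up for illustration, unit handling is left out, and the sketch assumes that & joins the two strings without inserting a separator.

```python
header = {
    "CRVAL1": 4000.0,   # reference wavelength in Angstroem (illustrative)
    "CRPIX1": 1.0,      # reference pixel
    "NAXIS1": 1001,     # number of pixels along the spectral axis
    "CDELT1": 0.5,      # Angstroem per pixel
    "HIERARCH ESO PRO REC1 PIPE ID": "uves/1.0",   # made-up value
}

# Example 1: CRVAL1 - (CRPIX1 - NAXIS1) * CDELT1
pixel_offset = header["CRPIX1"] - header["NAXIS1"]
spectral_max = header["CRVAL1"] - pixel_offset * header["CDELT1"]

# Example 2: string constant & keyword value
software_ref = "ESO UVES pipeline" + header["HIERARCH ESO PRO REC1 PIPE ID"]
```

With these values the offset is -1000 pixels, so the result is 4000 + 500 = 4500 Angstroem.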

Choice

What: A value exists in one of several keywords
How: A list of candidate keywords is explicitly defined; the first one to be found is used.
Example 1: OpticalElements.FILTER = FILTER2||FILTER3 (either the FILTER2 or the FILTER3 keyword exists in a file)
Example 2: BSCALE = BSCALE || 1.0 (1.0 is a default value)
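The "first one found wins" behaviour can be sketched as below. The function name evaluate_choice is hypothetical, and treating a numeric literal as a default value is an assumption based on the second example; how MEx actually distinguishes literals from keywords is not described here.

```python
def evaluate_choice(expression, header):
    """Return the first candidate that resolves; literals act as defaults."""
    for candidate in expression.split("||"):
        candidate = candidate.strip()
        if candidate in header:
            return header[candidate]     # keyword present in this header
        try:
            return float(candidate)      # numeric literal used as a default
        except ValueError:
            continue                     # neither present nor a literal
    raise KeyError("no candidate found for %r" % expression)


filter_name = evaluate_choice("FILTER2||FILTER3", {"FILTER3": "V/89"})
default_scale = evaluate_choice("BSCALE || 1.0", {})
header_scale = evaluate_choice("BSCALE || 1.0", {"BSCALE": 2.5})
```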

Indirection: Addressing other keys

What: A keyword contains a keyword name to lookup
Example: AverageResolution = >RESOLAVG
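One way to read this rule is as a two-step lookup: the keyword after '>' holds the name of a second keyword, and the value of that second keyword is the result. The sketch below implements that reading; the function name, the header contents and the keyword QCRES are made up for the example.

```python
def evaluate_indirection(expression, header):
    """Resolve '>KEY': KEY holds the name of the keyword to read."""
    expression = expression.strip()
    if not expression.startswith(">"):
        raise ValueError("not an indirection rule: %r" % expression)
    pointer_keyword = expression[1:].strip()
    target_keyword = header[pointer_keyword]   # value is another keyword's name
    return header[target_keyword]


# Made-up header: RESOLAVG points at QCRES, which holds the actual value.
header = {"RESOLAVG": "QCRES", "QCRES": 42000.0}
average_resolution = evaluate_indirection(">RESOLAVG", header)
```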

Indirection: Addressing other FITS extensions

What: A keyword resides on an extension other than the current one.
How: Specify the extension along with the keyword. Extension 0 will be the primary FITS unit; 1 will be the first extension.
Example: Instrument = 1:INSTRUME

Indirection: Addressing other FITS files

What: A keyword resides on an external FITS file
How: Specify the file name along with the keyword
Example: AverageResolution = "external.fits":0:RESOLAVG
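The extension and file notations share the same shape, an optional quoted file name, an optional extension number, then the keyword. A minimal sketch of splitting such an address into its parts is given below; the function name parse_address and the returned tuple layout are assumptions, not the MEx implementation.

```python
def parse_address(expression):
    """Split 'KEY', 'EXT:KEY' or '"file":EXT:KEY' into (file, ext, keyword)."""
    expression = expression.strip()
    file_name = None
    if expression.startswith('"'):
        closing = expression.index('"', 1)      # end of the quoted file name
        file_name = expression[1:closing]
        expression = expression[closing + 1:].lstrip(":")
    extension = None
    head, separator, tail = expression.partition(":")
    if separator and head.isdigit():
        extension, expression = int(head), tail
    return file_name, extension, expression


same_file = parse_address("1:INSTRUME")
other_file = parse_address('"external.fits":0:RESOLAVG')
plain = parse_address("EQUINOX")
```

A None in the first or second position means "the current file" or "the current extension" respectively.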

Standard computations

What: Some rather important values are systematically absent from the header, but can be computed from the data itself and/or other values.
How: A set of standard computations is included in MEx and applied based on the model item name.
Example: Spatial.SeeingLimit = %COMPUTE%

Pre-computed table

What: Some values are extraordinarily hard to get, or even missing.

The pre-computed table is a fallback solution when a value is both too hard to get using mapping rules and isn't amenable to be computed by a standard computation.

An example from UVES data: look up the keyword name which has a value of 'LINE_TABLE_*2' (where * = BLUE, REDL or REDU), replace the last 4 characters 'CATG' with 'NAME', get the value of that keyword, use the value as a filename to retrieve from the ESO archive, and look up in that file the value of the keyword HIERARCH ESO QC RESOLAVG.
How: Such complicated mappings will not be automated, as they are not likely to occur more than once. They require user intervention in the form of an external file, in tsv format, that provides the values for each file (this effectively corresponds to hard-coding the values in this file). This scheme also allows the inclusion of metadata that is not present in any form in the FITS files.
Example: AverageResolution = %EXTERNAL%
The external file (the first line contains the concepts to map to):

#filename                               Target.Name     FIELD   SUBID   Instrument.Filter
Deep1c_I_EIS.D1CA.swarp.fits            Deep1c-A        Deep1c  A       I/203
Deep1c_I_EIS.D1CA.swarp.weight.fits     Deep1c-A        Deep1c  A       I/203
Deep1c_R.D1CA.swarp.fits                Deep1c-A        Deep1c  A       Rc/162
Deep1c_R.D1CA.swarp.weight.fits         Deep1c-A        Deep1c  A       Rc/162
Deep1c_V.D1CA.swarp.fits                Deep1c-A        Deep1c  A       V/89
Deep1c_V.D1CA.swarp.weight.fits         Deep1c-A        Deep1c  A       V/89
Deep1c_V.D1CS.swarp.fits                Deep1c-S        Deep1c  S       V/89
Deep1c_V.D1CS.swarp.weight.fits         Deep1c-S        Deep1c  S       V/89
.....
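Reading such a table into a per-file lookup is straightforward. The sketch below assumes real tab characters between columns (the listing above is shown with aligned spacing) and uses a hypothetical function name, load_external_table; it is written for a current Python 3 interpreter rather than the Python/Jython versions MEx was developed with.

```python
import csv
import io


def load_external_table(text):
    """Map file name -> {concept: value} from a tab-separated table."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    concepts = header[1:]                   # first column is the file name
    table = {}
    for row in reader:
        if not row:
            continue
        table[row[0]] = dict(zip(concepts, row[1:]))
    return table


sample = (
    "#filename\tTarget.Name\tFIELD\tSUBID\tInstrument.Filter\n"
    "Deep1c_I_EIS.D1CA.swarp.fits\tDeep1c-A\tDeep1c\tA\tI/203\n"
    "Deep1c_V.D1CS.swarp.fits\tDeep1c-S\tDeep1c\tS\tV/89\n"
)
external = load_external_table(sample)
```

A rule such as %EXTERNAL% would then be resolved by looking up the current file name and the model item in this dictionary.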

Unit Conversion

For unit conversion we rely on CDS's Unit conversion library.

For details on possible units and formats, please refer to their documentation.

Usage

The keyword mapper component (extractmetadata.py) defines the following command-line parameters:

--itemdef-file model item definitions file
-l file list
-d (optional) path to prefix to the filenames on the file list
-m (optional) mappings file
-e (optional) external keys file
--verbosity (optional) verbosity level
--help (optional) print usage
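For readers who want to wrap or imitate this interface, the parameter list can be mirrored with Python's argparse module. This is an illustration only, not the parser used by extractmetadata.py: the destination names, the integer type of --verbosity, and treating the two unmarked parameters as required are assumptions. The file names are those of the sample attachments.

```python
import argparse


def build_parser():
    """Mirror the documented options (argparse adds --help by itself)."""
    parser = argparse.ArgumentParser(prog="extractmetadata.py")
    parser.add_argument("--itemdef-file", required=True,
                        help="model item definitions file")
    parser.add_argument("-l", dest="file_list", required=True,
                        help="file list")
    parser.add_argument("-d", dest="prefix", default="",
                        help="path to prefix to the file names on the file list")
    parser.add_argument("-m", dest="mappings", help="mappings file")
    parser.add_argument("-e", dest="external_keys", help="external keys file")
    parser.add_argument("--verbosity", type=int, default=0,
                        help="verbosity level")
    return parser


args = build_parser().parse_args([
    "--itemdef-file", "modelitem_definitions.txt",
    "-l", "filelist_uves.txt",
    "-m", "mappings_uves_utype.txt",
])
```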

Persistence components might define extra parameters:

persist_debug.py
  -o output file
persist_votable.py
  -o output file
  --split-output (optional) whether to create separate VOTable files for each datatype
persist_mysql.py
  --host MySQL hostname
  --user MySQL username
  --password MySQL password
  --database MySQL database
  --pkgFile global package metadata

Programmer's Guide

Standard computations

Standard computations are rules that apply to most of the data, in a standard way. These are defined by code, as a separate file (compute.py).

Edit the computeMethodDict dictionary. Each entry defines the model item that is to be computed and the function that will do the computation (a DoCompute function). Since values from other model items are likely to be needed, and might not yet be present (if they are themselves to be computed), a DoCompute function that fails on that account is re-executed after all the other DoCompute functions have run.
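The retry behaviour can be sketched as follows. Only the name computeMethodDict comes from compute.py; the function signatures, the model item names, the use of KeyError to signal a missing dependency, and the loop that repeats until no further progress is made are all assumptions made for the example.

```python
def compute_pixel_count(values):
    return values["Width"] * values["Height"]


def compute_area(values):
    # Depends on another computed item; raises KeyError until it exists.
    return values["PixelCount"] * values["PixelArea"]


# Model item name -> function that computes it (hypothetical entries).
computeMethodDict = {
    "Area": compute_area,
    "PixelCount": compute_pixel_count,
}


def run_computations(values, methods):
    """Run every computation, retrying those whose inputs were missing."""
    pending = dict(methods)
    while pending:
        deferred = {}
        for item, function in pending.items():
            try:
                values[item] = function(values)
            except KeyError:
                deferred[item] = function    # retry after the others ran
        if len(deferred) == len(pending):
            raise RuntimeError("unresolved: " + ", ".join(sorted(deferred)))
        pending = deferred
    return values


values = run_computations(
    {"Width": 4, "Height": 3, "PixelArea": 0.25}, computeMethodDict
)
```

On the first pass Area fails because PixelCount does not exist yet; it succeeds on the second pass. A computation whose inputs never appear ends the run with an error instead of looping forever.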

Persistence

The keyword mapping component itself is a stand-alone program that simply outputs its results to stdout.

For real use of MEx, one needs to write a persistence component. Persistence components directly call the keyword mapping component, read its in-memory lists of metadata, and output them as they see fit.

persist_debug.py is a simple persistence component that dumps the relevant metadata into an html file: the model items definition, the file list, and the mapped values. It is a good starting point both for defining a new persistence component and for inspecting the structure and contents of the keyword mapping lists.

A typical persistence component will:

  • define extra command-line parameters
  • run the keyword mapping component
  • persist the new values

A persistence component might require configuration of its own. The keyword mapping component defines command-line parameters that all persistence components must abide by, and provides a mechanism to easily define extra command-line parameters using the ParseArguments function. persist_debug adds one argument: outputFile, which specifies where the html file will be saved.

ExtractMetadata runs the keyword mapping component. The returned object contains:

  • itemDefDict: the model item definition
  • fileDesc: the file list
  • fileMD: the mapped values

Although the last dictionary is the most interesting one, the other two contain information that a data centre might wish to persist.
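A persistence component built on these three dictionaries could look roughly like the sketch below. The real ExtractMetadata call is replaced by a stub, and the inner structure of itemDefDict, fileDesc and fileMD is assumed for the example; persist_debug.py is the place to inspect the actual structures.

```python
def extract_metadata_stub():
    """Stand-in for the real ExtractMetadata call; all shapes are assumptions."""
    return {
        "itemDefDict": {"Target.Name": {"ucd": "meta.id;src", "unit": ""}},
        "fileDesc": {"Deep1c_V.D1CS.swarp.fits": {"datatype": "image"}},
        "fileMD": {"Deep1c_V.D1CS.swarp.fits": {"Target.Name": "Deep1c-S"}},
    }


def persist_as_tsv(result):
    """Write the mapped values as tab-separated text, one value per line."""
    lines = ["#filename\tmodel_item\tvalue"]
    for file_name in sorted(result["fileMD"]):
        values = result["fileMD"][file_name]
        for item in sorted(values):
            lines.append("%s\t%s\t%s" % (file_name, item, values[item]))
    return "\n".join(lines) + "\n"


output = persist_as_tsv(extract_metadata_stub())
```

A data centre would replace persist_as_tsv with code that writes to its own database or file format.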

Installation

MEx was developed and tested under Linux Fedora Core 3. Its components are fairly portable, so it should work on any Unix flavour and on Windows. While it is written in Python, it was made Jython-compatible. This means it can run within either a Python or a Java runtime.

Python version

Download: Requirements: Extra requirements for the bundled persistence components

Jython version

Download: Requirements: Extra requirements for the bundled persistence components

Sample

The download includes three persistence components:
debug
this module outputs the important in-memory lists produced by the keyword mapping component
votable
outputs the data as VOTable(s)
mysql
outputs into a MySQL database (DDL file crebas.sql is provided to create the tables)

Two sample packages are provided: UVES and GaBoDs. FITS files are stored under the data folder.

Some scripts are located in the base folder to launch each of the persistence components with the right command-line arguments for each package:

  • keyword mapping
    • do_extract_gabods
    • do_extract_uves
  • debug
    • debug_uves
    • debug_gabods
  • votable
    • do_votable_uves
    • do_votable_gabods
  • mysql
    • do_mysql_uves
    • do_mysql_gabods

Advanced features, known issues and next developments

Advanced features

MEx supports some features not discussed in this document:
  • Using FITS files as a transport mechanism. If a FITS file contains more than one product, each with its own set of metadata, the HDUs that need to be processed must be specified in the file list.
  • Using different mapping files for different files of the dataset. Most commonly a package will contain only one "main" datatype (one that will have its FITS keywords processed), and as such specifying the mapping rules file as a command line parameter is sufficient. But if that is not the case, a mapping file may be specified in the file list (for each file) rather than as a command line argument. You can even apply different mappings to files with the same datatype!
  • The model item definition is extensible; you can add new columns to it if your persistence component needs more description about model items. The first (comment) line is used as key in the memory representation.

Known issues

  • The unit conversion library doesn't currently support all the conversions needed by the test cases; this should be fixed soon. At present some workarounds are in place. If unit conversion problems are encountered, you might want to try to disable the unit conversion library (with the --without-cds parameter), and use legacy conversion code. Note however that this legacy mode is tuned to work with the test data.
  • The several "Indirection" rules have not been thoroughly exercised: although they are deemed necessary, the test data does not make use of them. In particular, addressing other FITS files requires a literal string as file name, where it might be more sensible to have a file name based on the current file being processed.
  • Error reporting needs to be better addressed.

Next developments

MEx is still a work in progress; new features, namely new mapping rules types, might be added. Feedback is appreciated.

References/Acknowledgments

The requirements were presented and defined at Stage 3 planning meetings.

MEx is developed at ESO's Advanced Data Products group by Remco Slijkhuis.

Rules requirements and model item definitions were set up by Nausicaa Delmotte, Markus Dolensky, Jörg Retzlaff, Bruno Rino, Remco Slijkhuis, Andreas Wicenec.

The unit conversion is powered by CDS: "This software uses source code created at the Centre de Données astronomiques de Strasbourg, France."

Presented at: DS5PlanningStage04.

-- Main.BrunoRino - 03 Sep 2006

Topic attachments

Attachment                 Size     Date                 Who        Comment
filelist_uves.txt          2.6 K    04 Sep 2006 - 07:45  BrunoRino  sample package
mappings_uves_utype.txt    2.1 K    04 Sep 2006 - 07:45  BrunoRino  sample mappings
modelitem_definitions.txt  19.1 K   04 Sep 2006 - 07:49  BrunoRino  sample model items
MEx-demo.tar               400.0 K  08 Sep 2006 - 12:33  BrunoRino  MEx source and demo files (updated)
mex.tar                    780.0 K  20 Mar 2008 - 08:27  BrunoRino