The information on this page reflects the first experiments. For up-to-date information, please refer to the new tools.
Scope and goal of SED construction
Scope
SED (Spectral Energy Distribution) construction provides associations between wavelength and intensity of emission for objects in the sky. Such constructions make use of tabular data from various origins. These data can be found in VO resources such as the VizieR catalogues.
SEDs are of great interest to astronomers: to fully understand the physics of astronomical objects, it is necessary to exploit information distributed across different wavelength/frequency domains.
As a first step, we need to be able to find relevant VO resources according to restrictions such as "I want all resources referring to radio emission". We then need to be able to extract relevant data from the resources obtained. As we will see in the next part, the VO registry plays an important role in all these tasks.
The need for a registry
A VO registry contains metadata about VO resources (these metadata are exposed in XML format). Each VO resource is unambiguously identified by a piece of metadata called "identifier" (for example, the VizieR main table of the Hipparcos catalogue is identified by "ivo://CDS/VizieR/I/239/hip_main"). Other metadata such as "description", "source" or "referenceURL" give more information about the resource and indicate how to get or visualize it. Each resource also has a type. For example, the VizieR tables are currently resources of the "TabularSkyService" type. Such resources have specific metadata describing the columns of the tabular resource. Here is an example for the main table of the Hipparcos catalogue:
<vs:Resource>
...
<vs:table xmlns="http://www.ivoa.net/xml/VODataService/v0.5">
<vs:name>I/239/hip_main</vs:name>
<vs:description>The Hipparcos Main Catalogue\vizContent{timeSerie}</vs:description>
<vs:column>
<name>recno</name>
<description>Record number within the original table (starting from 1)</description>
<unit/>
<ucd>meta.record</ucd>
</vs:column>
<vs:column>
<name>HIP</name>
<description>Identifier (HIP number) (H1)</description>
<unit/>
<ucd>meta.id;meta.main</ucd>
</vs:column>
...
</vs:table>
</vs:Resource>
All the columns of the table are listed, and important information can be found for each one. To filter resources by criteria such as "Positions must be present" or "Radio emission data must be present", we use the UCD metadata of the columns. The UCD tells us what kind of information is stored in a column, so we can know whether it is a position, a radio emission or something else.
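As an illustration of this kind of UCD test, here is a minimal Java sketch that lists the columns of one resource description whose UCD starts with a given prefix. The class name, the method name and the cut-down sample document are assumptions made for this example and are not part of the original tool. Matching on the start of the whole UCD string is a simplification.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class UcdColumnFilter {

    // Cut-down, hypothetical copy of the Hipparcos table description shown above.
    static final String SAMPLE =
        "<vs:Resource xmlns:vs='http://www.ivoa.net/xml/VODataService/v0.5'>"
      + "<vs:table><vs:name>I/239/hip_main</vs:name>"
      + "<vs:column><name>recno</name><ucd>meta.record</ucd></vs:column>"
      + "<vs:column><name>HIP</name><ucd>meta.id;meta.main</ucd></vs:column>"
      + "</vs:table></vs:Resource>";

    // Returns the names of the columns whose UCD starts with ucdPrefix.
    static List<String> columnsWithUcd(String resourceXml, String ucdPrefix) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(resourceXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // local-name() keeps the sketch independent of namespace prefixes.
            NodeList columns = (NodeList) xpath.evaluate(
                    "//*[local-name()='column']", doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<String>();
            for (int i = 0; i < columns.getLength(); i++) {
                String ucd = xpath.evaluate("*[local-name()='ucd']", columns.item(i)).trim();
                if (ucd.startsWith(ucdPrefix)) {
                    names.add(xpath.evaluate("*[local-name()='name']", columns.item(i)).trim());
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read resource description", e);
        }
    }

    public static void main(String[] args) {
        // Lists the columns carrying an identifier UCD.
        System.out.println(columnsWithUcd(SAMPLE, "meta.id"));
    }
}
```

A resource can then be kept or discarded depending on whether the returned list is empty for each UCD of interest.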
Our goals
Our first goal is to exploit the registry to filter resources according to some criteria on UCDs like "We want all resources having positional information". We then want to automatically extract relevant data from the obtained resources so that SED construction can be done. We tried to comply as far as possible with existing VO standards in accomplishing those tasks. This should allow such a SED construction tool to be used by the wide VO community.
Infrastructure setup
We decided to work on a local registry, in order to be able to customize it for our needs if necessary. We chose eXist (see next part for more information) to host the data and thus obtain a fully operational registry.
eXist
eXist (http://exist.sourceforge.net) is a native Open Source XML database written entirely in Java. For more information about it, see eXist.
Registry population and migration to UCD1+
Once eXist was installed locally, we needed to populate it with the data of a VO registry. We decided to retrieve the data of the full Carnivore NVO registry: because it uses eXist as its XML database, it was very easy to retrieve the data and import them into our local eXist database. The full registry has a size of 70 MB and is composed of more than 11,000 VO resources (one XML file per resource).
One problem we met was that the UCD metadata were in UCD1 format, whereas we wanted to work with the future UCD1+ format. We therefore wrote a small Java tool that reads the UCD1 metadata of the resources and translates them into UCD1+ metadata. To perform this, we used a UCD Web Service taking a UCD1 as an argument and returning the corresponding UCD1+. This provided us with a full local registry containing UCD1+ metadata.
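The migration step can be sketched as follows. This is not the original tool: the class and method names are hypothetical, the UCD Web Service is replaced by a small in-memory lookup table (`demoTranslator`), and the three UCD1 to UCD1+ pairs in it are given for illustration only. The sketch rewrites the text of every `ucd` element of one resource file and leaves unknown UCDs unchanged.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class UcdMigrator {

    // Hypothetical resource fragment holding UCD1 values.
    static final String SAMPLE =
        "<vs:table xmlns:vs='http://www.ivoa.net/xml/VODataService/v0.5'>"
      + "<vs:column><name>RAJ2000</name><ucd>POS_EQ_RA_MAIN</ucd></vs:column>"
      + "<vs:column><name>HIP</name><ucd>ID_MAIN</ucd></vs:column>"
      + "<vs:column><name>other</name><ucd>UNKNOWN_UCD</ucd></vs:column>"
      + "</vs:table>";

    // Stand-in for the UCD Web Service: illustrative pairs only.
    static Function<String, String> demoTranslator() {
        final Map<String, String> table = new HashMap<String, String>();
        table.put("POS_EQ_RA_MAIN", "pos.eq.ra;meta.main");
        table.put("POS_EQ_DEC_MAIN", "pos.eq.dec;meta.main");
        table.put("ID_MAIN", "meta.id;meta.main");
        // Unknown UCDs are returned unchanged.
        return ucd1 -> table.getOrDefault(ucd1, ucd1);
    }

    // Replaces the content of every <ucd> element using the given translator.
    static String migrate(String resourceXml, Function<String, String> translate) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(resourceXml)));
            NodeList ucds = doc.getElementsByTagNameNS("*", "ucd");
            for (int i = 0; i < ucds.getLength(); i++) {
                String ucd1 = ucds.item(i).getTextContent().trim();
                if (!ucd1.isEmpty()) {
                    ucds.item(i).setTextContent(translate.apply(ucd1));
                }
            }
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot migrate resource", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(migrate(SAMPLE, demoTranslator()));
    }
}
```

Passing the translator as a function keeps the XML rewriting independent of how the translation is obtained, so a call to a remote service could be plugged in without touching `migrate`.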
Benchmarks
Before developing a tool that makes use of a registry, we wanted to be sure that such a thing was feasible. Indeed, to find relevant VO resources we need to make deep searches on the XML metadata of the registry. If such operations could not be achieved in a reasonable time, our project would be stillborn. So we ran several benchmarks on our local registry. These benchmarks consisted of sending XQuery requests to the registry (XQuery is a W3C language used to make advanced requests on XML fragments; eXist supports it) and checking how much time and memory each took to complete. Here are some results:
Test 1: retrieval of a single resource having a particular identifier
//vr:Resource[vr:identifier='ivo://CDS/VizieR/J/A+AS/123/575/levels3']
- duration: 2 s
- memory: 15 MB
- processor load: 100%
Test 2: retrieval of the resources having at least one column with a particular UCD
//vr:Resource[vs:table/vs:column/vs:ucd='phot.mag;arith.diff']
- duration: 1 s
- memory: 15 MB
- processor load: 100%
Test 3: retrieval of the resources having at least one column with one particular UCD or another
//vr:Resource[vs:table/vs:column/vs:ucd="phot.mag;arith.diff" or vs:table/vs:column/vs:ucd="phot.color;em.opt.B;em.opt.V"]
- duration: 3 s
- memory: 120 MB
- processor load: 100%
The first thing to note is that the searches on the XML registry work: we can get relevant resources according to restrictions on UCDs. Because the time and memory used seemed reasonable, we proceeded with the development of a SED construction tool making use of the registry.
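A measurement of this kind can be sketched in Java as below. The sketch does not talk to eXist: it builds a synthetic in-memory registry and evaluates the Test 2 expression with the XPath engine of the standard library, so its timings are not comparable with the figures above. The class name, the synthetic identifiers and the namespace URI bound to the "vr" prefix are assumptions.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathBenchmark {
    // Namespace URIs bound to the prefixes used in the queries (vr is an assumption).
    static final String VR = "http://www.ivoa.net/xml/VOResource/v0.10";
    static final String VS = "http://www.ivoa.net/xml/VODataService/v0.5";

    // Builds n fake resources; one in 'every' carries the UCD we search for.
    static Document syntheticRegistry(int n, int every) {
        StringBuilder sb = new StringBuilder(
                "<registry xmlns:vr='" + VR + "' xmlns:vs='" + VS + "'>");
        for (int i = 0; i < n; i++) {
            String ucd = (i % every == 0) ? "phot.mag;arith.diff" : "meta.record";
            sb.append("<vr:Resource><vr:identifier>ivo://example/").append(i)
              .append("</vr:identifier><vs:table><vs:column><vs:ucd>").append(ucd)
              .append("</vs:ucd></vs:column></vs:table></vr:Resource>");
        }
        sb.append("</registry>");
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(sb.toString())));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates the expression and returns the number of matching nodes.
    static int countMatches(Document registry, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                if ("vr".equals(prefix)) return VR;
                if ("vs".equals(prefix)) return VS;
                return XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) { return null; }
            public Iterator<String> getPrefixes(String uri) {
                return Collections.<String>emptyIterator();
            }
        });
        try {
            NodeList hits = (NodeList) xpath.evaluate(
                    expression, registry, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document registry = syntheticRegistry(11000, 50);
        Runtime rt = Runtime.getRuntime();
        long start = System.nanoTime();
        int hits = countMatches(registry,
                "//vr:Resource[vs:table/vs:column/vs:ucd='phot.mag;arith.diff']");
        long millis = (System.nanoTime() - start) / 1_000_000;
        // Heap in use is only a rough indication of the memory cost.
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        System.out.println(hits + " resources, " + millis + " ms, ~" + usedMb + " MB heap in use");
    }
}
```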
Development of a SED construction tool
Introduction
To achieve our goals, we began to develop a Java tool that allows astronomers to find relevant VO resources according to restrictions on UCDs and then to extract data from the resources obtained, thanks to the registry. We decided to use the Java language for several reasons. Java is portable, which makes it a good solution for a tool intended to be used by astronomers from the whole VO community and whose distribution should be easy. Moreover, Java makes it easy to work with XML and XML databases (and therefore with our trial registry implementation).
Registry metadata
Before explaining what our tool can do and how, we must describe what kind of metadata can be found in the registry and how to exploit them.
The first kind of metadata you encounter for a VO resource is "descriptive metadata". Here are the most important ones:
- identifier: unambiguous identification of the VO resource ("ivo://CDS/VizieR/VIII/16/mrcj2000" for example)
- creator: the creator(s) of the resource ("Large M.I., Cram L.E., and Brugess A.M." for example)
- description: small text describing the content of the VO resource
- referenceURL: URL pointing to a human-readable document describing the VO resource
- coverage: the spatial, spectral and temporal coverage of the data
There are also metadata about the content of tabular resources ("TabularSkyService" type). These metadata describe the columns of the tabular resource, providing for each column several pieces of information:
- name: the name of the column ("_GLON_" for example)
- description: a brief description of the column ("[0,360] Galactic longitude" for example)
- ucd: tells what kind of information can be found in this column ("pos.galactic.lon" for example)
- unit: tells in which unit information is stored ("deg" for example)
Such metadata allow us to filter the tabular VO resources according to their content, as described by the UCDs.
The last kind of metadata that can be found concerns the interfaces of the VO resource. An interface allows an astronomer or a tool to view the resource and get the data. There are several types of interfaces:
- "WebBrowser": allows one to interact with the resource via a Web browser. This interface lets you choose which columns you want to see and from which columns you want to extract data.
- "ParamHTTP": allows one to extract data from the resource by calling a URL in a Web browser. The difference from the "WebBrowser" type is that here one must specify the parameters of the extraction (output format, desired columns, ...) in the URL. No user interface is available.
- "GLUService": a web-based service that is described in a GLU registry.
- "WebService": service described by a WSDL.
With such a schema we can easily filter VO resources according to restrictions on UCDs, by browsing the column metadata of each resource and testing whether the wanted UCDs are present or not. But when we want normalized access to the resulting resources through their interfaces, this schema seems to be limited. Indeed, there is no separation between the metadata about the content of a tabular resource and the metadata about its interfaces.
Prototype for advanced Resource description
The current registry implementation has a few limitations, which are easily seen in the case of VizieR-related resources (because there are so many tables in VizieR).
Consider 5 tables (A, B, C, D, E), and 3 different possible access methods to these tables (FTP access, query form in a web browser and Cone Search). VOResource v0.10 would suggest creating 5x3=15 Resources to describe the various combinations, each containing a single interface for one table. There are some drawbacks to this description:
- the base URL for, e.g., FTP access will be duplicated in each resource, therefore a change in this base URL will require updating many resources.
- the "table" part, describing the contents of the table, will be duplicated for all the resources corresponding to different access methods to this table.
We therefore tentatively tested a new resource description method, beyond VOResource v0.10, hoping that lessons learned from this prototyping could be used in the current modeling effort in the IVOA registry group.
The adopted strategy was the following:
- we create one resource for each main access method to VizieR tables. The type of these resources (ParamHTTP, WebService) is attached to the Resource. Each of these (few) resources has a unique ivo:// identifier. These resources contain an "interface", with a description of generic parameters specific to the access mode, and possibly "capability".
- each VizieR table has a corresponding Resource, of type xsi:type="DataSet", containing Curator, Content, Coverage, and Table description. In addition, every table indicates how it can be accessed with a set of "Interface" elements, each pointing to a specific resource corresponding to an access method and possibly adding some parameters. These parameters are specific to the table, and can be used in combination with the generic parameters already present in the access method resource pointed to.
Example of what a set of interfaces for one table resource (DataSet) might look like:
<interface xsi:type="vr:WebBrowser" ref="ivo://CDS/VizieR/WebForm">
<vs:param>
<vs:name>-source</vs:name>
<xsi:value>I/239/hip_main</xsi:value>
<vs:description>Name of the VizieR table to be queried</vs:description>
</vs:param>
</interface>
<interface xsi:type="cs:ConeSearch" ref="ivo://CDS/VizieR/ConeSearchEngine">
<vs:param>
<vs:name>-source</vs:name>
<xsi:value>I/239/hip_main</xsi:value>
<vs:description>Name of the VizieR table to be queried.</vs:description>
</vs:param>
</interface>
<interface xsi:type="vs:ParamHTTP" ref="ivo://CDS/VizieR/TSV">
<vs:param>
<vs:name>-source</vs:name>
<xsi:value>I/239/hip_main</xsi:value>
<vs:description>Name of the VizieR table to be queried.</vs:description>
</vs:param>
<vs:param>
<vs:name>-out.add</vs:name>
<xsi:value>recno</xsi:value>
<vs:description>Add this parameter to the output table.</vs:description>
</vs:param>
<vs:param>
<vs:name>-out.add</vs:name>
<xsi:value>HIP</xsi:value>
<vs:description>Add this parameter to the output table.</vs:description>
</vs:param>
<vs:param>
<vs:name>-out.add</vs:name>
<xsi:value>Proxy</xsi:value>
<vs:description>Add this parameter to the output table.</vs:description>
</vs:param>
</interface>
So far we have mainly focused the application on HTTP queries. We used the above scheme to dynamically build custom forms, where selecting an access method (interface) for one table (dataset) displays:
- the generic parameters for the interface
- the additional parameters available for this table and this interface
- default parameter values, and explanations on parameters
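Once a form has been filled in, the generic parameters of the interface and the table-specific parameters have to be merged into a single HTTP query. The following Java sketch shows one possible way to do it. The class, the base URL (http://example.org/vizier/tsv) and the choice of "-out.max" as a generic parameter are assumptions made for illustration; in the scheme described above these values would be read from the registry.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InterfaceUrlBuilder {

    // A name/value pair; the same name may appear several times (e.g. -out.add).
    static final class Param {
        final String name;
        final String value;
        Param(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    // Appends the generic parameters, then the table-specific ones, to the base URL.
    static String buildUrl(String baseUrl, List<Param> generic, List<Param> tableSpecific) {
        List<Param> all = new ArrayList<Param>(generic);
        all.addAll(tableSpecific);
        StringBuilder url = new StringBuilder(baseUrl);
        char separator = baseUrl.contains("?") ? '&' : '?';
        for (Param p : all) {
            url.append(separator).append(encode(p.name)).append('=').append(encode(p.value));
            separator = '&';
        }
        return url.toString();
    }

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Hypothetical example: one generic parameter plus the Hipparcos table parameters.
    static String demo() {
        List<Param> generic = Arrays.asList(new Param("-out.max", "50"));
        List<Param> table = Arrays.asList(
                new Param("-source", "I/239/hip_main"),
                new Param("-out.add", "recno"),
                new Param("-out.add", "HIP"));
        return buildUrl("http://example.org/vizier/tsv", generic, table);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Parameters are kept in a list rather than a map because a name such as "-out.add" may legitimately be repeated.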
Modules and functionalities
Modules
Our tool is written in Java and so can easily be structured into modules thanks to the object-oriented approach. We decided to split our tool into four main modules: database access, resources exploration, resource-to-object conversion and graphical interface.
- Database access: this module relies on the XMLDB API. Given an XMLDB eXist driver (a Java class provided by the eXist distribution), this module can access the eXist XML database and therefore the registry. It wraps the XMLDB API and makes it easier for the resources exploration module to access data.
- Resources exploration: this module uses the database access module to send XPath requests to the registry and retrieve relevant resources. It analyzes the astronomer's request on UCDs and transforms it into a valid XPath expression (the Jakarta regexp package is used because Java has no native advanced regexp functionality). It can also make a list of the requested UCDs for future use (see next module).
- Resource-to-object conversion: to manage the interaction with the VO resources easily once they are retrieved, we decided to model each of them as a Java object. Such an object has fields mapped to the important metadata of the resource that it represents. This approach gives better performance, since once a VO resource has been retrieved and modelled as a Java object, its information can be accessed quickly.
- Graphical interface: this module manages the user/tool interaction and is built on the Java Swing library. It provides a graphical user interface allowing the user to retrieve relevant resources and interact with them (see next part for more details about its functionalities).
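To illustrate the translation performed by the resources exploration module, here is a minimal sketch that turns a logical condition on UCDs into an XPath expression. It is not the original code: instead of the Jakarta regexp package it uses a small hand-written recursive-descent parser, and the class name, the condition syntax ("&&", "||", parentheses, a trailing "*" for a prefix match) and the target path vs:table/vs:column/vs:ucd are assumptions.

```java
class UcdConditionToXPath {
    private static final String UCD_PATH = "vs:table/vs:column/vs:ucd";
    private final String src;
    private int pos = 0;

    private UcdConditionToXPath(String src) {
        this.src = src;
    }

    // Entry point: "a* && b" becomes "//vr:Resource[... and ...]".
    static String toXPath(String condition) {
        UcdConditionToXPath parser = new UcdConditionToXPath(condition);
        String predicate = parser.parseOr();
        parser.skipSpaces();
        if (parser.pos != parser.src.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + parser.pos);
        }
        return "//vr:Resource[" + predicate + "]";
    }

    // or := and ('||' and)*
    private String parseOr() {
        StringBuilder sb = new StringBuilder(parseAnd());
        while (accept("||")) {
            sb.append(" or ").append(parseAnd());
        }
        return sb.toString();
    }

    // and := atom ('&&' atom)*
    private String parseAnd() {
        StringBuilder sb = new StringBuilder(parseAtom());
        while (accept("&&")) {
            sb.append(" and ").append(parseAtom());
        }
        return sb.toString();
    }

    // atom := '(' or ')' | UCD pattern, optionally ending with '*'
    private String parseAtom() {
        if (accept("(")) {
            String inner = parseOr();
            if (!accept(")")) {
                throw new IllegalArgumentException("Missing ')' at position " + pos);
            }
            return "(" + inner + ")";
        }
        skipSpaces();
        int start = pos;
        while (pos < src.length() && "()&| \t\r\n".indexOf(src.charAt(pos)) < 0) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("UCD expected at position " + pos);
        }
        String ucd = src.substring(start, pos);
        if (ucd.indexOf('\'') >= 0) {
            throw new IllegalArgumentException("Quote not allowed in UCD: " + ucd);
        }
        if (ucd.endsWith("*")) {
            String prefix = ucd.substring(0, ucd.length() - 1);
            return UCD_PATH + "[starts-with(., '" + prefix + "')]";
        }
        return UCD_PATH + " = '" + ucd + "'";
    }

    private boolean accept(String token) {
        skipSpaces();
        if (src.startsWith(token, pos)) {
            pos += token.length();
            return true;
        }
        return false;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
            pos++;
        }
    }

    public static void main(String[] args) {
        System.out.println(toXPath("pos.eq.ra* && pos.eq.dec*"));
    }
}
```

Because "and" binds more tightly than "or" in XPath as well, the parser only needs to reproduce the parentheses that the user typed.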
Functionalities
In this part we show what can be done today with this SED construction tool, specifying whether each functionality works with the current VO registry or only with our trial registry.
Find relevant resources thanks to a restriction on their UCDs
Here we want to see whether this tool can find all the VO resources having equatorial or galactic positions and radio emission information. This is exactly what Mr Vollmer did by hand in his work "A method for determining radio continuum spectra and its application to large surveys".
The logical condition on UCDs used to find such resources is:
((pos.eq.ra* && pos.eq.dec*) || (pos.galactic.lon* && pos.galactic.lat*))
&&
(phot.fluxdens;em.radio* || phot.flux;em.radio*)
The first step is to enter the logical condition on UCDs in the tool:
figure 1
Then we click on the "Search" button and the tool searches the registry for all the resources matching the request:
figure 2
We obtain the list of the VO identifiers of the relevant resources. As you can see, 456 resources have been found. The 22 resources that Mr Vollmer found in his work are all in this list. This is a good validation for the tool, and a good hint that it could efficiently assist experts in the resource discovery task.
Note that this functionality works perfectly with the current VO resource metadata schema (v0.10).
Interact with the obtained resources - data extraction
Note: this functionality relies on our trial registry using the new resources metadata schema described before. These actions cannot be readily done with the current registry.
Once we obtain relevant resources we can interact with them by selecting them:
figure 3
As you can see, several buttons appear when a VO resource is selected. The two buttons "XML" and "Description" are always present and allow you to see the whole metadata of the resource (in XML format) or just its description.
The other buttons are generated dynamically according to the available interfaces listed in the VO tabular resource. All the information about each interface is retrieved from the registry thanks to the new schema discussed in the part above.
Clicking on one interface then opens a new window allowing the astronomer to fill in the parameters to pass to the interface. For example, clicking on the "VizieR Web Interface" button opens this window:
figure 4
The tool has found in the registry that this interface has one parameter called "-source" and that the default value "I/239/hip_main" is appropriate for this tabular VO resource, so the astronomer has in theory nothing more to do. Moreover, this interaction with the registry allows the tool to give more information about the role of each parameter, such as its description. This is done here: when the pointer is placed over the name of a parameter, a tool tip appears showing its description.
When the astronomer has finished choosing the value of each parameter, he just has to click on the "Go" button, and a browser (the path to it must be given in the configuration file of the tool) is opened with the correct URL calling the VO interface with its parameters. Notice that the URL of the interface is also found in the registry, in the corresponding VO resource. Here is the result for this example:
figure 5
Now let's see what happens if we click on the "VizieR TSV catalog" button (on figure 3):
figure 6
On the left you can fill in the parameters of the interface as explained before. The "-out.add" parameter is very interesting: it allows you to add a column, identified by its name, to the output of the data extraction. To help you find the names of relevant columns, the tool remembers the list of the wanted UCDs and maps it to the list of parameters having one of these UCDs. Just click on the "columns having requested UCDs" button and you get it. This list of parameter names can help you fill in the "-out.add" parameters.