DS6 Stage01 Plan (pre-TAP draft)
Activities
Science requirements and validation
This covers the collection of VO data exploration scenarios from sources like the AVO Science Reference Mission, the VOTECH Science Team and the EuroVO Science Advisory Group, supplemented by the particular interests of the DS6 team, plus the extraction of requirements from them and the development of a framework for validating software against these requirements. It is intended that one scenario will be adopted as a DS6 demonstrator: it should combine data mining and visualization and should include some iterative/interactive steps.
Assessment of existing third-party tools
This will include identifying which existing third-party tools meet (fully or partially) the scientific requirements of VO data exploration, determining which can be made VO-compliant within reasonable resource limits, and detailing how that should be done. The first step will be to define what VO-compliance means in this context.
Development of existing tools
This primarily refers to VisIVO, Astroneural and Aladin. Work is ongoing to integrate VisIVO and Astroneural, and discussions are under way to plan the integration of VisIVO with Aladin and VizieR. In parallel, VisIVO and Astroneural will be made VO-compliant.
Column-ordered storage for data exploration
This activity is studying the feasibility of using column-ordered storage methods to support VO data exploration, both as a general concept and as a modification of the existing STIL/TOPCAT tool. Initial work will make use of HDF5, an existing column-ordered data format.
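To illustrate why column ordering matters for data exploration, here is a minimal sketch using only the Python standard library. It does not use HDF5 or STIL/TOPCAT. The catalogue values, the column names (`ra`, `dec`, `mag`) and both function names are hypothetical, chosen purely for illustration.

```python
from array import array

# Hypothetical three-column catalogue: right ascension, declination, magnitude.
rows = [
    (10.5, -3.2, 18.1),
    (11.0, -3.0, 19.4),
    (10.7, -2.9, 17.6),
    (12.3, -3.5, 20.2),
]

# Row-ordered layout: the values of one source are adjacent in memory.
row_store = array("d", (value for row in rows for value in row))

# Column-ordered layout: the values of one parameter are adjacent in memory.
column_store = {
    name: array("d", (row[i] for row in rows))
    for i, name in enumerate(("ra", "dec", "mag"))
}


def mean_mag_row_ordered(store, n_columns=3, mag_index=2):
    """Stride across every record to pick out a single column."""
    mags = store[mag_index::n_columns]
    return sum(mags) / len(mags)


def mean_mag_column_ordered(store):
    """Read one contiguous array; the other columns are never touched."""
    mags = store["mag"]
    return sum(mags) / len(mags)
```

Both functions return the same mean magnitude. The difference is in access pattern: on a large on-disk catalogue, the row-ordered version has to pass over every record, whereas the column-ordered version reads only the one column of interest.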
Griddification of a KDE algorithm
This prototypes the use of k-d trees for VO data exploration, by implementing a Kernel Density Estimation (KDE) code which makes use of them and can itself be used as the basis for a number of data mining applications. The KDE software is currently available as a command-line C program, which will need wrapping as a CEA application. One of the code's steps is an embarrassingly parallel parameter value selection, and this will be performed through computational Grid jobs executed by a broker which may later be generalised for other CEA applications.
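The embarrassingly parallel step can be sketched as follows. This is an illustration under stated assumptions, not the DS6 KDE code: it assumes the parameter being selected is the kernel bandwidth, that candidates are scored by leave-one-out log-likelihood, and that the kernel is a 1-D Gaussian evaluated by brute force rather than with a k-d tree. A local thread pool stands in for the Grid broker, and the function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def loo_log_likelihood(bandwidth, data):
    """Leave-one-out log-likelihood of a 1-D Gaussian KDE for one bandwidth."""
    n = len(data)
    norm = 1.0 / ((n - 1) * bandwidth * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i, x in enumerate(data):
        # Sum the kernel contributions of every point except x itself.
        density = sum(
            math.exp(-0.5 * ((x - y) / bandwidth) ** 2)
            for j, y in enumerate(data)
            if j != i
        )
        # Floor the density so that log() never receives zero.
        total += math.log(max(norm * density, 1e-300))
    return total


def select_bandwidth(data, candidates, max_workers=4):
    """Score every candidate independently and return the best one.

    Each call to loo_log_likelihood depends only on its own bandwidth,
    so a broker could equally dispatch each call as a separate job.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(lambda h: loo_log_likelihood(h, data), candidates))
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores


# Hypothetical sample: two well-separated clumps of points.
sample = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
best_bandwidth, all_scores = select_bandwidth(sample, [0.01, 0.1, 1.0, 10.0])
```

Because no candidate needs the result of any other, the only coordination required is collecting the scores at the end, which is what makes the step a natural fit for independent Grid jobs.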
Participation in IVOA DM/DAL standards definition work
To allow more sophisticated access to, and visualisation of, metadata relating to "data sets", the IVOA need to complete the definition of the Characterization data model, and to provide SIA/SSA/DAL extension mechanisms enabling the handling of complex and heterogeneous data. The immediate goal of the proposed Stage01 work is to develop current ideas to the status of a Working Draft or Proposed Recommendation, and to work on example implementations, based on illustrative science cases.
Design new multi-parameter indexes for large catalogues using k-d trees
The goal here is to assess the use of k-d trees and similar technologies for providing efficient indexing of large catalogues, with VizieR to provide the testbed for this work.
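As a rough illustration of what a multi-parameter index buys, the sketch below builds a small k-d tree in pure Python and answers a box query over several parameters at once. It is not tied to VizieR. The catalogue values, the choice of (RA, Dec, magnitude) as indexed parameters and the function names are all hypothetical.

```python
def build_kdtree(points, depth=0):
    """Recursively build a k-d tree; each node is (point, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])  # cycle through the parameters
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2  # the median point splits this level
    return (
        points[mid],
        build_kdtree(points[:mid], depth + 1),
        build_kdtree(points[mid + 1:], depth + 1),
    )


def range_query(node, lower, upper, depth=0, found=None):
    """Collect points p with lower[k] <= p[k] <= upper[k] on every axis k."""
    if found is None:
        found = []
    if node is None:
        return found
    point, left, right = node
    axis = depth % len(point)
    if all(lo <= c <= hi for lo, c, hi in zip(lower, point, upper)):
        found.append(point)
    # Only descend into a subtree if the query box can overlap it.
    if lower[axis] <= point[axis]:
        range_query(left, lower, upper, depth + 1, found)
    if upper[axis] >= point[axis]:
        range_query(right, lower, upper, depth + 1, found)
    return found


# Hypothetical catalogue rows: (ra, dec, mag).
catalogue = [
    (10.5, -3.2, 18.1),
    (11.0, -3.0, 19.4),
    (10.7, -2.9, 17.6),
    (12.3, -3.5, 20.2),
    (10.9, -3.1, 18.8),
]
tree = build_kdtree(catalogue)
matches = range_query(tree, lower=(10.0, -3.3, 18.0), upper=(11.5, -2.8, 19.0))
```

The pruning tests in `range_query` are the point of the structure: whole subtrees are skipped whenever the query box lies entirely on one side of a splitting value, so a selective query over several parameters need not scan the full catalogue.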
VO infrastructure advice
Some of the above activities will require apprising DS6 team members new to EuroVO of details of the AstroGrid/AVO infrastructure, and advising them on how to integrate their software with it.
Stage01 Report
The final part of the DS6 Stage01 Plan is the preparation of the DS6 Stage01 Report, which will both summarise progress made during Stage01 and outline the general plans that the DS6 team have for their future work.
Deliverables
The deliverables from these activities are tabulated below; due dates will be added later, by agreement with the people involved.
| Description | Person |
| Collection of VO data exploration scenarios | BobMann, GiuseppeLongo, Bob Nichol |
| List of third-party tools to assess | BobMann and DS6 team |
| Note on what VO-compliance entails for third-party data exploration tools | MartinHill and JohnTaylor |
| Report on integration plans for VisIVO and Astroneural | UgoBecciani and GiuseppeLongo |
| Prototype CEA-wrapped broker that will allow the submission and execution of embarrassingly parallel algorithms | GarrySmith |
| Demonstration of the KDE algorithm executed via the broker | GarrySmith and Bob Nichol |
| Sample catalogues in HDF5 format | ClivePage |
| Prototype tool implementing some data manipulation operations on the HDF5 catalogues | ClivePage |
| Note on inclusion of indexing in HDF5-based tool | ClivePage |
| Report on possibilities for column-ordering in STIL/TOPCAT | ClivePage |
| Draft IVOA standards extending SSA/SIA/DAL for heterogeneous data | FrancoisBonnarel |
| Draft of Characterization data model | FrancoisBonnarel |
| Report on use of k-d trees for indexing large catalogues | FrancoisBonnarel |
| DS6 Stage01 Report | BobMann and DS6 team |
-- BobMann - 11 Mar 2005