DS6: Data Exploration

Data Exploration = Data Mining + Visualization.

Scope

The aim of DS6 is to complete all technical preparatory work necessary to enable effective data exploration within the future European Virtual Observatory (EuroVO). This will involve assessment of a range of data mining and visualization algorithms and packages, with a view to determining how they can be run as distributed services, how they can be made VObs-compliant and how they can be extended to extremely large datasets. On the assumption that these studies are successful, we will proceed to actual component designs, trial implementations and standards development.

In more detail, taking each of the four aims above in turn:

  • Assessment of algorithms and packages. The DS6 team will undertake a Software Survey to summarise the assessment of relevant, existing software, in preparation for its Data Exploration Study Report, due to be delivered in Month 15 of the project.

  • Running tools as distributed services. The DS6 team will prototype the wrapping of existing command line tools using the AstroGrid Common Execution Architecture (or its derived IVOA standard) to expose them as web services which can be run within the developing VObs infrastructure. We shall assess whether some data exploration tasks are sufficiently compute-intensive that they need to be run as compute-grid jobs, and will determine the impact that has on the VObs.

  • VObs-compliance. Existing tools will be made VObs-compliant by bringing them into line with agreed (e.g. IVOA) standards for registry metadata, workflow and use of distributed storage. Conversely, the work of DS6 will exercise existing standards (e.g. VOTable as a transport mechanism for tabular datasets) and may drive the development of new standards within the IVOA framework.

  • Extension to extremely large datasets. In addition to assessing the possible requirements for parallel/distributed computations within the VObs, the DS6 team will study the scalability issues essential to the effective exploration of datasets of large volume and/or high dimensionality in the future EuroVO. This may include the use of specialized data structures (e.g. k-d trees), the deployment of data exploration services directly on databases at the data centre and the generation of visualizations on the server side.
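
As an illustration of the kind of specialized data structure mentioned in the last item, the following is a minimal sketch of a k-d tree with nearest-neighbour search in plain Python. It is not DS6 code: the names Node, build_kdtree, nearest and sq_dist are hypothetical and chosen only for this example, and a production service would need a far more memory-efficient layout for very large catalogues.

```python
from collections import namedtuple

# Hypothetical node type for illustration: a stored point, the axis it
# splits on, and the two subtrees on either side of the splitting plane.
Node = namedtuple("Node", ["point", "axis", "left", "right"])


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, depth=0):
    """Build a k-d tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(
        ordered[mid],
        axis,
        build_kdtree(ordered[:mid], depth + 1),
        build_kdtree(ordered[mid + 1:], depth + 1),
    )


def nearest(node, target, best=None):
    """Return the stored point closest to target."""
    if node is None:
        return best
    if best is None or sq_dist(node.point, target) < sq_dist(best, target):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the
    # best match found so far; this pruning is what avoids a full scan.
    if diff * diff < sq_dist(best, target):
        best = nearest(far, target, best)
    return best
```

For example, after tree = build_kdtree(points) on a list of coordinate tuples, nearest(tree, target) returns the closest stored point without comparing the target against every point, which is the property that makes such structures attractive for large, high-dimensional datasets.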

To attain these goals the DS6 team will follow a twin-track approach - the first focussed on short-term goals that will feed directly into software releases of VOTech's early years, and the second addressing harder issues which will only yield solutions over the lifetime of the project. The work of DS6 must be science-driven, feeding on requirements from the VOTech Science Team and also exploiting the scientific motivation and expertise of its own team members.

Participants

Former DS6 Team Members

Stage Plans and Reports

  • Stage 01 (Apr-Sep 2005)
  • Stage 02 (Oct 2005 - Mar 2006)
  • Stage 03 (Apr-Sep 2006)
  • Stage 04 (Oct 2006 - Mar 2007)
  • Stage 05 (Apr-Sep 2007)
  • Stage 06 (Oct 2007 - Mar 2008)
  • Stage 07 (Apr-Sep 2008)
  • Stage 08 (Oct 2008 - Mar 2009)

Meetings and Telecons

Other DS6 wiki pages

See Also...

Topic revision: r45 - 2008-12-03 - 14:36:46 - BobMann
 