Eirik: a how to

Note: this page is a useful introduction, but has been superseded by this page.
Please check out the docs, sources etc. there first.

Intro

Eirik (the Red) was the stuff of legend. Evicted from Norway and Iceland for various disputes ("some killings" in the texts) he set out across the North Atlantic in a fine Viking longship (in reality, little more than a big row-boat with a sail) with his family and a few provisions, but was able to found a settlement in Greenland. His son, apparently, made it to America, starting another legend about a Vinland....

Anyhow, in wandering through a large data set, the explorer may well perceive a danger of getting lost and washed up in some distant place where the locals are unfriendly (especially to Vikings). Eirik is therefore a tool for data navigation, the aim of which is to promote a directed search that always moves from the simpler to the more complex, e.g. 1D -> 2D -> ... or O(n) -> O(n^2) -> ... etc.

Why do things like this? There are many visualization tools out there (see summary at DS6SoftwareSurvey) and many have excellent qualities, but a common failing is to introduce complex algorithms (eg. clustering) or visualization modalities (eg. volume rendering, high dimensional scatterplots) early on in the analysis. Even for modest data sets this can result in a heavy information load, with long processing delays or freezing of the application. Instead, Eirik aims to encourage a more cyclic form of interaction which allows time to think before rushing in.

Currently the main tool behind Eirik is R, but this is embedded in a GUI which comprises various graphical tools. The first (and possibly main) tool employs Tufte's notion of sparklines, which are "intense, simple, word-sized graphics". The objective is to try and squash a large amount of information into a very small space with minimum use of 'ink' (or pixels, in this case). This page and the demo describe the use of this window in more detail.
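
To give a feel for what "word-sized" means, here is a minimal sketch (in Python, not Eirik's own code) that squashes a histogram into a single line of text using Unicode block characters. The function names, the choice of eight bar heights and the sample data are all assumptions made for illustration.

```python
# Hypothetical sketch: render a histogram as a one-line text sparkline.
BLOCKS = "▁▂▃▄▅▆▇█"  # eight bar heights, lowest to highest


def histogram(values, bins=10):
    """Count values into equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        counts[i] += 1
    return counts


def sparkline(counts):
    """Map each count to one block character, scaled to the tallest bin."""
    top = max(counts) or 1
    return "".join(BLOCKS[(c * (len(BLOCKS) - 1)) // top] for c in counts)


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # invented sample values
print(sparkline(histogram(data, bins=5)))
```

Each bin costs exactly one character, so the distribution of a variable fits in the width of a word.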

Installing the demo

You can grab a linux binary (built on Fedora Core 5) from here or a win32 binary (built on XP using mingw) in this bzipped archive. On Windows, you can use the free 7-zip (http://www.7-zip.org) to extract the bundle (Winrar/WinZip also work fine).

Requirements: R and Java

The demo binaries are statically linked against Qt and a bunch of other stuff, so a recent linux distribution or Windows XP with an R installation (v2.3.1 used in building) and Java (v1.4 or above) should do (see the included INSTALL files). In the latest version, R and Java should be detected automatically, so typing or clicking the scripts eirik.sh (linux) or eirik.bat (win32) should launch things without further setup. Let me know if these scripts need tweaking...
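
If the scripts fail, a quick way to check whether R and Java are visible on your PATH is sketched below. This is not what eirik.sh or eirik.bat actually do (their contents are not described here); it is just an independent check, and the executable names "R" and "java" are assumptions about a typical installation.

```python
import shutil


def find_tools(names=("R", "java")):
    """Return a dict mapping each tool name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools().items():
    print(f"{name}: {path or 'not found on PATH'}")
```

A tool reported as "not found" would need its install directory adding to PATH (or the launch script editing) before trying again.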

An example data file is given in the _Eirik_/tmp directory.

Starting up and running

The opening display

The opening display uses any UCDs contained in the example file to create something like a file tree, but of variables rather than files (if UCDs are not available, a simple list is generated). The aim is to give a digested summary of the available data from the VOTable header, and it should be noted that no data have been loaded at this point.


tree-view.jpg
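
As a rough illustration of how a tree of variables might be derived from UCDs, the sketch below nests column names under the dot-separated parts of each UCD. It is not Eirik's implementation: the UCD syntax assumed here (dots between atoms, ';' before secondary words), the column names and the "(no UCD)" bucket are all assumptions for the example.

```python
def build_tree(columns):
    """Nest column names under the dot-separated atoms of their UCD.

    `columns` maps column name -> UCD string (or None when no UCD is given).
    Columns without a UCD are collected under the key "(no UCD)".
    """
    tree = {}
    for name, ucd in columns.items():
        node = tree
        # Use only the primary word (before any ';') and split it into atoms.
        atoms = ucd.split(";")[0].split(".") if ucd else ["(no UCD)"]
        for atom in atoms:
            node = node.setdefault(atom, {})
        node.setdefault("_columns", []).append(name)  # leaves hold column names
    return tree


def show(node, depth=0):
    """Print the tree with two-space indentation per level."""
    for key, child in node.items():
        if key == "_columns":
            for name in child:
                print("  " * depth + "- " + name)
        else:
            print("  " * depth + key)
            show(child, depth + 1)


# Hypothetical header: column names and UCDs invented for illustration.
header = {
    "RA": "pos.eq.ra",
    "DEC": "pos.eq.dec",
    "Bmag": "phot.mag;em.opt.B",
    "e_Bmag": "stat.error;phot.mag",
    "flag": None,
}
show(build_tree(header))
```

Only the header is needed to build this, which matches the point that no data have been loaded yet. Error terms end up in their own branch, so they are easy to leave closed.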

By clicking on the nodes of the tree, the investigator can 'open' interesting regions of the data set whilst ignoring things which are less relevant at this stage (such as error terms). Up to ten items can be selected by click-and-dragging the mouse across opened nodes, or by ctrl-clicks on individual items (only the 'leaves' of the tree can be selected). Press the "get data" button when you want to load this data.

Tip: Right-clicking on the tree tab (or anywhere above the tree display) expands all the nodes of the tree; a middle button click will then collapse the tree.

The graph display

The "graph" button is activated when the data are loaded and pressing this button gives two displays: a series of sparkline histograms in a tab of the main window and a separate smaller opengl window, titled "Eirik plot". There are currently two simple forms of interaction with the plot window: zooming/panning and region selection.


plot-window.jpg

  • Zooming and panning: Scrolling the mouse wheel, or pressing the middle button and pushing forwards/pulling backwards, results in a zoom operation.
  • Pressing the right mouse button and moving pans about the plot display. This works best if you move the region of interest to the centre of the display; zooming then works 'around' this region.
  • The camera and up/down buttons allow snapshots of a particular region to be stored, so you can move around from view to view. The clear button removes all saved views and returns to the starting view. (Note these features are really intended for future 3D plotting of data, and are not really that useful for the demo.)

  • Region selection: Clicking and holding the left mouse button selects regions:-
  • Left-to-right: chooses a region in one data set. The corresponding data items are then shown in the sparkline histogram view. This then gives an impression of how data are correlated between the views (forming a histogram equivalent of parallel coordinate displays).

* Select data cross-histogram comparison:
select-data-view.jpg

  • Right-to-left: shows the "position" of null data, where the data are non-null in the other data columns (shown as a black 'shadow').

* Null data cross-histogram comparison:
null-data-view.jpg

  • Single click/release: clears all the displays.
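
The two kinds of region selection above can be sketched in a few lines of Python. This is a toy model under stated assumptions rather than Eirik's code: rows are dicts, nulls are represented by None, and the function names and sample table are invented.

```python
def bin_index(value, lo, hi, bins):
    """Index of the equal-width bin in [lo, hi] that holds `value`."""
    width = (hi - lo) / bins or 1.0
    return min(int((value - lo) / width), bins - 1)


def column_histogram(rows, column, row_ids, bins=5):
    """Histogram of `column` over the chosen rows, skipping nulls (None)."""
    present = [r[column] for r in rows if r[column] is not None]
    lo, hi = min(present), max(present)  # bin edges come from the whole column
    counts = [0] * bins
    for i in row_ids:
        v = rows[i][column]
        if v is not None:
            counts[bin_index(v, lo, hi, bins)] += 1
    return counts


def select_range(rows, column, lo, hi):
    """Left-to-right drag: ids of rows whose value in `column` lies in [lo, hi]."""
    return [i for i, r in enumerate(rows)
            if r[column] is not None and lo <= r[column] <= hi]


def select_nulls(rows, column):
    """Right-to-left drag: ids of rows where `column` is null."""
    return [i for i, r in enumerate(rows) if r[column] is None]


# Invented table with some nulls in y.
rows = [
    {"x": 1.0, "y": 10.0},
    {"x": 2.0, "y": None},
    {"x": 3.0, "y": 30.0},
    {"x": 4.0, "y": None},
    {"x": 5.0, "y": 50.0},
]

picked = select_range(rows, "x", 2.0, 4.0)
print("selection shown in y:", column_histogram(rows, "y", picked))

shadow = select_nulls(rows, "y")
print("null 'shadow' of y shown in x:", column_histogram(rows, "x", shadow))
```

The key design point is that a selection is just a list of row ids, so the same ids can be re-binned in every other variable's histogram.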

Different variables can be selected in the plot window by using the drop-down "combo" box. As noted, the sparklines show the distribution of the selected data in all the chosen variables. On the right-hand side of the display, a simple density estimate is also given to compare with the histograms. (Density estimates improve on histograms in many ways, because they do not suffer from the same discontinuity problems, e.g. those due to the choice of bin size.) R allows more sophisticated density modelling, and it is expected that this will be better supported in future.
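
For readers unfamiliar with density estimates, here is a minimal Gaussian kernel density sketch in Python. Eirik itself relies on R for this, and the page does not say which kernel or bandwidth it uses, so the Gaussian kernel, the normal reference bandwidth rule and the sample values are all assumptions.

```python
import math
import statistics


def reference_bandwidth(samples):
    """Normal reference rule of thumb: h = 1.06 * sd * n**(-1/5)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density as an average of Gaussian bumps."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


samples = [1.0, 1.2, 1.9, 2.1, 2.2, 4.8, 5.0]  # invented data
kde = gaussian_kde(samples, reference_bandwidth(samples))
for x in (1.0, 2.0, 3.5, 5.0):
    print(f"{x:4.1f}  {kde(x):.3f}")
```

Unlike a histogram, the curve is smooth and does not jump when a bin edge is moved; the bandwidth plays the role that bin size plays in a histogram.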

The R tab

Once data have been selected and loaded, it's possible to call regular R commands in this display (in the execute box). There is one main catch: since there is no main R event loop running, graphics devices opened in this way will not refresh/run as expected. However, you can plot stuff to postscript, png or jpeg (using the usual commands, which I shan't go into here). This window was intended to allow further exploration by R users and for debugging, but a good deal more could be added ... requests welcome.

The MI/Cov tab

Also activated when data are loaded is the MI/Cov page:


mi-cov.jpg

Opening this tab shows a blank page at first: you have to press the "get data" button before the actual mutual information (MI), correlation and covariance values are computed. This is in keeping with how Eirik is written: these quantities are 2-dimensional, and therefore more complex to compute, so they are only calculated on request. In fact, for MI an approximate technique is used, based on Carsten Daub's spline-based algorithm.
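
To make the three quantities concrete, the sketch below computes covariance, correlation and a simple binned MI estimate for a pair of variables in plain Python. Please note the MI part is a stand-in: it uses hard-edged histogram bins and is not the spline-based algorithm mentioned above. The function names, bin count and sample data are assumptions.

```python
import math
from collections import Counter


def covariance(xs, ys):
    """Sample covariance (n - 1 in the denominator)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation coefficient."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


def binned(values, bins):
    """Assign each value to one of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys, bins=4):
    """Plug-in MI estimate (in bits) from a 2D histogram with hard bin edges."""
    bx, by = binned(xs, bins), binned(ys, bins)
    n = len(xs)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # Sum p(i,j) * log2( p(i,j) / (p(i) * p(j)) ) over occupied cells.
    return sum((c / n) * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())


xs = [0, 1, 2, 3, 4, 5, 6, 7]   # invented data
ys = [2 * x for x in xs]        # perfectly dependent on xs
print(covariance(xs, ys), correlation(xs, ys), mutual_information(xs, ys))
```

Every pair of variables needs its own pass over the data, which is why these values sit behind a button instead of being computed as soon as the tab opens.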

Pressing the button should then display a levelplot (based on the R lattice library). By making selections in the combo boxes beneath the plot, this can be used to display MI/Correlation/Covariance on either or both sides of the plot diagonal. Note that the scale ignores data which would normally lie on the diagonal and that values for the minimum and maximum are only displayed when the same (symmetric) plot appears on both sides.

Clicking with the left mouse button in the plot highlights that particular square and its symmetric counterpart (and the left check buttons will be selected; right clicking removes this highlighting). In (the near) future, this will be used to guide the user towards other 2D/3D plotting techniques (eg scatterplots) to cut down the search space. (Choosing any two axes from ten, for example, implies a search space of some 45 scatterplots.)
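
The figure of 45 is just the number of unordered pairs from ten variables, which is easy to check. The sketch below also shows one hypothetical way a table of pairwise scores (such as MI) could be used to shortlist pairs worth plotting; the variable names, scores and the shortlist function are invented for illustration and are not part of Eirik.

```python
from itertools import combinations
from math import comb

variables = [f"v{i}" for i in range(1, 11)]  # ten hypothetical variable names
pairs = list(combinations(variables, 2))     # every unordered pair of axes
print(len(pairs), comb(10, 2))


def shortlist(scores, k=3):
    """Return the k pairs with the largest score, highest first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Invented scores for three of the pairs.
scores = {("v1", "v2"): 0.10, ("v1", "v3"): 0.90, ("v2", "v3"): 0.50}
print(shortlist(scores, k=2))
```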

PLASTIC/xmlrpc

A limited selection of PLASTIC-compatible instructions has been implemented and is available if the PLASTIC hub is running and the register button is pressed. One operation this currently allows is to send tables to Eirik, e.g. from Topcat.

ASR/ACR

Using similar xmlrpc instructions to those used for PLASTIC, it's possible to communicate directly with the ACR. In Eirik, this is most visibly useful in the "Load from ACR" menu item (under File). Choosing this should bring up the AstroGrid login dialog and allow you to download a VOTable from there.

-- RichardHolbrey - 02 Oct 2006

(1) The source for this is available as a bzipped tar, but be warned: there are quite a number of dependencies, and Qt v4 alone requires well over an hour to compile and several GB of hard disk space.

Topic attachments
Attachment | Size | Date | Who | Comment
eirik-linux.tar.bz2 | 9621.9 K | 2006-10-02 17:39 | RichardHolbrey | Linux binary of eirik
eirik-win.tar.bz2 | 10126.3 K | 2006-10-02 09:12 | RichardHolbrey |
eirik_source.tar.bz2 | 297.9 K | 2006-10-02 18:05 | RichardHolbrey | source tar
mi-cov.jpg | 38.4 K | 2006-08-30 21:44 | RichardHolbrey | mi-cov page
null-data-view.jpg | 12.3 K | 2006-08-10 18:27 | RichardHolbrey | Null data cross-histogram comparison
plot-window.jpg | 54.6 K | 2006-08-10 18:16 | RichardHolbrey | The Eirik plot window with zoom controls
select-data-view.jpg | 13.8 K | 2006-08-10 18:23 | RichardHolbrey | Select data cross-histogram comparison
tree-view.jpg | 36.5 K | 2006-08-10 18:07 | RichardHolbrey | Initial 'tree' header view
Topic revision: r17 - 2006-12-08 - 16:05:11 - RichardHolbrey
 