Anomaly Detector

This is a web service for the Auton Lab's fastem algorithm.

What it does

The Anomaly Detector constructs a model of your data, and based on that model determines which objects are the most unusual.

The model is a Gaussian Mixture Model - roughly speaking it fits a number of multidimensional Gaussian distributions to your data. The probability of any particular datapoint is then the weighted sum of the contributions from all the Gaussians in the model.
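As a rough illustration of that weighted sum, here is a minimal Python sketch (not the fastem code) that evaluates p(x) = Σ_k w_k N(x | μ_k, Σ_k). For brevity it assumes each Gaussian has a diagonal covariance, given as a list of per-dimension variances. The function names are hypothetical.

```python
import math
from typing import Sequence


def gaussian_pdf_diag(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Density at x of a Gaussian with a diagonal covariance (per-dimension variances)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        # Each independent dimension contributes a 1-D normal log-density.
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def mixture_pdf(x: Sequence[float], weights: Sequence[float],
                means: Sequence[Sequence[float]],
                variances: Sequence[Sequence[float]]) -> float:
    """p(x) = sum_k w_k * N(x | mean_k, var_k): the weighted sum over all components."""
    return sum(w * gaussian_pdf_diag(x, m, v)
               for w, m, v in zip(weights, means, variances))


# Two equal-weight 2-D components: a point near the first centre gets a much
# higher density than a point far from both, which is what marks an anomaly.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
print(mixture_pdf([0.1, -0.2], weights, means, variances))
print(mixture_pdf([20.0, -20.0], weights, means, variances))
```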

Usage

The Anomaly Detector requires a table of data and a list of the attributes from that table that will be used to construct the Gaussian Mixture Model. It creates an output table that is a copy of the original with one extra column: the probability of each datapoint according to the model. This table is sorted from lowest probability to highest, so the most unusual objects come first.
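The output step can be pictured as follows. This is a hypothetical Python sketch, not the service's implementation: it assumes rows are dictionaries, accepts any density function (for example a fitted mixture model), and names the new column "probability", which may not match the column name the service actually writes.

```python
from typing import Any, Callable, Dict, List, Sequence

Row = Dict[str, Any]


def rank_by_probability(rows: List[Row], attributes: Sequence[str],
                        density: Callable[[List[float]], float],
                        prob_column: str = "probability") -> List[Row]:
    """Copy each row, append its model probability, and sort lowest first."""
    scored = []
    for row in rows:
        x = [float(row[a]) for a in attributes]  # only the chosen attributes feed the model
        out = dict(row)                          # original columns are kept unchanged
        out[prob_column] = density(x)
        scored.append(out)
    # Lowest probability (most unusual) first, as in the output table described above.
    scored.sort(key=lambda r: r[prob_column])
    return scored
```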

The anomaly detection algorithm has been deployed as a VO-compatible web service and can be invoked using the AstroGrid Workbench.

  • Start your workbench (needs Java)
  • In your workbench, select the Task Launcher (in version v2006.3.rc1 this is under the Workflows tab).
  • In the find box, search for "ivo://wfau.roe.ac.uk/AnomalyDetector". This is the unique IVORN (registry key) describing this application.
  • Select the application, and choose an interface. In these instructions we assume you selected the Standard interface. Other parameters are available in the Advanced interface and these are described below.
  • Select an input table. This will usually be a VOTable, but it can be in any format that the STILTS library can decode automatically. Ticking the "Ref?" column prompts you to select a table on your local machine, in myspace, or by URL.
  • Enter the attributes that the algorithm should use for the analysis. These should be space-separated column names or numbers in any format understood by STILTS, such as "3 5" (use the 3rd and 5th columns) or "Bmag Rmag1 Rmag2 4" (use the columns named Bmag, Rmag1 and Rmag2, plus the 4th column). Omit the quotes. Hovering the mouse over any of the parameters shows more information.
  • Select a location for the output table - if you leave this blank, the results will be returned "in-line" (not recommended for large tables).
  • Press the execute button.
  • Result tables can be viewed using any PLASTIC-compatible table viewer such as Topcat.
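To make the attribute syntax concrete, the sketch below resolves a specification such as "Bmag Rmag1 Rmag2 4" into 0-based column positions. It is a simplified, hypothetical helper that only handles exact column names and 1-based column numbers. STILTS itself supports a richer syntax, so treat this as an illustration rather than a description of STILTS's rules.

```python
from typing import List, Sequence


def resolve_attributes(spec: str, column_names: Sequence[str]) -> List[int]:
    """Turn a space-separated attribute list into 0-based column indices.

    Purely numeric tokens are read as 1-based column numbers ("3" = 3rd column);
    any other token must match a column name exactly (a simplifying assumption).
    """
    indices: List[int] = []
    for token in spec.split():
        if token.isdigit():
            n = int(token)
            if not 1 <= n <= len(column_names):
                raise ValueError(f"column number {n} is out of range")
            indices.append(n - 1)
        elif token in column_names:
            indices.append(list(column_names).index(token))
        else:
            raise ValueError(f"unknown column name {token!r}")
    return indices


cols = ["id", "Bmag", "Rmag1", "Umag", "Rmag2"]
print(resolve_attributes("3 5", cols))                 # [2, 4]
print(resolve_attributes("Bmag Rmag1 Rmag2 4", cols))  # [1, 2, 4, 3]
```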

Parameters

NSECS

Termination condition: number of seconds.

NSTEPS

Termination condition: number of steps.

Vary Clusters

Cluster Option: Vary the number of clusters.

NClusters

Cluster Option: The number of clusters to use.

Score Type

Scoring Algorithm.

Save PS file

Do you want to save a postscript image file?

PS File

File name to save the postscript image to.

Parameter Details

Save PS file

Type: string
Default value: false

Do you want to save a postscript image file?

If this is true, then the postscript image will be saved in the file specified by the 'PS File' parameter.

The available values of 'Save PS file' are:

  true  false

PS File

Type: string
Default value: em_clusters.ps
Legal values: any non-empty string

File name to save the postscript image to.

For 2D data, specify a filename to save a postscript image of the EM clustering. This option is ignored for higher-dimensional data.

NSECS

Type: int
Default value: 10
Legal values: an integer

Termination condition: number of seconds.

Maximum number of seconds the computation may run. Turn off the time restriction by setting nsecs to -1.

You must use at least one of nsecs or nsteps to terminate the EM algorithm. If both are set, the program terminates as soon as either limit is reached.

NSTEPS

Type: int
Default value: -1
Legal values: an integer

Termination condition: number of steps.

Maximum number of steps EM is allowed to run before it terminates. Turn off the step restriction by setting nsteps to -1.

You must use at least one of nsecs or nsteps to terminate the EM algorithm. If both are set, the program terminates as soon as either limit is reached.
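The interaction of the two termination conditions can be sketched as a loop. This is an illustrative Python sketch assuming a hypothetical `step` callback that performs one EM iteration. It mirrors the documented behaviour (-1 disables a limit, and the first limit reached stops the run) but is not the fastem code.

```python
import time
from typing import Callable


def run_em(step: Callable[[], None], nsecs: int = 10, nsteps: int = -1) -> int:
    """Repeat one EM iteration until a limit is hit; return the number of steps run.

    A limit of -1 is disabled, mirroring NSECS and NSTEPS. When both limits are
    active, whichever is reached first stops the loop.
    """
    if nsecs == -1 and nsteps == -1:
        raise ValueError("enable at least one of nsecs or nsteps")
    start = time.monotonic()
    steps = 0
    while True:
        if nsteps != -1 and steps >= nsteps:
            break  # step limit reached
        if nsecs != -1 and time.monotonic() - start >= nsecs:
            break  # time limit reached
        step()
        steps += 1
    return steps


# With the defaults (nsecs=10, nsteps=-1) only the 10-second limit applies.
print(run_em(lambda: None, nsecs=-1, nsteps=5))  # 5
```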

Vary Clusters

Type: string
Default value: true

Cluster Option: Vary the number of clusters.

This option allows EM to vary the number of clusters used. If 'Vary Clusters' is false, the EM algorithm fixes the number of clusters to either 'nclusters' or the number of loaded centers.

You should specify either 'Vary Clusters' or 'nclusters' but not both.

The available values of 'Vary Clusters' are:

  true  false

NClusters

Type: int
Default value: -1
Legal values: an integer

Cluster Option: The number of clusters to use.

Specify the number of clusters the EM algorithm should use. Set to -1 to disable.

You should specify either 'Vary Clusters' or 'nclusters' but not both.
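One reading of how 'Vary Clusters' and 'nclusters' combine is sketched below. The function name is hypothetical, and the order in which 'nclusters' and loaded centres are consulted when 'Vary Clusters' is false is an assumption: the text above does not say which takes precedence.

```python
from typing import Optional


def resolve_cluster_count(vary_clusters: bool, nclusters: int,
                          n_loaded_centers: Optional[int] = None) -> Optional[int]:
    """Return a fixed cluster count, or None when EM may vary the number of clusters."""
    if vary_clusters and nclusters != -1:
        raise ValueError("specify either 'Vary Clusters' or 'nclusters', not both")
    if vary_clusters:
        return None  # EM chooses the number of clusters itself
    if nclusters != -1:
        return nclusters
    if n_loaded_centers is not None:
        return n_loaded_centers  # assumed fallback: one cluster per loaded centre
    raise ValueError("a fixed cluster count needs 'nclusters' or loaded centres")


print(resolve_cluster_count(True, -1))  # None (the defaults)
print(resolve_cluster_count(False, 4))  # 4
```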

scoretype

Type: string
Default value: bic

Scoring Algorithm.

The available algorithms are:

  • aic: Akaike Information Criterion
  • bic: Bayesian Information Criterion
  • testset: Divide the data into training/testing sets and evaluate by scoring the testing set.

The available values of 'scoretype' are:

  aic  bic  testset
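For reference, the textbook forms of the two information criteria are AIC = 2p - 2 ln L and BIC = p ln n - 2 ln L, where L is the maximised likelihood, p the number of free parameters and n the number of datapoints. The Python sketch below applies them to pick a cluster count, as 'Vary Clusters' might. It assumes full covariance matrices and a lower-is-better convention; fastem may count parameters or sign its scores differently, and the testset option is not covered. All names are hypothetical.

```python
import math
from typing import Dict


def n_free_params(k: int, d: int) -> int:
    """Free parameters of a k-component, d-dimensional GMM with full covariances."""
    means = k * d
    covariances = k * d * (d + 1) // 2  # each symmetric d x d matrix
    weights = k - 1                     # the weights must sum to 1
    return means + covariances + weights


def aic(log_likelihood: float, k: int, d: int) -> float:
    return 2 * n_free_params(k, d) - 2 * log_likelihood


def bic(log_likelihood: float, k: int, d: int, n: int) -> float:
    return n_free_params(k, d) * math.log(n) - 2 * log_likelihood


def choose_k(log_likelihoods: Dict[int, float], d: int, n: int) -> int:
    """Pick the cluster count whose fitted log-likelihood gives the lowest BIC."""
    return min(log_likelihoods, key=lambda k: bic(log_likelihoods[k], k, d, n))


# A third cluster barely improves the fit, so BIC's penalty favours two.
print(choose_k({1: -1000.0, 2: -500.0, 3: -499.0}, d=2, n=1000))  # 2
```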

Example Data

Try the algorithm on this example dataset. Warning: you must ensure that your data is sufficiently "clean". The algorithm works only on numeric data and does not tolerate missing or non-numeric values in the columns being processed.
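Before submitting a table it can help to check the selected columns yourself. This minimal Python sketch (a hypothetical helper, not part of the service) reports rows whose chosen attributes are missing, non-numeric, NaN or infinite. Rejecting infinities is an extra precaution assumed here rather than a documented requirement.

```python
import math
from typing import Dict, List, Optional, Sequence


def find_unclean_rows(rows: List[Dict[str, Optional[str]]],
                      attributes: Sequence[str]) -> List[int]:
    """Return indices of rows whose selected attributes are missing or non-numeric."""
    bad: List[int] = []
    for i, row in enumerate(rows):
        for attr in attributes:
            value = row.get(attr)  # None if the column is absent from this row
            try:
                number = float(value)
            except (TypeError, ValueError):
                bad.append(i)      # missing, empty or non-numeric text
                break
            if not math.isfinite(number):
                bad.append(i)      # NaN (a common blank marker) or infinity
                break
    return bad
```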

The calculated Gaussian centres:

Movie

This movie shows the Anomaly Detector in action, launched by the AstroGrid Workbench.

Credits

The fastem algorithm was developed and coded by the Auton Lab and deployed to the VO using AstroGrid software. It is hosted by the Wide Field Astronomy Unit of the University of Edinburgh. Thanks to Andrew Connolly at the University of Pittsburgh for his help in deploying this service.
Topic attachments

  • anomalies.jpg (69.6 K, 2007-08-03 16:19, JohnTaylor)
  • example.vot (311.8 K, 2006-08-25 19:41, JohnTaylor): Example VOTable
  • gaussians.gif (34.4 K, 2007-08-07 15:34, JohnTaylor): Gaussians
  • mgc.fits (5715.0 K, 2006-09-07 12:15, JohnTaylor): Test Table
  • plasticSorrento.odp (691.2 K, 2006-03-21 17:51, JohnTaylor)
  • plasticSorrentoNoAn.pdf (623.7 K, 2006-03-21 17:51, JohnTaylor)
  • plasticSorrentoNoAn.ppt (763.5 K, 2006-03-21 17:51, JohnTaylor)
  • twocol.xml (605.7 K, 2007-08-03 16:19, JohnTaylor): SDSS colour data
  • visivo.jpg (32.6 K, 2006-09-01 16:48, JohnTaylor): Anomalies in VisIVO
  • weirdos.gif (22.6 K, 2006-09-01 11:37, JohnTaylor): results from AD in Topcat
Topic revision: r22 - 2007-08-07 - 16:02:54 - JohnTaylor
 
This site is powered by the TWiki collaboration platform. Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.