Anomaly Detector
This is a web service for the Auton Lab's fastem algorithm.
What it does
The Anomaly Detector constructs a model of your data, and based on that model determines which objects are the most unusual.
The model is a Gaussian Mixture Model: roughly speaking, it fits a number of multidimensional Gaussian distributions to your data. The probability of any particular datapoint is then the weighted sum of the contributions from all the Gaussians in the model.
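The weighted sum described above can be sketched in a few lines of Python. This is an illustration only, not the fastem code: it assumes axis-aligned (diagonal-covariance) Gaussians to stay short, and the weights, means and variances are made-up values.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of an axis-aligned (diagonal-covariance) Gaussian at point x."""
    density = 1.0
    for xi, mi, vi in zip(x, mean, var):
        density *= math.exp(-0.5 * (xi - mi) ** 2 / vi) / math.sqrt(2.0 * math.pi * vi)
    return density

def mixture_density(x, components):
    """Weighted sum of the component densities.

    components is a list of (weight, mean, var) tuples; weights should sum to 1.
    """
    return sum(w * gaussian_pdf(x, mean, var) for w, mean, var in components)

# Hypothetical two-component model in two dimensions.
components = [
    (0.7, [0.0, 0.0], [1.0, 1.0]),
    (0.3, [5.0, 5.0], [0.5, 0.5]),
]
typical = mixture_density([0.1, -0.2], components)    # close to the first centre
unusual = mixture_density([10.0, -10.0], components)  # far from both centres
print(typical > unusual)
```

A point far from every Gaussian centre receives a very small value, which is what makes it a candidate anomaly.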
Usage
The Anomaly Detector requires a table of data and a list of the attributes from that table that will be used to construct the Gaussian Mixture Model. It creates an output table which is a copy of the original with an extra column: the probability of each datapoint according to the model. This table is sorted from lowest probability to highest.
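The shape of the output table can be mimicked as follows. This is a minimal sketch, assuming rows are held as Python dictionaries; the function name, the column name "probability" and the toy scoring function are all hypothetical stand-ins for the fitted model.

```python
import math

def rank_by_probability(rows, score, column="probability"):
    """Copy each row, add the model probability, and sort lowest first."""
    ranked = [dict(row, **{column: score(row)}) for row in rows]
    ranked.sort(key=lambda row: row[column])
    return ranked

# Toy data and a toy one-dimensional "model" centred on Bmag = 15.
rows = [
    {"id": 1, "Bmag": 15.1},
    {"id": 2, "Bmag": 22.0},
    {"id": 3, "Bmag": 14.5},
]
toy_score = lambda row: math.exp(-0.5 * (row["Bmag"] - 15.0) ** 2)

ranked = rank_by_probability(rows, toy_score)
print([row["id"] for row in ranked])
```

The most unusual objects end up at the top of the table, and the input rows are left unmodified.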
The anomaly detection algorithm has been deployed as a VO-compatible web service and can be invoked using the AstroGrid Workbench.
- Start your workbench (needs Java)
- In your workbench, select the Task Launcher (in version v2006.3.rc1 this is under the Workflows tab).
- In the find box, search for "ivo://wfau.roe.ac.uk/AnomalyDetector" - this is the unique ivorn (registry key) describing this application.
- Select the application, and choose an interface. In these instructions we assume you selected the Standard interface. Other parameters are available in the Advanced interface and these are described below.
- Select an input table. This will usually be a VOTable, but it can be in any format that the STILTS library can decode automatically. Ticking the "Ref?" column lets you choose a table on your local machine, in MySpace, or by URL.
- Enter the attributes that you want the algorithm to use for the analysis. These should be space-separated column identifiers in any format understood by STILTS, such as "3 5" (use the 3rd and 5th columns) or "Bmag Rmag1 Rmag2 4" (use the columns named Bmag, Rmag1 and Rmag2, plus the 4th column). Omit the quotes. Hovering the mouse over any parameter shows more information.
- Select a location for the output table - if you leave this blank, the results will be returned "in-line" (not recommended for large tables).
- Press the execute button.
- Result tables can be viewed using any PLASTIC-compatible table viewer, such as TOPCAT.
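To make the attribute syntax from the steps above concrete, here is a rough sketch of how a specification such as "3 5" or "Bmag Rmag1 Rmag2 4" could be resolved against a table header. It only approximates the behaviour described here and is not the STILTS implementation; the function name and the header are hypothetical.

```python
def resolve_columns(spec, header):
    """Map a space-separated attribute spec onto column names.

    Tokens made only of digits are treated as 1-based column positions;
    any other token must match a column name exactly (an assumption of
    this sketch, STILTS itself has richer rules).
    """
    resolved = []
    for token in spec.split():
        if token.isdigit():
            index = int(token)
            if not 1 <= index <= len(header):
                raise ValueError(f"column index {index} out of range")
            resolved.append(header[index - 1])
        elif token in header:
            resolved.append(token)
        else:
            raise ValueError(f"unknown column {token!r}")
    return resolved

# Hypothetical table header.
header = ["RA", "Dec", "Bmag", "Jmag", "Rmag1", "Rmag2"]
print(resolve_columns("3 5", header))
print(resolve_columns("Bmag Rmag1 Rmag2 4", header))
```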
Parameter details
| NSECS | Termination condition: number of seconds. |
| NSTEPS | Termination condition: number of steps. |
| Vary Clusters | Cluster Option: Vary the number of clusters. |
| NClusters | Cluster Option: The number of clusters to use. |
| Score Type | Scoring Algorithm. |
| Save PS file | Do you want to save a postscript image file? |
Parameter Details
Save PS file
| Type: | string |
| Default value: | false |
Do you want to save a postscript image file?
If this is true, the postscript image is saved in the file specified by the 'PS File' parameter.
The available values of 'Save PS file' are: true, false
PS File
| Type: | string |
| Default value: | em_clusters.ps |
| Legal values: | any non-empty string |
File name to save the postscript image to.
For 2D data, specify a filename to save a postscript image of the EM clustering. This option is ignored for higher-dimensional data.
NSECS
| Type: | int |
| Default value: | 10 |
| Legal values: | an integer |
Termination condition: number of seconds.
Maximum number of seconds the computation is allowed to run. Turn off the time restriction by setting nsecs to -1.
You must set either nsecs or nsteps to terminate the EM algorithm. If both are set, the program terminates as soon as either limit is reached.
NSTEPS
| Type: | int |
| Default value: | -1 |
| Legal values: | an integer |
Termination condition: number of steps.
Maximum number of steps EM is allowed to run before it terminates. Turn off the step restriction by setting nsteps to -1.
You must set either nsecs or nsteps to terminate the EM algorithm. If both are set, the program terminates as soon as either limit is reached.
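The two termination conditions described for nsecs and nsteps can be illustrated with a small loop. This is a sketch of the documented rules only, not the fastem implementation; the function name, the injectable clock argument, and the choice to reject the case where both limits are disabled are assumptions of this sketch.

```python
import time

def run_until_done(step, nsecs=10, nsteps=-1, clock=time.monotonic):
    """Call step() repeatedly until the time limit or the step limit is hit.

    A negative limit is treated as disabled (the service documents -1).
    Returns (steps_done, name_of_the_limit_that_fired).
    """
    if nsecs < 0 and nsteps < 0:
        raise ValueError("set nsecs or nsteps, otherwise the loop never terminates")
    start = clock()
    steps_done = 0
    while True:
        if nsteps >= 0 and steps_done >= nsteps:
            return steps_done, "nsteps"
        if nsecs >= 0 and clock() - start >= nsecs:
            return steps_done, "nsecs"
        step()  # one EM iteration would go here
        steps_done += 1

# With the time limit switched off, only the step limit applies.
print(run_until_done(lambda: None, nsecs=-1, nsteps=5))
```

When both limits are active, whichever is reached first ends the run.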
Vary Clusters
| Type: | string |
| Default value: | true |
Cluster Option: Vary the number of clusters.
This option allows EM to vary the number of clusters used. If varyclusters is false, the EM algorithm fixes the number of clusters to either 'nclusters' or the number of loaded centers.
You should specify either 'Vary Clusters' or 'nclusters' but not both.
The available values of 'Vary Clusters' are: true, false
NClusters
| Type: | int |
| Default value: | -1 |
| Legal values: | an integer |
Cluster Option: The number of clusters to use.
Specify the number of clusters the EM algorithm should use. Set to -1 to disable.
You should specify either 'Vary Clusters' or 'nclusters' but not both.
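The interplay between 'Vary Clusters' and 'nclusters' can be checked before submitting a job. The sketch below is hypothetical: the function name and the returned labels are invented for illustration, and how the service itself reacts to conflicting settings is not documented here.

```python
def check_cluster_options(vary_clusters=True, nclusters=-1):
    """Classify a combination of the two cluster options.

    Returns "vary" when EM chooses the number of clusters, "fixed" when
    nclusters is used, and "loaded-centres" when neither applies and the
    number of loaded centres would be used instead.
    """
    fixed = nclusters != -1
    if fixed and nclusters < 1:
        raise ValueError("nclusters must be -1 (disabled) or a positive integer")
    if vary_clusters and fixed:
        raise ValueError("specify either 'Vary Clusters' or 'nclusters', not both")
    if vary_clusters:
        return "vary"
    return "fixed" if fixed else "loaded-centres"

print(check_cluster_options())           # the documented defaults
print(check_cluster_options(False, 4))   # fixed number of clusters
```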
scoretype
| Type: | string |
| Default value: | bic |
Scoring Algorithm.
The available algorithms are:
- aic: Akaike Information Criterion
- bic: Bayesian Information Criterion
- testset: Divide the data into training/testing sets and evaluate by scoring the testing set.
The available values of 'scoretype' are: aic, bic, testset
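For orientation, the textbook definitions of the two information criteria are sketched below. The exact convention used inside fastem (sign, scaling, how parameters are counted) is not documented here, so treat this as an assumption; the parameter count shown is the standard one for a full-covariance Gaussian mixture.

```python
import math

def gmm_free_parameters(n_clusters, n_dims):
    """Free parameters of a full-covariance Gaussian mixture model."""
    means = n_clusters * n_dims
    covariances = n_clusters * n_dims * (n_dims + 1) // 2
    weights = n_clusters - 1  # the weights sum to 1, so one is determined
    return means + covariances + weights

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_points):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_points) - 2 * log_likelihood

# Hypothetical fit: 3 clusters in 2 dimensions, 1000 datapoints.
k = gmm_free_parameters(3, 2)
print(k, aic(-100.0, k), bic(-100.0, k, 1000))
```

With many datapoints BIC penalises extra clusters more heavily than AIC, so it tends to select simpler models.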
Example Data
Try the algorithm on this example dataset. Warning: you must ensure that your data is sufficiently "clean". The algorithm only works on numeric data and does not tolerate missing or non-numeric values in the columns being processed.
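One way to check a table before submitting it is to scan the chosen columns for values that are missing or not finite numbers. The helper below is a hypothetical pre-flight check, not part of the service; it assumes rows are dictionaries of strings, as they might come from a CSV reader.

```python
import math

def find_unclean_rows(rows, columns):
    """Return (row_index, column) pairs whose value is missing or not a finite number."""
    problems = []
    for i, row in enumerate(rows):
        for col in columns:
            value = row.get(col)
            try:
                ok = value is not None and math.isfinite(float(value))
            except (TypeError, ValueError):
                ok = False
            if not ok:
                problems.append((i, col))
    return problems

# Toy rows with one empty cell, one text cell and one NaN.
rows = [
    {"Bmag": "15.2", "Rmag1": "14.8"},
    {"Bmag": "", "Rmag1": "14.1"},
    {"Bmag": "16.0", "Rmag1": "n/a"},
    {"Bmag": "NaN", "Rmag1": "13.9"},
]
print(find_unclean_rows(rows, ["Bmag", "Rmag1"]))
```

Rows flagged by such a check would need to be repaired or dropped before running the Anomaly Detector.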
The calculated Gaussian centres:
Movie
This movie shows the Anomaly Detector in action, launched by the AstroGrid Workbench.
Credits
The fastem algorithm was developed and coded by the Auton Lab and deployed to the VO using AstroGrid software. It is hosted by the Wide Field Astronomy Unit of the University of Edinburgh.
Thanks to Andrew Connolly at the University of Pittsburgh for his help in deploying this service.
Topic revision: r22 - 2007-08-07 - 16:02:54 - JohnTaylor

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.