A rather dull worKLOG. This is just a scratchpad for solutions to IT problems that might be useful to someone else. Expect no opinions, no brilliant insights and definitely no pictures of pets or children. Expect stack traces, code snippets and other hints for the Google Indexer.

Monday, August 21, 2006

Creating the Anomaly Detector CEC

I'm installing Andy Connolly's Expectation Maximization/Anomaly Detection algorithm into an AstroGrid CEC (Common Execution Connector) to make it available to the VO. In the process I've been using the splendid STILTS utility to do all the VOTable conversions.
Things I've learned along the way:
  • STILTS can take a table in any of several formats, extract a user-defined set of columns (specified by position or name) and export the result as an EM-friendly, space-separated text file. It is also smart enough to take the list of row numbers and p_values generated by the EM code and do an exact cross match against the original table, producing a clone of the original table sorted by p_value with the p_values attached as an extra column. In short, it does all the table manipulation needed to turn a VO-standard table into a proprietary format and back again, with only a little help from awk.
  • Any process started by the CEC must NOT ask the user for input. The current EM algorithm prompts the user for input if it fails for any reason, and since there is nobody at the other end to answer, this seriously screws up your CEC. The solution is to test the exit status of the STILTS preprocessing step before the EM algorithm is called:
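      # This test must come immediately after the STILTS command:
      # any command in between (even a debugging "echo $?")
      # overwrites $? with its own exit status.
      # Try to protect fastem from bad data
      if [ "$?" -ne 0 ]; then
        echo "STILTS failed to prepare data. Aborting" >&2
        exit 1
      fi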

  • You can only specify a registry template for a single CEA application. This means that if your CEC supports several applications, you need to edit the registry description by hand after it has been generated.
  • You can specify an optional argument in the CEC-CL config with (insert)
  • The CEC-CL only supports the following kinds of parameters:
    • -key value
    • positional
    • key=value
  • Unfortunately the EM algorithm takes parameters as
    • key value
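  • You can get round this by pattern matching on the params with something like ${@//=/ }. This bash expansion applies the substitution to every positional parameter in one go, replacing each = with a space. Leave it unquoted, so that word splitting then turns each "key value" string into two separate arguments.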
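As for the "little help from awk" in the STILTS point, here is a minimal sketch of the kind of glue involved. It assumes that the STILTS ascii output format writes a #-prefixed header line of column names (check the STILTS docs for your version), and that fastem writes one "row p_value" pair per line. The function names and the ra/dec columns are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and column names are hypothetical.

# Drop '#' header/comment lines, leaving bare space-separated rows
# for the EM code. Assumed usage, something like:
#   stilts tpipe in=input.vot cmd='keepcols "ra dec"' ofmt=ascii \
#     | strip_ascii_header > em_input.txt
strip_ascii_header() {
  awk '!/^#/'
}

# Prefix the fastem output (assumed to be "row p_value" per line)
# with a '#' header line naming the columns, so they have names when
# the file is read back as a table for the exact cross match.
add_pvalue_header() {
  awk 'BEGIN { print "# row p_value" } { print }'
}
```

Both helpers read stdin and write stdout, so they slot straight into a shell pipeline on either side of the STILTS calls.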
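The exit-status test on the STILTS step can also be folded into a small reusable helper, so every step in the CEC wrapper script gets the same treatment. This is a sketch of my own, not part of the CEC or STILTS. run_or_abort is a hypothetical name, and redirecting stdin from /dev/null is an extra safeguard I'm assuming is acceptable: a program that tries to prompt reads end-of-file instead of waiting for a user who isn't there.

```bash
#!/usr/bin/env bash
# run_or_abort DESCRIPTION COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND with stdin redirected from /dev/null
# (so it cannot block waiting for input) and aborts the whole script
# with exit status 1 if COMMAND fails.
run_or_abort() {
  local description=$1
  shift
  if ! "$@" </dev/null; then
    echo "$description failed. Aborting" >&2
    exit 1
  fi
}

# Assumed usage in the CEC wrapper script (arguments elided):
#   run_or_abort "STILTS preprocessing" stilts tpipe ...
#   run_or_abort "fastem" fastem ...
```

Because the command runs inside the if test, nothing can sneak in between the step and the check on its status.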
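A minimal sketch of the ${@//=/ } trick, assuming bash (the ${parameter//pattern/string} expansion is not POSIX sh). The parameter names k and niter below are made up. Since the expansion is deliberately unquoted, set -f switches off globbing so that a value like * isn't expanded against the current directory. Running the function body in a subshell ( ... ) keeps that setting from leaking into the rest of the script.

```bash
#!/usr/bin/env bash
# Print the converted arguments one per line, to show what fastem
# would receive. The body is a subshell so 'set -f' stays local.
show_converted_args() (
  set -f
  # Deliberately unquoted: each "key=value" becomes "key value",
  # and word splitting then yields two separate words.
  printf '%s\n' ${@//=/ }
)

# Hypothetical wrapper: the CEC-CL passes key=value, fastem gets key value.
run_fastem() (
  set -f
  fastem ${@//=/ }
)
```

For example, show_converted_args k=3 niter=10 prints k, 3, niter and 10 on separate lines. Be aware that a value containing a space is also split, and any = inside a value is replaced as well.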