MJI's notes mainly in addition to WFAU's, with my comments in [] and !'s
For internal consumption only.
Astrowise's mission is a Virtual Survey System --> OmegaCEN.
It seems to be aiming to be CASU + WFAU + AstroGrid all rolled into one,
with extra functionality in some aspects as well. My impression was
that it was closest in aspirations to AstroGrid.
It is highly abstracted, all OO'd, built around Oracle 9i and Python,
and the DB controls everything !
It aims to [will] provide everything [still not complete and STILL cannot
produce a CMD ! will do next year, when I asked].
I. Overview
User interface is via Python I/F to SQL
[Astronomers have to learn Python instead of SQL, +'s and -'s; see the
sketch at the end of this section]
-o- environment
to access raw and calibrated data
execute and modify image and calibration pipelines
archiving, with dynamic regeneration allowed
federated to other data centres
-o- dynamic archive
continuously grows and keeps everything
can cope with small or large science projects
[not clear if surveys are catered for]
no specific versions of data products !
[is there any version control at all ?]
users can reprocess on demand ! and upload their own code to do the same !
[haven't thought much about system load restrictions !]
exchange methods and scripts and configurations
summary: raw pixels --> pipelines, calibration files --> catalogues
--> federation
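[A minimal sketch of what the Python-instead-of-SQL style of interface
described above might look like. The class and method names here are
invented for illustration and are NOT the actual Astrowise API; sqlite3
stands in for the Oracle back end.]

  # Hypothetical illustration only - not Astrowise code.
  import sqlite3  # stand-in for the Oracle back end Astrowise actually uses

  class Catalogue:
      """Thin wrapper that turns attribute-style queries into SQL."""
      def __init__(self, connection, table="sources"):
          self.conn = connection
          self.table = table

      def select(self, where="1=1", columns="*"):
          # The SQL is generated behind the scenes; the astronomer works
          # at the Python prompt rather than typing raw SQL.
          sql = "SELECT %s FROM %s WHERE %s" % (columns, self.table, where)
          return self.conn.execute(sql).fetchall()

  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE sources (ra REAL, dec REAL, mag REAL)")
  conn.execute("INSERT INTO sources VALUES (150.1, 2.2, 19.5)")
  cat = Catalogue(conn)
  print(cat.select(where="mag < 20", columns="ra, dec, mag"))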
II. Observing Protocols
Observing procedures --> OBs and calibrations: they want them to be tightly
specified.
BUT Astrowise have no way to link ESO OBs to carry out sequenced observing
procedures.
Have no way to carry out a survey either.
Have very bad relationship with ESO.
Pipeline is predicated on having linked OBs and will not work without.
[Do not have FITS header driven pipeline processing philosophy - bad!]
Odd idea to use composite 4 quadrant u,g,r,i filter to speed up stds
[can't see the point and what about edge effects ?]
[Also seem paranoid about scattered light causing systematics in photometry
- obviously having trouble with ESO WFI data]
III. Pipeline Components
-o- image processing is via the Eclipse library (in C) written by N. Devillard
at ESO a while back
-o- wrapped with SWIG [a tool that generates Python bindings for C/C++ code]
-o- coaddition uses Bertin's SWARP package via system level calls
[ie. not incorporated properly; see the sketch after this section]
-o- SExtractor [as usual]
-o- uses LDAC for astrometry and photometry
-o- common interfaces
-o- persistent source lists
they view pipelines as data administration rather than processing since the
whole lot is DB driven
[note they have written hardly any of what we would call pipeline
processing software other than LDAC (Erik Deul, Leiden)]
-o- very keen for PSF homogenisation for photometric redshifts
-o- very worried about fringing, particularly when reminded that
VISTA would be far superior at doing z-band surveys
[if it gets a z-band filter !]
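[Rough sketch of the "SWARP via system level calls" approach noted above:
the pipeline simply shells out to the swarp binary rather than linking
against it. The file names are made up and the options shown are standard
SWarp switches, but check against the local installation.]

  import subprocess

  def coadd(images, out="coadd.fits", weight_out="coadd.weight.fits"):
      # Build the command line and hand it to the swarp executable.
      cmd = ["swarp"] + list(images) + [
          "-IMAGEOUT_NAME", out,
          "-WEIGHTOUT_NAME", weight_out,
          "-COMBINE_TYPE", "MEDIAN",
      ]
      subprocess.run(cmd, check=True)  # raises if swarp exits with an error
      return out

  # coadd(["ccd01.fits", "ccd02.fits"])   # example call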
IV. Astrometric Calibration
-o- various problems/issues to deal with
refraction, mis-centring, misalignment of optical axis
-o- use a variant of the Patterson function to get shifts
and the triangle scheme to get rotation and scale (see the sketch
after this section)
[get the distinct impression too complicated and can't see the wood
for the trees]
-o- keen on overlap solutions
[a global detector solution is a possibility they may have thought of;
it was unclear when raised]
-o- keep track of astrometric fit params for DQC
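[The "Patterson function variant" was not described in detail; the sketch
below is a guess at the general idea - histogram all pairwise coordinate
differences between two source lists and take the peak as the (dx, dy)
shift - and is not the actual Astrowise implementation.]

  import numpy as np

  def find_shift(xy1, xy2, max_shift=50.0, bin_size=1.0):
      """xy1, xy2: (N,2) and (M,2) arrays of pixel positions."""
      # All pairwise coordinate differences between the two lists.
      dx = xy2[:, 0][:, None] - xy1[:, 0][None, :]
      dy = xy2[:, 1][:, None] - xy1[:, 1][None, :]
      bins = np.arange(-max_shift, max_shift + bin_size, bin_size)
      hist, xe, ye = np.histogram2d(dx.ravel(), dy.ravel(), bins=[bins, bins])
      i, j = np.unravel_index(hist.argmax(), hist.shape)
      # Return the centre of the most populated bin as the shift estimate.
      return 0.5 * (xe[i] + xe[i + 1]), 0.5 * (ye[j] + ye[j + 1])

  # pts = np.random.uniform(0, 1000, size=(200, 2))
  # print(find_shift(pts, pts + [10.0, 0.0]))   # recovers roughly (10, 0)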
V. Processing Power
Have servers that, via the DB, schedule CPU jobs by polling known systems to
see if they have spare capacity (see the sketch at the end of this section);
do a lot of interprocess communication, mainly using Gbit ethernet [I think]
haven't heard of Rice tile compression [v. interested in that]
use a lot of 32-node Beowulf clusters (32 CCDs in OmegaCam, philosophy
is 1 node per CCD - implies a fair bit of interprocessor communication)
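[The scheduler itself was not shown; below is a toy sketch of the idea of
polling known nodes for spare capacity and binding one CCD frame per node.
get_load() is a stand-in - in Astrowise the DB drives the scheduling and
the polling goes over the network.]

  import random

  def get_load(node):
      # Stand-in for a real poll of the remote node; here we just invent
      # a load figure for illustration.
      return random.uniform(0.0, 4.0)

  def assign_frames(frames, nodes, max_load=2.0):
      """Return a {frame: node} mapping, skipping nodes that look busy."""
      assignments = {}
      for frame in frames:
          candidates = [(get_load(n), n) for n in nodes]
          candidates = [c for c in candidates if c[0] < max_load]
          if not candidates:
              break                      # no spare capacity anywhere
          load, node = min(candidates)   # pick the least-loaded node
          assignments[frame] = node
      return assignments

  # assign_frames(["ccd%02d.fits" % i for i in range(1, 33)],
  #               ["node%02d" % i for i in range(1, 33)])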
=============================================================================
Nigel Hambly's notes......
NCH, IAB and ETWS visited the ASTRO-WISE (hereafter AW)
team at the Kapteyn Astronomical Institute in
Groningen, Holland on 3/4 December 2003.
Also attending was MJI from CASU. AW personnel that were present
some or all of the time were: Ed Valentijn, Danny Boxhoorn, Erik Deul,
Roland Vermeij, Michiel Tempelaar and several others whose names I
didn't catch. Details of all AW personnel and their function can be
found at the AW website under
`persons' (click on NOVA-OmegaCEN for the Groningen contingent).
AW is primarily a pipeline processing environment written in highly
generalised, abstracted Python. The interactive environment is
basically interactive Python, and the system has many advanced
features such as distributed processing, object-oriented design
and an object-relational use of the O-R DBMS Oracle. At the moment,
there is very little design for science archiving of specific
survey programmes (ie. along the lines of the WFCAM Science Archive).
To be sure, pipeline processing and storage of processing operations
and image metadata are well handled by the system, but there is
currently no provision for general user access to prepared catalogue
products, or any other of the science-enabling features that we
are planning for the WSA. This came as quite a surprise. Physically,
AW is distributed over a federation of processing centres, each having
data processing hardware and each running an instance of Oracle.
We started off with a working lunch with Ed in which he and I
both described, in broad terms, what we are doing. We then
discussed some preliminaries:
- AW take an OO approach for great generality and flexibility;
- AW aims to process and track many varied survey datasets, both large and small programmes;
- AW have the same relationship to the VO as the WSA: they aspire to providing large datasets to the VO;
- AW emphasis is totally different to the WSA, and highly ambitious (eg. more akin to AstroGrid).
On that last point, for example AW recognise that `schema evolution' is
seen as a major system requirement, yet RDBMSs do not easily implement this.
Hence, even the ambitious AW have had to fix as much as possible in their
design to make progress.
Overview
Ed gave an overview of AW. OmegaCEN is the administrative hub of AW,
and is a NOVA funded centre specialising in wide field imaging. He
described AW as a virtual survey system providing a systematic, controlled
environment including a dynamic, federated archive of raw and processed
data and associated metadata. The philosophy is to handle the usual
dataflow system of images/pipelines/calibrations/catalogues within the
system. Ed noted that the system is generalised to cope with large or
small observation programmes; the idea is reduction/extraction on
the fly or extraction of pre-processed data which persists within the
archive. All data processing is fundamentally database driven. The AW
approach is to proceduralise processing and enforce strict observing
protocols - they perceive pipeline processing as an administrative
problem.
Some interesting factoids emerged during this overview: for example,
the AW implementation at ESO will not use the Python glue layer
because ESO have banned Python. There is clearly some tension
between AW and ESO-DMD. The Paranal/Garching-side pipelines will
consist of AW applications stripped out of Python. AW indicated that
they intend to redo everything that Garching does anyway, so it seems
that ESO/Garching will not be a part of the AW federation proper.
Also, the distinction between data distribution and/or mirroring
within the AW federation is unclear: it seems that AW intend to provide
a flexible infrastructure that can handle either, and it is up to the
user to decide what level of each to employ.
DBMS
AW are currently running Oracle-9i and have a licensing/consultancy
deal that costs them of the order (this was actually quite unclear)
of 10,000 Euros for the 4 year development cycle. The DBAs stated
that they have found the consultancy of limited help in their
work; compared with our experience with MS SQL Server and Jim
Gray, they are not as well off.
Hardware
Each hub in the federation runs a 32-node Beowulf PC cluster and
a dataserver (not RAC!) for the instance of Oracle. A WAN with
1 Gb/s links connects all this. Dataservers utilise network-attached
storage using 3Ware Escalade hardware RAID controllers
along with ATA-133 IDE disks. Within the federation, they quote
5 Mbyte/s connectivity; at each hub the data retrieval is
quoted as being 200 Mbyte/s for local IO.
Off-the-shelf components
- Pixel engine: Eclipse, an old ESO warhorse.
- SWARP from Terapix
- SExtractor from Terapix/Bertin
- XML RPC Python modules (SOAPpy)
- HTM C++ from JHU.
- Oracle (including Oracle's own built-in DB API)
Much of the software they have written themselves - they do not
use, for example, Condor or third-party middleware Python components
(eg. DB interfaces, ...).
Contents of the AW federation
The AW federation consists of observed images, source lists and
calibrations. User interfaces are interactive command-line;
GUIs are planned for `standard' features like running of a
standard pipeline suite. It's all quite low-level stuff
compared to our user interfaces, eg. you can "check out your
own copy of the data reduction software, modify it in any way
you like, then specify it is to be used when you run some
data processing procedures in the environment". There is very
little push-button/GUI functionality in AW, and it does not
yet have any provision for higher level data mining type
applications.
Some other techno-geekery
The AW pipeline is written in highly abstracted, object-oriented
Python. FITS images/headers are built-in data types; Python
classes are used to represent DB tables (a sketch of the general
idea follows below). To see examples of the
extreme levels of abstraction, see code examples in the
CVS. The persistent object layer
in the AW environment is very impressive - AstroGrid should
look at this if they haven't already.
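[The persistent object layer itself was only seen briefly; here is a
much-simplified sketch of the general "Python classes represent DB tables"
idea. The class names are invented and sqlite3 stands in for Oracle - the
real Astrowise layer is far more elaborate.]

  import sqlite3

  class PersistentObject:
      table = None
      columns = ()                    # subclasses list persistent attributes

      @classmethod
      def create_table(cls, conn):
          cols = ", ".join("%s TEXT" % c for c in cls.columns)
          conn.execute("CREATE TABLE IF NOT EXISTS %s (%s)" % (cls.table, cols))

      def store(self, conn):
          marks = ", ".join("?" for _ in self.columns)
          values = [getattr(self, c) for c in self.columns]
          conn.execute("INSERT INTO %s VALUES (%s)" % (self.table, marks), values)

  class SourceList(PersistentObject):
      table = "sourcelist"
      columns = ("filename", "filter", "date_obs")

  conn = sqlite3.connect(":memory:")
  SourceList.create_table(conn)
  obj = SourceList()
  obj.filename, obj.filter, obj.date_obs = "omegacam01.fits", "r", "2003-12-03"
  obj.store(conn)
  print(conn.execute("SELECT * FROM sourcelist").fetchall())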
Wednesday finished up with a basic demonstration of the AW pipeline,
which looked just like reducing data in IRAF, IDL, ... only in
an object-oriented way.
Parallel processing
Erik Deul gave an interesting presentation on the distributed
processing capabilities of AW. This is not truly parallel processing,
since multi-device imaging data are simply farmed out to available
PC nodes in a cluster (ie. `embarrassing' parallelism). AW bind data
to a particular PC node and operate on it there, results are then
assembled when all is done via 1 Gb/s interconnect. AW do not use
CFITSIO RICE compression, and when MJI described this to them, they
seemed very interested in looking into it. AW have developed their
own home-grown interprocessor communications using XML RPC in
Python; they do not use WSDL or Condor. They have coded
up bespoke Python methods for easy deployment in a grid-like
architecture, and have even developed their own username/password
authentication layer for security. They have developed a scheduler
in Python that receives client requests, analyses available
servers/costs and allocates jobs via a stack queue; processing
servers can of course be anywhere in the AW federation. Hence,
AW is a kind of mini-VO environment in which it will be easy
to implement VO cone-searches, SIA, ... but this has not been
done yet.
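[The AW XML-RPC layer is home-grown and was not shown in code; the sketch
below uses only the Python standard library to illustrate the sort of
plumbing described. The process_frame service and its behaviour are
invented; the real layer adds scheduling, cost analysis and authentication
on top of calls like this.]

  from xmlrpc.server import SimpleXMLRPCServer

  def process_frame(filename):
      # A real worker node would run the pipeline on the file; just ack here.
      return "processed %s" % filename

  server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
  server.register_function(process_frame)
  # server.serve_forever()   # run this on each compute node

  # A client (eg. the scheduler) would then dispatch work with:
  #   from xmlrpc.client import ServerProxy
  #   node = ServerProxy("http://localhost:8000")
  #   print(node.process_frame("ccd07.fits"))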
For very heavy processing, AW does not intend to distribute over the
federation; rather, they will keep within the confines of the
Beowulf cluster local to the data.
For peer-to-peer communications of large datasets, XML comms is
of course not used; again, bespoke peer-to-peer services make
data available to all other servers in the local compute
cluster via a web-based interface (eg. one that is wget-enabled).
Python classes exist for storage, with store/retrieve methods.
Data are cached locally for optimum performance.
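[Sketch of the store/retrieve-with-local-cache pattern described above;
fetching over plain HTTP is consistent with the wget-enabled web interface
mentioned, but the class, the dataserver URL and the cache layout are
invented for illustration.]

  import os
  import urllib.request

  class DataObject:
      def __init__(self, filename, server="http://dataserver.example/store/"):
          self.filename = filename
          self.server = server

      def retrieve(self, cache_dir="cache"):
          """Fetch the file from the dataserver unless it is already cached."""
          os.makedirs(cache_dir, exist_ok=True)
          local = os.path.join(cache_dir, self.filename)
          if not os.path.exists(local):        # use the local cache if present
              urllib.request.urlretrieve(self.server + self.filename, local)
          return local

  # DataObject("ccd07.fits").retrieve()   # returns a path to the cached copy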
WSA presentation
I gave my usual presentation (based on the talk I gave at NeSC
at Bob's `VO as a datagrid' meeting) of the WSA, including the SSA
demo. The AW team seemed genuinely interested in our experience
with MS SQL Server and also our approach to user interfaces.
Presently, it seems that AW has more in common with pipeline processing
developments at CASU and with distributed functionality along the lines
being developed by AstroGrid. Major points of interest were:
- AW has 1.0 FTE Oracle DBA in the form of Danny at OmegaCEN along with
a junior shadowing him and 0.2 FTE DBAs at each of the various nodes
in the AW federation.
- They got nowhere with trying to use existing spatial indexing schemes in Oracle and ended up implementing HTM from the JHU C++ libraries
- AW currently has little potential as an easy catalogue data-mining resource
AW website
AW CVS repository
-- NigelHambly - 03 Dec 2003