MJI's notes mainly in addition to WFAU's, with my comments in [] and !'s
For internal consumption only.
Astrowise's mission is a Virtual Survey System --> OmegaCEN.
It seems to be aiming to be CASU + WFAU + AstroGrid all rolled into one,
with extra functionality in some aspects as well. My impression was
that it was closest in aspirations to AstroGrid.
It is highly abstracted, all OO'd, built around Oracle 9i and Python,
and the DB controls everything !
It aims to [will] provide everything [still not complete and STILL cannot
produce a CMD ! will do next year, when I asked].
I. Overview
User interface is via Python I/F to SQL
[Astronomers have to learn Python instead of SQL, +'s and -'s; see the
sketch at the end of this section]
-o- environment
to access raw and calibrated data
execute and modify image and calibration pipelines
archiving, with dynamic regeneration allowed
federated to other data centres
-o- dynamic archive
continuously grows and keeps everything
can cope with small or large science projects
[not clear if surveys are catered for]
no specific versions of data products !
[is there any version control at all ?]
users can reprocess on demand ! and upload their own code to do the same !
[haven't thought much about system load restrictions !]
exchange methods and scripts and configurations
summary: raw pixels --> pipelines, calibration files --> catalogues
--> federation
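[A minimal sketch of what the Python-instead-of-SQL style of interface
described above might look like. The class and method names here are
invented for illustration and are NOT the actual Astrowise API; sqlite3
stands in for the Oracle back end.]

  # Hypothetical illustration only - not Astrowise code.
  import sqlite3  # stand-in for the Oracle back end Astrowise actually uses

  class Catalogue:
      """Thin wrapper that turns attribute-style queries into SQL."""
      def __init__(self, connection, table="sources"):
          self.conn = connection
          self.table = table

      def select(self, where="1=1", columns="*"):
          # The SQL is generated behind the scenes; the astronomer works
          # at the Python prompt rather than typing raw SQL.
          sql = "SELECT %s FROM %s WHERE %s" % (columns, self.table, where)
          return self.conn.execute(sql).fetchall()

  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE sources (ra REAL, dec REAL, mag REAL)")
  conn.execute("INSERT INTO sources VALUES (150.1, 2.2, 19.5)")
  cat = Catalogue(conn)
  print(cat.select(where="mag < 20", columns="ra, dec, mag"))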
II. Observing Protocols
Observing procedures --> OBs and calibrations: they want them to be tightly
specified.
BUT Astrowise have no way to link ESO OBs to carry out sequenced observing
procedures.
Have no way to carry out a survey either.
Have very bad relationship with ESO.
Pipeline is predicated on having linked OBs and will not work without.
[Do not have FITS header driven pipeline processing philosophy - bad!]
Odd idea to use composite 4 quadrant u,g,r,i filter to speed up stds
[can't see the point and what about edge effects ?]
[Also seem paranoid about scattered light causing systematics in photometry
- obviously having trouble with ESO WFI data]
III. Pipeline Components
-o- image processing is via the Eclipse library (in C) written by N. Devillard
at ESO a while back
-o- wrapped with SWIG [a tool that generates Python bindings for C/C++ code]
-o- coaddition uses Bertin's SWARP package via system level calls
[ie. not incorporated properly; see the sketch after this section]
-o- SExtractor [as usual]
-o- uses LDAC for astrometry and photometry
-o- common interfaces
-o- persistent source lists
they view pipelines as data administration rather than processing since the
whole lot is DB driven
[note they have written hardly any of what we would call pipeline
processing software other than LDAC (Erik Deul, Leiden)]
-o- very keen for PSF homogenisation for photometric redshifts
-o- very worried about fringing, particularly when reminded that
VISTA would be far superior at doing z-band surveys
[if it gets a z-band filter !]
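[Rough sketch of the "SWARP via system level calls" approach noted above:
the pipeline simply shells out to the swarp binary rather than linking
against it. The file names are made up and the options shown are standard
SWarp switches, but check against the local installation.]

  import subprocess

  def coadd(images, out="coadd.fits", weight_out="coadd.weight.fits"):
      # Build the command line and hand it to the swarp executable.
      cmd = ["swarp"] + list(images) + [
          "-IMAGEOUT_NAME", out,
          "-WEIGHTOUT_NAME", weight_out,
          "-COMBINE_TYPE", "MEDIAN",
      ]
      subprocess.run(cmd, check=True)  # raises if swarp exits with an error
      return out

  # coadd(["ccd01.fits", "ccd02.fits"])   # example call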
IV. Astrometric Calibration
-o- various problems/issues to deal with
refraction, mis-centring, misalignment of optical axis
-o- use a variant of the Patterson function to get shifts
and the triangle scheme to get rotation and scale (see the sketch
after this section)
[get the distinct impression too complicated and can't see the wood
for the trees]
-o- keen on overlap solutions
[a global detector solution is a possibility they may have thought of;
it was unclear when raised]
-o- keep track of astrometric fit params for DQC
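[The "Patterson function variant" was not described in detail; the sketch
below is a guess at the general idea - histogram all pairwise coordinate
differences between two source lists and take the peak as the (dx, dy)
shift - and is not the actual Astrowise implementation.]

  import numpy as np

  def find_shift(xy1, xy2, max_shift=50.0, bin_size=1.0):
      """xy1, xy2: (N,2) and (M,2) arrays of pixel positions."""
      # All pairwise coordinate differences between the two lists.
      dx = xy2[:, 0][:, None] - xy1[:, 0][None, :]
      dy = xy2[:, 1][:, None] - xy1[:, 1][None, :]
      bins = np.arange(-max_shift, max_shift + bin_size, bin_size)
      hist, xe, ye = np.histogram2d(dx.ravel(), dy.ravel(), bins=[bins, bins])
      i, j = np.unravel_index(hist.argmax(), hist.shape)
      # Return the centre of the most populated bin as the shift estimate.
      return 0.5 * (xe[i] + xe[i + 1]), 0.5 * (ye[j] + ye[j + 1])

  # pts = np.random.uniform(0, 1000, size=(200, 2))
  # print(find_shift(pts, pts + [10.0, 0.0]))   # recovers roughly (10, 0)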
V. Processing Power
Have servers that, via the DB, schedule CPU jobs by polling known systems to
see if they have spare capacity (see the sketch at the end of this section);
do a lot of interprocess communication, mainly using Gbit ethernet [I think]
haven't heard of Rice tile compression [v. interested in that]
use a lot of 32-node Beowulf clusters (32 CCDs in OmegaCam, philosophy
is 1 node per CCD - implies a fair bit of interprocessor communication)
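[The scheduler itself was not shown; below is a toy sketch of the idea of
polling known nodes for spare capacity and binding one CCD frame per node.
get_load() is a stand-in - in Astrowise the DB drives the scheduling and
the polling goes over the network.]

  import random

  def get_load(node):
      # Stand-in for a real poll of the remote node; here we just invent
      # a load figure for illustration.
      return random.uniform(0.0, 4.0)

  def assign_frames(frames, nodes, max_load=2.0):
      """Return a {frame: node} mapping, skipping nodes that look busy."""
      assignments = {}
      for frame in frames:
          candidates = [(get_load(n), n) for n in nodes]
          candidates = [c for c in candidates if c[0] < max_load]
          if not candidates:
              break                      # no spare capacity anywhere
          load, node = min(candidates)   # pick the least-loaded node
          assignments[frame] = node
      return assignments

  # assign_frames(["ccd%02d.fits" % i for i in range(1, 33)],
  #               ["node%02d" % i for i in range(1, 33)])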
=============================================================================
Nigel Hambly's notes......
NCH, IAB and ETWS visited the ASTRO-WISE (hereafter AW)
team at the Kapteyn Astronomical Institute in
Groningen, Holland on 3/4 December 2003.
Also attending was MJI from CASU. AW personnel that were present
some or all of the time were: Ed Valentijn, Danny Boxhoorn, Erik Deul,
Roland Vermeij, Michiel Tempelaar and several others whose names I
didn't catch. Details of all AW personnel and their function can be
found at the AW website under
`persons' (click on NOVA-OmegaCEN for the Groningen contingent).
AW is primarily a pipeline processing environment written in highly
generalised, abstracted Python. The interactive environment is
basically interactive Python, and the system has many advanced
features such as distributed processing, object-oriented design
and an object-relational use of the O-R DBMS Oracle. At the moment,
there is very little design for science archiving of specific
survey programmes (ie. along the lines of the WFCAM Science Archive).
To be sure, pipeline processing and storage of processing operations
and image metadata are well handled by the system, but there is
currently no provision for general user access to prepared catalogue
products, or any other of the science-enabling features that we
are planning for the WSA. This came as quite a surprise. Physically,
AW is distributed over a federation of processing centres, each having
data processing hardware and each running an instance of Oracle.
We started off with a working lunch with Ed in which he and I
both described, in broad terms, what we are doing. We then
discussed some preliminaries:
- AW take an OO approach for great generality and flexibility;
- AW aims to process and track many varied survey datasets, both large and small programmes;
- AW have the same relationship to the VO as the WSA: they aspire to providing large datasets to the VO;
- AW emphasis is totally different to the WSA, and highly ambitious (eg. more akin to AstroGrid).
On that last point, for example AW recognise that `schema evolution' is
seen as a major system requirement, yet RDBMSs do not easily implement this.
Hence, even the ambitious AW have had to fix as much as possible in their
design to make progress.
Overview
Ed gave an overview of AW. OmegaCEN is the administrative hub of AW,
and is a NOVA funded centre specialising in wide field imaging. He
described AW as a virtual survey system providing a systematic, controlled
environment including a dynamic, federated archive of raw and processed
data and associated metadata. The philosophy is to handle the usual
dataflow system of images/pipelines/calibrations/catalogues within the
system. Ed noted that the system is generalised to cope with large or
small observation programmes; the idea is reduction/extraction on
the fly or extraction of pre-processed data which persists within the
archive. All data processing is fundamentally database driven. The AW
approach is to proceduralise processing and enforce strict observing
protocols - they perceive pipeline processing as an administrative
problem.
Some interesting factoids emerged during this overview: for example,
the AW implementation at ESO will not use the Python glue layer
because ESO have banned Python. There is clearly some tension
between AW and ESO-DMD. The Paranal/Garching-side pipelines will
consist of AW applications stripped out of Python. AW indicated that
they intend to redo everything that Garching does anyway, so it seems
that ESO/Garching will not be a part of the AW federation proper.
Also, the distinction between data distribution and/or mirroring
within the AW federation is unclear: it seems that AW intend to provide
a flexible infrastructure that can handle either, and it is up to the
user to decide what level of each to employ.
DBMS
AW are currently running Oracle-9i and have a licensing/consultancy
deal that costs them of the order (this was actually quite unclear)
of 10,000 Euros for the 4 year development cycle. The DBAs stated
that they have found the consultancy of limited help in their
work; compared with our experience with MS SQL Server and Jim
Gray, they are not as well off.
Hardware
Each hub in the federation runs a 32-node Beowulf PC cluster and
a dataserver (not RAC!) for the instance of Oracle. A WAN with
1 Gb/s links connects all this. Dataservers utilise network-attached
storage using 3Ware Escalade hardware RAID controllers
along with ATA-133 IDE disks. Within the federation, they quote
5 Mbyte/s connectivity; at each hub the data retrieval is
quoted as being 200 Mbyte/s for local IO.
Off-the-shelf components
- Pixel engine: Eclipse, an old ESO warhorse.
- SWARP from Terapix
- SExtractor from Terapix/Bertin
- XML RPC Python modules (SOAPpy)
- HTM C++ from JHU.
- Oracle (including Oracle's own built-in DB API)
Much of the software they have written themselves - they do not
use, for example, Condor or third-party middleware Python components
(eg. DB interfaces, ...).
Contents of the AW federation
The AW federation consists of observed images, source lists and
calibrations. User interfaces are interactive command-line;
GUIs are planned for `standard' features like running of a
standard pipeline suite. It's all quite low-level stuff
compared to our user interfaces, eg. you can "check out your
own copy of the data reduction software, modify it in any way
you like, then specify it is to be used when you run some
data processing procedures in the environment". There is very
little push-button/GUI functionality in AW, and it does not
yet have any provision for higher level data mining type
applications.
Some other techno-geekery
The AW pipeline is written in highly abstracted, object-oriented
Python. FITS images/headers are built-in data types; Python
classes are used to represent DB tables (a sketch of the general
idea follows below). To see examples of the
extreme levels of abstraction, see code examples in the
CVS. The persistent object layer
in the AW environment is very impressive - AstroGrid should
look at this if they haven't already.
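[The persistent object layer itself was only seen briefly; here is a
much-simplified sketch of the general "Python classes represent DB tables"
idea. The class names are invented and sqlite3 stands in for Oracle - the
real Astrowise layer is far more elaborate.]

  import sqlite3

  class PersistentObject:
      table = None
      columns = ()                    # subclasses list persistent attributes

      @classmethod
      def create_table(cls, conn):
          cols = ", ".join("%s TEXT" % c for c in cls.columns)
          conn.execute("CREATE TABLE IF NOT EXISTS %s (%s)" % (cls.table, cols))

      def store(self, conn):
          marks = ", ".join("?" for _ in self.columns)
          values = [getattr(self, c) for c in self.columns]
          conn.execute("INSERT INTO %s VALUES (%s)" % (self.table, marks), values)

  class SourceList(PersistentObject):
      table = "sourcelist"
      columns = ("filename", "filter", "date_obs")

  conn = sqlite3.connect(":memory:")
  SourceList.create_table(conn)
  obj = SourceList()
  obj.filename, obj.filter, obj.date_obs = "omegacam01.fits", "r", "2003-12-03"
  obj.store(conn)
  print(conn.execute("SELECT * FROM sourcelist").fetchall())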
Wednesday finished up with a basic demonstration of the AW pipeline,
which looked just like reducing data in IRAF, IDL, ... only in
an object-oriented way.
Parallel processing
Erik Deul gave an interesting presentation on the distributed
processing capabilities of AW. This is not truly parallel processing,
since multi-device imaging data are simply farmed out to available
PC nodes in a cluster (ie. `embarrassing' parallelism). AW bind data
to a particular PC node and operate on it there, results are then
assembled when all is done via 1 Gb/s interconnect. AW do not use
CFITSIO RICE compression, and when MJI described this to them, they
seemed very interested in looking into it. AW have developed their
own home-grown interprocessor communications using XML RPC in
Python; they do not use WSDL or Condor. They have coded
up bespoke Python methods for easy deployment in a grid-like
architecture, and have even developed their own username/password
authentication layer for security. They have developed a scheduler
in Python that receives client requests, analyses available
servers/costs and allocates jobs via a stack queue; processing
servers can of course be anywhere in the AW federation. Hence,
AW is a kind of mini-VO environment in which it will be easy
to implement VO cone-searches, SIA, ... but this has not been
done yet.
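[The AW XML-RPC layer is home-grown and was not shown in code; the sketch
below uses only the Python standard library to illustrate the sort of
plumbing described. The process_frame service and its behaviour are
invented; the real layer adds scheduling, cost analysis and authentication
on top of calls like this.]

  from xmlrpc.server import SimpleXMLRPCServer

  def process_frame(filename):
      # A real worker node would run the pipeline on the file; just ack here.
      return "processed %s" % filename

  server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
  server.register_function(process_frame)
  # server.serve_forever()   # run this on each compute node

  # A client (eg. the scheduler) would then dispatch work with:
  #   from xmlrpc.client import ServerProxy
  #   node = ServerProxy("http://localhost:8000")
  #   print(node.process_frame("ccd07.fits"))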
For very heavy processing, AW does not intend to distribute over the
federation; rather, they will keep within the confines of the
Beowulf cluster local to the data.
For peer-to-peer communications of large datasets, XML comms is
of course not used; again, bespoke peer-to-peer services make
data available to all other servers in the local compute
cluster via a web-based interface (eg. one that is wget-enabled).
Python classes exist for storage, with store/retrieve methods.
Data are cached locally for optimum performance.
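[Sketch of the store/retrieve-with-local-cache pattern described above;
fetching over plain HTTP is consistent with the wget-enabled web interface
mentioned, but the class, the dataserver URL and the cache layout are
invented for illustration.]

  import os
  import urllib.request

  class DataObject:
      def __init__(self, filename, server="http://dataserver.example/store/"):
          self.filename = filename
          self.server = server

      def retrieve(self, cache_dir="cache"):
          """Fetch the file from the dataserver unless it is already cached."""
          os.makedirs(cache_dir, exist_ok=True)
          local = os.path.join(cache_dir, self.filename)
          if not os.path.exists(local):        # use the local cache if present
              urllib.request.urlretrieve(self.server + self.filename, local)
          return local

  # DataObject("ccd07.fits").retrieve()   # returns a path to the cached copy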
WSA presentation
I gave my usual presentation (based on the talk I gave at NeSC
at Bob's `VO as a datagrid' meeting) of the WSA, including the SSA
demo. The AW team seemed genuinely interested in our experience
with MS SQL Server and also our approach to user interfaces.
Presently, it seems that AW has more in common with pipeline processing
developments at CASU and with distributed functionality along the lines
being developed by AstroGrid. Major points of interest were:
- AW has 1.0 FTE Oracle DBA in the form of Danny at OmegaCEN along with
a junior shadowing him and 0.2 FTE DBAs at each of the various nodes
in the AW federation.
- They got nowhere with trying to use existing spatial indexing schemes in Oracle and ended up implementing HTM from the JHU C++ libraries
- AW currently has little potential as an easy catalogue data-mining resource
AW website
AW CVS repository
-- NigelHambly - 03 Dec 2003