From nch@roe.ac.uk Fri Oct 5 14:24:40 2007
Date: Fri, 5 Oct 2007 14:07:02 +0100 (BST)
From: Nigel Hambly
To: WFCAM Science Archive Team -- Eckhard Sutorius, Johann Bryant, Mike Read, Nigel Hambly, Nicholas Cross, Bob Mann, Ross Collins
Cc: CCs for WSA weekly meeting minutes distribution -- Andrew Lawrence, Andy Adamson, Brian Walshe, Jim Emerson, Malcolm Stewart, Lorenzo Rimoldini, Mike Irwin, Mark Holliman, Peredur Williams, Stephen Warren
Subject: WFAU VDFS Science Archive weekly project meeting minutes, 5/10/07

Minutes of WFAU VDFS Science Archive meeting: 5th October 2007
-------------------------------------------------------------------------
-------------------------------------------------------------------------

Present:   NCH, RSC, NJC, JB, LGR, MAR, ETWS, MSH
Apologies: BCW, JPE, AL, RGM, PMW, JMS

NB: DONM: 10am, Friday 12th October 2007 in the Plate Library

Actions discharged this week:
-----------------------------

ACTION: RGM to look through requirements docs to see if any users have mentioned GALEX.

Discharged; GALEX is not explicitly mentioned in the URD as a required external dataset, but is mentioned in the usage examples.

ACTION: JB, NJC and RSC to make setting of CU8 (photometric ZP recal) a top priority today so it can run over the weekend.

Discharged with great rapidity last Friday; all was going fine until Wednesday... see Hardware below.

Actions partly discharged but continuing:
-----------------------------------------

The following from last time partly done but continue:

o) Add in a default row for every detector appearing in every detection table (for schema consistency when querying merged sources and individual detections) - ACTION: RSC & NCH

Continues; will get sorted (honestly!) as part of the general ingest DB schema revamp to facilitate rapid generalised photometric/astrometric recalibration after DR3.

Actions carried forward from 28/09/07 meeting:
----------------------------------------------

ACTION: JB to restore BestDR1 to thutmose from LTO tape at a convenient time (but no rush!).

Specific points and new actions:
--------------------------------

Project management:

NCH noted that the VEGA-VDFS grant has now ended, and the development effort is being moved on to the VDFS extension grant from October 1st.

WFCAM & VISTA updates:

Nothing new this week.

Comments and issues arising from CASU fortnightly minutes:

No new minutes this week.

Networking:

ETWS has transferred SegueDR6 (289GB) from Chicago at an average transfer speed of 2.9MB/s. The team discussed where to put it, and agreed that public server thutmose was the best place since this hosts the other external datasets. JB noted that there should be enough room, despite the fact that the DB is provided in a monolithic database file (which goes against all good sense).

ACTION: JB & ETWS to restore SegueDR6 on thutmose at their convenience (i.e. no hurry given the present circumstances).

As regards GALEX, the team agreed that we should see what the status of any prepared (preliminary) catalogue release is before setting it up as an external dataset in the archive.

ACTION: ETWS to ascertain what the status of any prepared catalogue product from GALEX is.

JB noted that CASU are processing 07B data now, and it would be great to transfer/ingest a few nights asap just to make sure all the early 06B troubles are well behind us; no data are marked as OK_TO_COPY as yet.

WSA Operations:

JB noted a request from the operations side to make tracking of problem files a little easier by logging details in one place. Error reports are a little spread around at the moment, and interleaved with warning/information details.

ACTION: JB to open a trac ticket for a logging software enhancement.

QC1 and recalibration were progressing well until...

Hardware:

...
a major hardware failure on the catalogue load server ahmose occurred on Wednesday: hot-swap of a degraded RAID array precipitated loss of another disk in the same array, which in turn made the logical volume fail. SQL Server was unluckily accessing the DB transaction log on that same volume at that time; this resulted in corruption of the ingest DB. JB has been carefully trying to restore this database to avoid having to go back to the last full tape backup (which would lose several weeks of work). Hence, DR3 preparations are on hold until the WSA is restored to its state as at Wednesday am.

The team discussed a few measures to avoid this happening in future, including closing down SQL before "hot"-swapping (a kind of luke-warm swapping, if you will...) and ultimate migration of the ingest DB on to a more stable system (the Ultra320 disks/backplanes have been a p-in-the-a since day one). Mitigation measures in case of such a major failure occurring in future include regular replication (in a non-SQL sense) of the image metadata on the public server to maintain flat-file/jpeg access during periods of ingest DB downtime, and a strict adherence to the monthly full backup policy with the possibility of more frequent full backups following major data modifications (and possibly use of differentials).

ACTION: RSC to progress implementation of automated replication of the ingest DB file metadata on a public server.

RSC, MAR and NCH have kludged restoration of image metadata to enable continued flat-file/jpeg availability while the ingest database is being restored (select "TEMPWSA" in the archive listing).

There is a suspicion that the recent spate of power cuts has exacerbated the situation as regards recent hardware failures, what with loss of system disk on the 6dF DB server, a power supply on public server amenhotep, and a spate of disk failures on the catalogue servers. Furthermore, the ambient temperature in server room C1 continues to be a worry.
Our aspirations to be a major international data centre do not look viable under these circumstances...

ACTION: NCH to bug site services concerning unacceptable infrastructure problems.

JB noted that we should seriously consider rebuilding the ingest database on the new, hopefully more stable, SAS server hatshepsut once the SSA load has been completed on that machine. This rebuild could cover a number of issues including inconsistencies in the existing WSA, implementation of separate file groups for the subsurveys, etc. to improve performance.

Software:

ETWS has been investigating the dark sky background in J and K JPEGs and may adjust processing for future releases in response to a query from UKIDSS (SJW).

RSC has completed work on the UDS cross-talk quality error bit flagging; this now needs to be tested.

NJC has examined some FITS file issues in the light of recalibration operations.

MAR has made interface modifications to cope with the unavailability of the ingest DB.

Survey Data Release:

Schedule is on hold due to major hardware failure (see above):

Weeks  What:
1.0    CU8 (photometric zeropoint recalibration)
2.0    QC1 (much work to be done in parallel with previous tho')
0.0    CU5 (diff images for the GPS - after QC, but in parallel with the following and it's very quick anyway)
1.0    Quality bit flagging
3.0    CU7 (source merging from scratch again, unfortunately)
0.0    CU13/14 for DXS/UDS (in parallel with shallow survey CU7s)
2.0    Final CUs.

LAS/GPS eyeballing continues despite an interruption caused by ingest DB corruption. SJW has noted that some jpegs are unavailable.

ACTION: ETWS to look into the examples of unavailable jpegs.

Non-survey Data Release:

Nothing to report this week.

Astrogrid deployment:

MSH noted that the full SSA build on hatshepsut is not progressing as might be expected.
ACTION: MSH and NCH to check out the SSA loading situation on hatshepsut.

MAR and MSH noted that the DSA services should now be back to normal following interruptions to services on both database servers thutmose and 6dF.

RGM emailed NCH some info concerning the IVOA footprint services that have been set up by Tamas Budavari at JHU - see

http://voservices.net/footprint/

MAR noted that he looked into this a month or so ago, and that it wouldn't work for a large set of detector specifications as exists, for example, in the wide/shallow UKIDSS surveys. Anyway, NCH suggested that LGR and MAR look into this again, with a view to supplying enhanced analysis services for the WSA.

Miscellaneous:

Nothing else this week.