From nch@roe.ac.uk Mon Jan 9 21:17:39 2006
Date: Mon, 9 Jan 2006 11:00:59 +0000 (GMT)
From: Nigel Hambly
To: WFCAM Science Archive Team -- Eckhard Sutorius, Johann Bryant,
    Mike Read, Nigel Hambly, Nicholas Cross, Bob Mann, Ross Collins
Cc: CCs for WSA weekly meeting minutes distribution -- Andrew Lawrence,
    Andy Adamson, John Taylor, Jim Emerson, Malcolm Stewart, Martin Hill,
    Mike Irwin, Peredur Williams, Stephen Warren
Subject: WFAU WSA weekly project meeting minutes: 6th January 2006

Minutes of WFCAM Science Archive meeting: 6th January 2006
-------------------------------------------------------------
-------------------------------------------------------------

Present: NCH, ETWS, MAR, AL, RSC, PMW

Apologies: JPE, JDT, MCH, RGM, JMS, NJC, JB

DONM: 10am, Friday 13th January 2006, plate library

Happy New Year to all our readers.

Actions discharged:
-------------------

ACTION: NJC to send some examples of apparently inconsistent image
attribute sets to JAC.

Discharged; reply from Paul Hirst received and it seems that these are
all cases of early teething problems in the dataflow system upstream,
rather than any sinister bugs in pipeline/archive.

ACTION: NCH & RSC to chase up and fix the pairing code bug.

Discharged; bug fixed; pairing tested and found to be OK.

Actions partly discharged but continuing:
-----------------------------------------

ACTION: JDT to set up a Datascope Launcher for the WSA on the AG
workbench pages.

Continues.

Actions carried forward from 09/12/05 meeting:
----------------------------------------------

ACTION: NCH to investigate the usefulness of a NAS solution for medium
term archive mass storage.
- CONTINUES; NCH reported a useful chat with Eclipse computing, who have
been supplying and installing NAS boxes to some of their customers.

ACTION: ETWS and NCH to sort out image DQC schema and ingest
- CONTINUES; NCH noted that he and RSC had been working hard on this,
implementing a procedure to add in new attributes to existing metadata
tables and populate them from all FITS files held in the archive.

Specific points and new actions:
--------------------------------

Project management:

NCH welcomed everybody back for the New Year with single malt (courtesy
Eclipse Computing) and shortbread. PMW noted that there is no VDMT in
January, but the Q1 plan is being finalised given fresh input from UKIDSS
and the requirement to produce an EDR by the end of Jan. NCH noted that
this is likely to result in a fire-fight for most of Jan; hence "planned"
work will likely spill over into Feb/Mar.

With reference to the "things to do" list itemised in the last minutes:

1) Check that the merged source list pairing is producing clean lists

   NCH and RSC have done this, fixing small bugs where necessary; NCH has
   also suggested to UKIDSS that the pairing radius be reduced and is
   taking advice from survey heads and CASU on the optimum choice.

2) Catch incorrectly reobserved frames as NOT multiple epoch revisits and
   generally make frame merging a little more intelligent over choice of
   frame when there is more than one

   Needs to be implemented in general iteration of software for next data
   release.

QC:

a) Catch frames with bad channels / large spurious detection numbers
b) Catch frames with moon ghosts
c) Ensure all available DQC is propagated into metadata tables (e.g. sky
   sub scale parsed from FITS comment).
d) Implement pre-release QC in collaboration with SJW.

All going ahead as part of general QC which is being discussed and
implemented this week in consultation with SJW and SD.

Wish list:

a) Include SQL query in results set output in appropriate format(s) and
   echo to the results page
d) > 100 columns for formatted HTML summary results
e) larger limit to results set size
g) document/implement default flux estimator as Aper3 as being generally
   appropriate
h) Guide for simple usage to prevent "frequently made cock-ups", e.g.
   careful filtering when using merged source tables
i) Sexagesimal format on results page and consistent orientation of
   thumbnails
j) Dictionary translation between pipeline FITS keys and table attributes
k) Reduce source pairing radius for GPS, as per WG wishes

Actions ongoing on MAR, ETWS, NCH on all of the above.

b) Client-end query tool for scripting
c) Seeing in arcsec as a new attribute
f) petro (kron/hall?) radius in merged source table

At this stage, these are seen as lower priority...

WFCAM update:

See the stop press pages linked from the TWiki for latest news.

Comments and issues arising from CASU fortnightly minutes:

No new minutes as of 6/1/06 AM.

Networking:

A power hiccup (see below) meant that the Xmas transfer from CASU was
interrupted; however ETWS reported that as at 6/1/06, all data that is
OK_TO_COPY has been transferred from CASU (2.3TB) at a healthy 13.6 MB/s
(in Jan at least; logging info from Dec lost due to power outage). ETWS
noted that there are rather a lot of directories that are flagged as
"checked" on the processing status page and yet do not have OK_TO_COPY
files present; these will not (of course) get transferred until the
semaphore files are written in by CASU.

ETWS reported no response from Chicago re: SDSS-DR3, so will nudge them
to see what's going on.

ACTION: ETWS to nudge UChicago re: transfer of BestDR3.

WSA Operations:

NCH noted that JB had reported the fall-out from the Dec 23rd power
outage; no operations could take place over the Xmas break because the
DBMS load server was zapped and out of commission.

Hardware:

NCH reported JB's notes from the Dec 23rd power outage:

"There has been a power outage (externally caused) at 4:30am (ish) this
morning, power was not restored until about 8am meaning that everything
has gone down and some of it tried to come back, current information:

djoser - Came up fine with no problems that I can see... drive four (the
one that failed earlier this week) is fine.

sneferu - Came up fine with no problems that I can see. Needed powering
up when I came in.

khufu - Came up fine with no problems that I can see. Needed powering up
when I came in.

thoth - Seems to be up and running. Difficult to confirm as I've no way
to access thoth that I know of (cluanie being unusable doesn't help).

amenhotep - Wasn't responding when I first came in but a reboot seems to
have cleared it, no problems arose when it came back up and there are no
problems on it that I can see.

ahmose - Was up but not responding properly when I first came in. It had
disk problems but I couldn't do anything because it wouldn't login. A
reboot let me do something but didn't succeed because the issue we had
with the NVRAM for the Raid controller (same as last time this happened I
think) happened again, I restored the disk copy of the Raid config to the
NVRAM but three of the disks had died and they were dropping as quickly
as I put them online again (meaning rebuilding the disks was failing to
work either). Since this seems to be the Raid with the OS on it the
system is unbootable and keeps beeping due to the failed disks. I have
taken the executive decision that since it is not the public facing
server, I have only one spare disk (I think, maybe two actually but
certainly not three) and it is not in use (any running jobs will have
gone during the power outage) over the holidays that it is officially out
of commission until the end of the holidays.
More info on this later for Nigel and Perry as I'm going to contact Ian
about getting a couple of extra spare drives in for after the holiday so
that the duds can be replaced and we can start a rebuild of the system.
The good news however is that the backup of the WSA on ahmose should have
finished verifying by around 9 or 10pm last night and was certainly
already taped when I left last night so the backups for ahmose are as
complete as you could hope them to be! :) I have submitted a ticket about
getting Ahmose rebuilt after the new disks are in.

As regards ongoing work:

WSA - as I said above this was backed up yesterday after the WORLDR2 cu19
script was run so we have the data as of late Wednesday night (I think it
was).

WORLDR2 - I do not know if the copies that I started last night from
Ahmose to Amenhotep finished or not, however if they did it might not
actually help as the data on G: (I think, might have been I:) was yet to
be copied over (about 8GB of data), so WORLDR2 is not yet attachable on
amenhotep, however all the data on J: (the probably failed RAID on
Ahmose), H: and either the other of G: or I: were copied over (about
300GB) so it will not take long to get WORLDR2 running on Amenhotep once
we can boot Ahmose again. WORLDR2 did not yet get backed up to tape (this
was going to happen today)."

NCH reported further that ahmose has now been sorted out *nearly* since
it is back up and running with SQL Server available on the network and
the main WSA DB available e.g. from amenhotep; BUT it reboots when any
user tries to log in in multi-user mode. Compute Support suspect a
corrupted system file. Investigations continue; this has not impacted the
release schedule so far (Xmas break limited curation activities anyway).
Hopefully the system can be fully restored on Mon 9th.

Software:

Nothing to report due to Xmas break.

Survey Data Release:

NCH noted that UKIDSS GPS survey head Phil Lucas had enquired as to
aperture photometry corrections.
Investigations have led to the discovery that this bug (discovered in
August last year) was only fixed for the LAS (and non-survey datasets)
and will need to be fixed up in the release and ingest DBs, plus the SV
community will need to be informed.

ACTION: NCH to fix the aperture correction bug in release and ingest DBs,
and inform the SV community.

World release has been delayed by power outage; again, this should be
sorted out early next week.

NCH noted that UKIDSS Consortium Survey Scientist SJW and Science
Verifier SD would be visiting next week to thrash out the details of the
quality control procedure for the impending data release.

Non-Survey Data Release:

No news this week.

Astrogrid deployment:

AL noted the possibility of an AG talk/demo at ROE to be given by
himself.

Miscellaneous:

Nothing further this week.