From nch@roe.ac.uk Mon Nov 6 20:51:34 2006
Date: Fri, 3 Nov 2006 15:16:35 +0000 (GMT)
From: Nigel Hambly
To: WFCAM Science Archive Team -- Eckhard Sutorius, Johann Bryant,
    Mike Read, Nigel Hambly, Nicholas Cross, Bob Mann, Ross Collins
Cc: CCs for WSA weekly meeting minutes distribution -- Andrew Lawrence,
    Andy Adamson, Brian Walshe, John Taylor, Jim Emerson,
    Malcolm Stewart, Lorenzo Rimoldini, Mike Irwin, Mark Holliman,
    Peredur Williams, Stephen Warren
Subject: WFAU VDFS Science Archive weekly project meeting: 3/11/06

Minutes of WFAU VDFS Science Archive meeting: 27th October 2006
-------------------------------------------------------------------------
-------------------------------------------------------------------------

Present:   NCH, MAR, JB, MSH, PMW, NJC, ETWS, RSC, LGR
Apologies: JPE, RGM, JDT, AL, JMS, BCW

DONM: 10am, Friday 10th November 2006 in the Plate Library

Actions discharged this week:
-----------------------------

ACTION: AL to email Peter Clarke in NeSC to see if a little pressure
from above helps to expedite implementation of UKLight at the EUCS end.
Discharged; apparently some breakdown of communications between EUCS
and UKERNA. Maybe the fact that the big cheeses have rattled a few
cages will help (can cheese really rattle a cage? Not Camembert - it's
too ****ing runny).

ACTION: JB and ETWS to scrub the constituent frames of interleaves for
archived UKIDSS data only (where it has not been done already).
Discharged.

ACTION: NCH to email Jan at JHU to enquire about this (and database
file sizes at the same time).
Discharged - there's good news and bad - see Networking below.

ACTION: NJC to contact STH directly concerning photometric
recalibration of 05A, 05B and 06A.
Discharged; see later actions and notes.

ACTION: NCH to prod AL to prod Peter Clarke concerning this frustrating
lack of progress from the UKLighters.
Discharged; see cheese above.

ACTION: All to fill in their absences over the next few months on the
TWiki.
Discharged; NCH thanked RSC for sticking the fancy calendar thing on
the TWiki.

ACTION: ETWS, JB & NCH to meet on Monday 2pm to sort out current
software and operation modes for CUs 1-3 in order to optimise and
expedite flat-file turn around.
Discharged; see Software below.

ACTION: NCH, MAR and RSC to meet Monday 4pm to finalise design of
archive end quality bit flagging for DR2.
Discharged.

ACTION: MAR to implement a DB stored procedure to encapsulate the
production of IAU names based on queried table and celestial
co-ordinates.
Discharged.

Actions partly discharged but continuing:
-----------------------------------------

The following from last week partly done but continue:

ACTION: JB to rearrange external catalogues onto batch catalogue
server.

ACTION: NCH to circulate a release schedule for DR2 to all concerned.
Continues; have asked STH@CASU to give a date for delivery of the ZY
photometric recalibration information.

ACTION: PMW to order an 8TB NAS storage brick.
Continues; now looking at 16TB, chainable (e.g. SCSI), SAS (=Serial
Attached SCSI) and other options in consultation with IT Support.

ACTION: RSC to profile CU4 and investigate possible optimisation.
Continues; on hold until parallelisation and other refactoring is
complete (see Software below).

Actions carried forward from 27/10/06 meeting:
----------------------------------------------

The following from last week continue:

i) Enhance seaming to use quality bit information
   - ACTION: RSC - on hold until quality bit flagging is implemented

o) Add in a default row for every detector appearing in every detection
   table (for schema consistency when querying merged sources and
   individual detections)
   - ACTION: RSC & NCH

ACTION: ETWS & MAR to include NVSS into the WSA external catalogue
suite, but as a very low priority and without bugging JB.
   - CONTINUES; MAR will reply to the user who requested this to note
     it's now in the queue and to ask for any supplementary info.

Specific points and new actions:
--------------------------------

Project management:

NCH welcomed new data mining applications scientist Lorenzo Rimoldini
(LGR) to the meeting, who will be attending to keep up-to-date with
archive related matters.

PMW asked the team to fill in the usual monthly progress charts for the
VDMT telecon next Tuesday, 2pm.

Note added after meeting: today (3rd) is the deadline for UKIRT
campaign proposals, including the UKIDSS renewal proposals following
the 2yr initial allocations. As far as we are aware, in addition to
UKIDSS there are three new WFCAM surveys proposed, all of which wish to
use VDFS:

  o DIVES (PI Jon Davies; DXS-like Virgo cluster survey);
  o UHS (UKIRT Hemisphere Survey; PI Andy Lawrence; LAS-like JK
    initially);
  o a 200 sq.deg survey to complement LOFAR (acronym unknown; PI Steve
    Rawlings; DXS-like K-band only).

Note also that U/06A/52 is being proposed for renewal as the WFCAM
Transit Survey (PI David Pinfield), and again VDFS is specified for
processing and archiving.

WFCAM update:

WFCAM is up and running with no problems following the earthquake two
weeks ago.

Comments and issues arising from CASU fortnightly minutes:

No new minutes as of today AM. Photometric recalibration matters have
prompted an email exchange between NJC, STH, MJI, JB and NCH. The JHK
tweaks are in hand and under control, but not the ZY (which is
essential for DR2 and is where the largest changes will occur). NCH has
emphasised to STH the need to close out the remaining issues over ZY
calibration with the calibration WG as soon as possible, as the DR2
schedule is critically dependent on this (e.g. Quality Control cannot
commence until recalibration has been applied at the archive end). NCH
has suggested that STH try to close this out by the end of next week
(i.e. the 10th Nov); so far it is unclear exactly what the plan is at
the CASU end.
Finally, NCH noted that we would greatly appreciate any input from CASU
concerning the rather poor network transfer performance that we are
getting between CASU & WFAU at the moment (see next section).

Networking:

ETWS reported finishing download of BestDR5 from UIC over the last
weekend with a rate of 5.3 MByte/s.

NCH reported the continuing saga of SDSS-DR5. The good news is that the
transatlantic copy of DR5 was achieved easily in about 3 days; also we
are reliably informed that the DB will restore to SQL2000 (i.e. it does
not require SQL2005). BUT it restores into a single DB file of nearly
2.4TB, which will yield poor performance and is more than the size of
any one of the batch catalogue server's RAID LVs. The option of SW
striping two such LVs is not a good one (it overloads CPUs with
destriping). However, ETWS discovered that JHU *do* distribute a DB
known as BestDR5_EFG which is split over 3 DB files and would be higher
performance and much more suited to our needs (why this isn't
distributed by UIC is a mystery). So we are currently awaiting further
info from the US regarding the availability of the split DB file
version of DR5; at the same time NCH is enquiring through other
contacts to see if anyone in Europe has a copy of the same. As long as
we get DR5 up by the start of the New Year, it will be available for
cross-querying for UKIDSS DR2.

With reference to the copying of DR5 from UIC, we note that the latest
downloads from CASU have been running at an extremely poor 2MB/s for
the past week or so. As far as we can tell, there is no bottleneck at
this end. For example, the download rate was the same before, during
and after the DR5 copy from UIC was instigated, and what's more that
DR5 download from Chicago ran faster than the downloads from CASU so
there seems to be plenty of spare network bandwidth at the WFAU end.
The fact that we do not seem to be able to sustain a higher download
rate does not bode well for optimising the turn-around to expedite flat
file access, and will be a show stopper for VISTA if it turns out that
the bottleneck is something to do with the local network topology at
CASU (e.g. UKLight won't solve the problem either).

WSA Operations:

JB reported: "The constituent frames of the interleave files have now
been removed and gave us about 2TB more space. CU4 has completed for
05A version 3, CU21 has been run prior to starting ingestion of 06A
version 1, CU3 has now completed the first half of 06A version 1
ingestion (this has not been checked as yet so may be inaccurate or
broken). CU1 is still running and has reached about the half way point
of July of 06A version 1. Backups continued as normal. Have fixed a
couple of bugs and spent time discussing future concepts of CU usage
with ETWS and NCH, some of which is currently being prototyped by
ETWS."

PMW asked about operations cover for staff absences (e.g. over the
Xmas/New Year period). NCH and MSH both noted that they will be around
and available to come in to monitor/restart any operations CUs if
necessary, provided they are given idiot-proof instructions by JB. In
the longer term, things will need to be formalised on the operations
side to ensure continuous cover.

Hardware:

JB reported that NAS has been discussed with both PMW and JNTD and a
couple of technical details are being ironed out.

Software:

NJC noted discovery (!) of a potential archive integrity loophole in
that the default DB connection behaviour in the CU software was set to
r/w access to the main DB; JB has hacked things around to change the
default to read-only; RSC is now looking at a more elegant solution to
ease development testing, but at the same time keeping the default
behaviour as read-only to prevent any mishaps.

ACTION: RSC to refactor the DB connection module to default to
read-only on the main DB but with a run time option for rw.
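
For illustration only, the read-only-by-default behaviour requested in
the action above might look roughly like the following. This is a
minimal sketch, not the CU software: it uses Python's built-in sqlite3
module as a stand-in for the archive DB server, and the names connect(),
read_write and try_write() are hypothetical.

```python
import sqlite3


def connect(db_path, read_write=False):
    """Open db_path read-only unless the caller explicitly asks for r/w.

    Hypothetical helper: sqlite3 stands in for the real archive DB here.
    """
    # SQLite URI modes: "ro" = read-only, "rwc" = read/write, create if missing.
    mode = "rwc" if read_write else "ro"
    return sqlite3.connect(f"file:{db_path}?mode={mode}", uri=True)


def try_write(conn):
    """Return True if a write succeeds, False if the connection refuses it."""
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS scratch (x INTEGER)")
        conn.execute("INSERT INTO scratch VALUES (1)")
        conn.commit()
        return True
    except sqlite3.OperationalError:
        # A read-only connection rejects the write at the DB level.
        return False
```

With this shape, a curation task that genuinely needs to modify the main
DB has to say so explicitly (e.g. connect(path, read_write=True), perhaps
driven by a command-line switch), while development and test runs that
forget to do so cannot alter it.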

NCH, ETWS and JB met earlier in the week to discuss optimising and
expediting CUs 1-4 to give improved turn-around for flat-file access of
pipeline processed products, and with a view to scaling things to VISTA
data volumes. The plan of attack is:

o CU1 (CASU-WFAU transfers):
   i) make more automated and get into the background such that
      transfer is running routinely whenever new data becomes available
      at CASU (CU1 wrapper & cron job instigation)
  ii) chop transfers into smaller "atomic" batches so that housekeeping
      is easier and there is less repetition when the network drops

o CU2 (JPEG creation) - has already been parallelised and optimised by
  ETWS so no performance or scalability issues there.

o CU3 & 4 (ingest of flat file details and catalogues)
   i) parallelise by splitting ingest file build and actual ingest on
      the DB catalogue servers.

MAR reported: "Wrote functions based on SDSS to produce IAU names as
requested. Functions are in DR1 and DR1plus and show in the schema
browser."

ETWS reported: "Started refactoring CU3 and 4 for parallelizing. Still
installing updated software on djoser. Got sneferu upgraded for usage
as a curation server by IT support, but our software needs to be
installed yet. Produced metadocs of the WSA, BestDR3 and BestDR5 for
Astrogrid."

Finally, NCH and RSC noted that SJW is coming up to Edinburgh next week
(for a seminar) and that a meeting will take place to finalise the
details of the implementation of quality bit flagging for DR2.

Survey Data Release:

Reiterating the proto-schedule from last week, minus one week of
ingests:

Complete transfer of 06A : +1
Complete ingest of 06A   : +5 (CU4 for GPS is the bottleneck)
Photometric recal        : Critical - we don't as yet have the info
                           from CASU about this, cannot start QC1
                           until we do, and its impact on CU4 is not
                           known.
Quality bit flagging     : +1 (can design & test in parallel with
                           previous)
QC1                      : +1 (but cannot start 05A and early 06A until
                           photometric recalibration is done; do we
                           need to redo 05B owing to photometric
                           recalibration?!)
CU7 (source merging)     : +3 (again totally dominated by GPS, and has
                           to be done from scratch because of q-bits
                           and recal)
DXS CUs                  : not on the critical path (since much can be
                           done in parallel with previous; later CUs
                           run fast on the relatively small amount of
                           data)
CU16                     : +1
QC2                      : +1
Final CUs                : +3

... plus a couple of weeks for Xmas.

Non-survey Data Release:

MAR reported release of flat-file access to two non-survey programmes;
JB noted with thanks the recent communications from the JAC operations
end when non-survey programmes are complete. NCH noted that an enquiry
from Mark Casali has been received concerning his continuing
"guaranteed time" non-survey program U/05A/100 - NCH will reply.

Astrogrid deployment:

MSH noted: "This is where we stand on UKIDSS access for AstroGrid: The
DSA is up and running error free. The community server is running error
free. We have started entering users into the community database, and
should have all the accounts created by Monday."

Miscellaneous:

RSC noted that he has put an ADASS report up on the TWiki at:

http://apache.roe.ac.uk/twiki/bin/view/WFAU/ReportOnADASS2006