From nch@roe.ac.uk Wed Jun 28 12:47:41 2006
Date: Fri, 23 Jun 2006 16:15:32 +0100 (BST)
From: Nigel Hambly
To: WFCAM Science Archive Team -- Eckhard Sutorius, Johann Bryant, Mike Read, Nigel Hambly, Nicholas Cross, Bob Mann, Ross Collins
Cc: CCs for WSA weekly meeting minutes distribution -- Andrew Lawrence, Andy Adamson, John Taylor, Jim Emerson, Malcolm Stewart, Mike Irwin, Mark Holliman, Peredur Williams, Stephen Warren
Subject: WFAU WSA weekly meeting minutes, 23rd June 2006

Minutes of WFCAM Science Archive meeting: 23rd June 2006
-------------------------------------------------------------------------

Present: MAR, NCH, NJC, PMW, RSC, ETWS, MSH, JMS, RGM, JB
Apologies: AL, JPE

NB: DONM: 10am, Friday 30th June 2006

Actions discharged this week:
-----------------------------

ACTION: JMS to contact JPE regarding progress on October UK VDFS review organisation

Discharged; JPE is fully aware of the urgency over this, and has been making progress. The review will take place in October (exact date TBC, but note that ADASS is 15-18, during which several key VDFSers will be unavailable); panel membership is likely to comprise individuals collared at the recent SPIE, plus representation from the VISTA Public Survey Proposal PIs.

ACTION: MSH to begin looking into provision of mass storage at the University's SAN at Roslin

Discharged; NCH asked that JB and MSH start a TWiki topic on their investigations into alternative mass-storage strategies for WFAU.

ACTION: NCH to email MJI concerning the status of enhancements to CASUExtractor

Discharged; NCH emailed MJI and received an immediate and helpful response. A face-to-face with JRL was also possible earlier this week, thanks to the VISTA PSP jamboree, and JRL also indicated that the sky-variance estimator at least could be fixed very quickly.
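(For reference: the estimator fix itself is CASU's to make, and their algorithm is not described in these minutes. A robust sky-background variance estimate of the general sort under discussion is often done by iterative sigma-clipping; the sketch below is purely an illustrative stand-in -- the function name and the MAD-based approach are assumptions, not the actual CASUExtractor code.)

```python
import statistics

def robust_sky_variance(pixels, n_sigma=3.0, max_iter=5):
    """Estimate sky-background variance with iterative sigma-clipping.

    Illustrative stand-in only: uses the median absolute deviation
    (MAD), scaled to sigma for a Gaussian, and clips outliers (stars,
    cosmic rays) until the pixel list stabilises.
    """
    data = sorted(pixels)
    sigma = 0.0
    for _ in range(max_iter):
        med = statistics.median(data)
        # MAD, scaled by 1.4826 so it estimates sigma for Gaussian noise.
        mad = statistics.median(abs(x - med) for x in data)
        sigma = 1.4826 * mad
        clipped = [x for x in data if abs(x - med) <= n_sigma * sigma]
        if len(clipped) == len(data):
            break  # converged: nothing more to clip
        data = clipped
    return sigma ** 2  # variance of the sky background
```

The point of clipping first is that a naive variance over all pixels is inflated by sources; the clipped estimate tracks only the sky.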
ACTION: RSC to apply a work-around to the CU7 NFS bug between applications software on client and DB server

Discharged; RSC put in a temporary hack (basically a time-delay) to work around the NFS communication snafu; however, JB reported that unfortunately CU7 still fell over this morning when attempting to source-merge a particularly large frame set in the GPS. Further investigation will be done today with the utmost urgency...

Actions partly discharged but continuing:
-----------------------------------------

None this week.

Actions carried forward from previous meetings:
-----------------------------------------------

ACTION: JB to review Hardware/OS/DBMS design doc to note areas that need updating and itemise new sections required - CONTINUES. Note that this is on hold until after DR1.

Specific points and new actions:
--------------------------------

Project management:

NCH reported back from the VISTA PSP meeting earlier in the week, noting that 9 of the 10 proposals wish to use VDFS and that, broadly speaking, science archive requirements are well covered by the current design.

WFCAM update:

Nothing new to report this week.

Comments and issues arising from CASU fortnightly minutes:

The team noted the CASU minutes of the meeting of 12th June. With regard to the requested test of a single-threaded scp data transfer, we will be very pleased to do this after the DR1 release, when the archive machines are not heavily loaded and the benchmark will hopefully be more meaningful.

Networking:

NCH noted that he and MAR had identified a performance gotcha with SQL Server: join queries between tables located on different linked servers run extremely slowly. An email to Jim Gray (JHU and Microsoft) produced an immediate response that this is a known problem and that currently no fix is available. This requires a rethink of our strategy of spreading the load of the various DBs held in the archive (see below). However, it appears that Microsoft are working on the problem.
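(Until Microsoft fix this, the practical pattern for us is to avoid the cross-server join entirely: pull the smaller table across once in bulk, materialise it next to the local catalogue, and join on one server. A minimal sketch of that pattern, using sqlite3 as a stand-in for SQL Server -- the table and column names here are invented for illustration:)

```python
import sqlite3

def copy_then_join(local_db, remote_db):
    """Avoid a slow linked-server join by materialising the remote
    table locally, then joining on a single server.

    sqlite3 stands in for SQL Server; `wsa_source`, `sdss_phot` and
    their columns are invented names for illustration only.
    """
    # Pull the remote rows across once, in bulk...
    rows = remote_db.execute("SELECT objid, ra, dec FROM sdss_phot").fetchall()
    # ...and materialise them next to the local catalogue.
    local_db.execute(
        "CREATE TEMP TABLE sdss_local (objid INTEGER, ra REAL, dec REAL)")
    local_db.executemany("INSERT INTO sdss_local VALUES (?, ?, ?)", rows)
    # The join now runs entirely on one server.
    return local_db.execute(
        "SELECT w.srcid, s.objid FROM wsa_source AS w "
        "JOIN sdss_local AS s ON w.objid = s.objid").fetchall()
```

This is, in effect, what the single-big-server strategy below does permanently rather than per-query.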
WSA Operations:

JB reported: Backups of servers went as normal, with khafre being added to the list. A file that was ingested in an impossible way has been fixed - other occurrences of the same type of problem do not exist in the WSA, so this seems to have been, literally, a one-off. CU7 and CU16 have been successfully run for the GCS, and CU16 for the PTS. CU7 has been run for the LAS, but a problem meant it needed rolling back; it is expected to have been re-CU7'd by the end of the weekend. CU7 was started for the GPS but hit a glitch; the glitch is now (hopefully) fixed, and a workaround is also in place, so the GPS should, theoretically, be able to finish - probably some time next week. Routine data transfer will resume as soon as possible (i.e. when the DR1 fire-fight is over).

Hardware:

JB noted that khafre is all fine after some glitches concerning anomalous SCSI errors, but that software installation is awaiting help from IT Support. The disks of khafre are available to receive reprocessed 05A data and 06A data.

A protracted discussion concerning hardware expansion occupied most of today's meeting. Issues such as physical space, power consumption, heat dissipation and financial resources were hotly debated, along with the linked-server performance worry identified above. NCH noted that our priority must be catalogue provision, since this is where all the science is done, so the team made the decision to acquire another JBOD for the catalogue server thutmose and make this a "batch" query server for cross-querying and large external catalogues (SDSS, SSA, USNO-B, etc.); the batch server will also service incoming queries from any future Astrogrid clients. There is a need to attach a large amount of storage to a single server now, since linked-server cross-querying performance is diabolically slow for any large real-world astronomical dataset.
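(On the CU7 NFS glitch in the operations report above: RSC's temporary fix was described last week as basically a time-delay. A minimal sketch of that kind of workaround -- the function name and parameters are illustrative, not the actual CU7 patch:)

```python
import os
import time

def wait_for_nfs_file(path, timeout=60.0, poll=2.0):
    """Poll until a file written on the DB server becomes visible to
    the application client, working around NFS attribute-cache lag.

    Sketch only; the real CU7 workaround may differ. Returns True if
    the file appeared within `timeout` seconds, False otherwise.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll)  # give the NFS client time to refresh its cache
    return False
```

As this morning's GPS failure shows, a fixed delay is fragile for very large frame sets, which is why further investigation is still needed.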
After the DR1 fire-fight is over, a major review of mass-storage provision will take place, in order to give premises a more accurate forecast of likely power requirements in C1, factoring in the experience of the last year or so of operations and the possibility of out-sourcing the mass storage of flat files. Immediate actions identified were:

ACTION: PMW to get a quote from Eclipse for another 4TB JBOD for batch catalogue server thutmose (that'll annoy premises...)

ACTION: RGM to investigate further any funding opportunities from elsewhere (i.e. AG and IfA)

JB also reported: A TWiki area is being set up by MSH and JB to document the investigation into the NAS/SAN, but a brief rundown of work up to now is as follows. The University Computing Service SAN is not an option due to costs (both capital and maintenance). A full NAS seems over-spec'd for what we actually need, though a SAN may be less so in this case. RGM has set up a meeting to discuss using the University Research SAN, in which MSH will be involved; costs are not yet known. Costs for a NAS are quite high, though less than going with the standard Pixel servers we have been using. Various plain-disk and hybrid disk/DVD systems are being looked at; WORM (Write Once, Read Many) systems seem over-spec'd, but DVD jukeboxes may be a feasible option. The main issue with the SAN will likely be maintenance costs (including upgrading the SAN to handle our storage requirements). The main issues with a system at RoE are space for the system, along with sufficient cooling, power and connectivity to maintain it in a running state sufficient to meet our uptime obligations for WFCAM and VISTA - these may not be such large issues given some of the expertise and contacts within the Unit.

Software:

MAR reported: "Put some extra parsing in the SQL query servlet to direct SDSS DR3 queries to thutmose. Should work once IT support open up the port to thutmose in the tomcat policy file.
Added a feature to the HTML resultset whereby, if RA, Dec and framesetid are among the returned attributes, links are returned that generate cut-outs from all the multiframes making up the merged source."

ETWS reported: "Started installing software on khafre, but waiting for IT support to fix general setup issues. Enhanced parser/browser pages for DR1."

NJC reported: "This week I have stacked all the DXS data. I have made improvements to the wrapper for Source Extractor; it can now run in dual-image mode. Ran CU4 on catalogue products; these can be ingested into the archive. Discussed UDS archive strategy with Nigel, Omar and Sebastian. Identified problems - no showstoppers, but quite a lot of work still to do."

RSC reported: "I've been working on various enhancements to the software: a small patch to CU7 to handle rare NFS delays, and an investigation of the statistics of missing provenance information in the database, with a view to optimising the efficiency of the provenance-updater script. In the development CVS branch (named DR1_DEV) I have developed a consistent, robust and powerful command-line interface tool for all of our Python scripts. I've also made a few more of my helper scripts available, and continue to improve code documentation, the ethos for which is maintained in this document: http://apache.roe.ac.uk/twiki/bin/view/WFAU/PythonStandard, which welcomes contributions from everyone when they have time.

A small note on a CVS branch gotcha I haven't mentioned before: the code may be broken in a non-obvious manner when the two branches merge, if additional code incorporated into the main branch relies upon code that is deprecated in the development branch. The best solution to this problem is firstly to keep the number of changes applied to the main branch to a minimum (just those required for DR1 operations), and secondly for all developers contributing changes to the main branch to pay heed to changes in the development branch.
CVS branching notes are maintained here: http://apache.roe.ac.uk/twiki/bin/view/WFAU/CVSBranching."

Survey Data Release:

Reiterating the DR1 preparation schedule as it currently stands:

  DR1 release (NB: databases to be DR1 & DR1PLUS !!)        14/7
  Copying/transferring/backups etc. start                    7/7
  UKIDSS pre-release checks start                            1/7
  Final CUs 16 (incl. SDSS DR3, hopefully), 18, 19 start    23/6
  CUs 2, 3, 4, 7 for DXS and UDS start                      20/6
  Archive QC2 starts                                        14/6
  CU7 for wide/shallow surveys starts                        7/6

Status as at today (23 June):

GCS & PTS: done and ready for final indexation, then release (hooray).

LAS: needs sources to be re-merged following NCH's failed hack to get around inconsistent tiling in different filters in a small number of areas. Will be redone today.

GPS: source merging over the weekend and into next week (it takes ages).

DXS: all stacked up, but awaiting the enhanced CASU extractor (i.e. the enhanced sky-background variance estimator), so currently stalled.

UDS: the decision was taken earlier this week to go with the Nottingham folks' SWARP-stacked mosaics and S-Extractor for object detection and parameterisation. Omar & Seb are doing much of the hard work in consultation with NJC; NCH outlined the two-pronged approach whereby the final stacks, based on data available from the current processed version in 05B, will be done immediately and put into the archive; then Seb will try to produce a 30 to 40% enhanced version incorporating reprocessed data (which is deprecated in the current version ingested in the archive) obtained directly from CASU, courtesy of MJI & MR.

All this has some pretty major implications for archive-end SW and documentation, so:

ACTION: NJC & RSC to tweak source-merging code to cope with the differences between the CASU and S-Extractor outputs (mainly in handling of merged classifications)

ACTION: NJC & ETWS to finalise modifications to CU3 & 4 to facilitate ingestion of data, esp. S-extracted catalogues, into the archive

ACTION: ETWS & NJC to unlink all UDS documentation and glossary entries pointing to CASU-extractor info

ACTION: NCH to fettle the CU19 release script to expurgate all CASU-extracted intermediate stack catalogues from udsDetection in release DBs, to avoid utter confusion on the part of WSA users

Non-survey Data Release:

No time even to finish typ

Astrogrid deployment:

Ditt

Miscellaneous:

RGM noted the Astronomy Data Management component of this year's IAU Assembly in Prague. MSH kindly volunteered to look into submitting an abstract (by Monday) based on RSC's snazzy ADASS poster of last year, generalising it to cover all WFAU activities. The poster will be sorted out after DR1; if MSH does not attend, then the presenting author will be AL (who is going anyway).

ACTION: MSH to co-ordinate abstract submission to IAU Prague on WFAU data handling activities, and poster design.
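(Postscript on the CU19 action above: schematically, the expurgation amounts to deleting the CASU-extracted intermediate-stack rows from udsDetection in each release DB. A hedged sqlite3 sketch -- the `extractor` provenance column and the table layout are invented for illustration; the real script targets SQL Server and the WSA schema:)

```python
import sqlite3

def expurgate_casu_rows(release_db):
    """Remove CASU-extracted intermediate-stack catalogue rows from
    udsDetection in a release database, keeping only S-Extractor rows.

    Sketch only: the actual CU19 release script and schema differ;
    the `extractor` column here is an invented provenance marker.
    """
    cur = release_db.execute(
        "DELETE FROM udsDetection WHERE extractor = 'CASU'")
    release_db.commit()
    return cur.rowcount  # number of rows expurgated
```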