From nch@roe.ac.uk Fri Feb 20 15:49:16 2009
Date: Fri, 20 Feb 2009 14:49:46 +0000 (GMT)
From: Nigel Hambly
To: WFCAM Science Archive Team -- Eckhard Sutorius, Mike Read,
    Mark Holliman, Nigel Hambly, Nicholas Cross, Rob Blake, Ross Collins
Cc: CCs for WSA weekly meeting minutes distribution -- Andrew Lawrence,
    Andy Adamson, Jim Emerson, Malcolm Stewart, Lorenzo Rimoldini,
    Mike Irwin, Peredur Williams, Bob Mann, Stephen Warren
Subject: WFAU VDFS Science Archive weekly project meeting minutes, 20/2/09

Minutes of WFAU VDFS Science Archive meeting: 20th February 2009
-------------------------------------------------------------------------

Present:   NCH, ETWS, NJC, RSC, PMW, MSH, MAR, RPB, RGM
Apologies: JPE, JMS, AL, LGR
DoNM:      10am, Friday February 20th 2009 in the Vista Hut

Actions discharged this week:
-----------------------------

ACTION: RPB to schedule a full backup of the WSA this weekend as it's
        long overdue.
ACTION: RPB to return the two 250Gb disks and order something more useful.
ACTION: RSC to document CFITSIO compression issues on the archive website

Actions partly discharged but continuing:
-----------------------------------------

ACTION: ETWS to parse that latest keyword info into the VSA metadata
        schema files. Continues
ACTION: NCH to assign U/08B/14 H2/K frames to the GPS in the WSA. Continues

Actions carried forward from 06/02/09 meeting:
----------------------------------------------

ACTION: ALL to look at JENAM programme and consider possible attendance

Specific points and new actions:
--------------------------------

Project management:

Nothing new this week.

WFCAM & VISTA updates:

Nothing new to report this week.

Comments and issues arising from CASU fortnightly minutes:

No new minutes this week.

Networking:

ETWS noted that SDSS DR7 is nearly copied over from UChicago.

WSA Operations:

ETWS noted starting transfer and ingest of the 08A data.
It will take ~45 to 60 minutes for an average night (standard transfer
and ingest routine operations would take ~3 hours in total).

RPB noted that all previously ingested detection data for the 08A dates
which were reprocessed have been deleted from the database. RSC suggested
we do the same for all reprocessed catalogue data to keep things as tidy
and compact as possible.

ACTION: RPB to scrub all reprocessed catalogue data from the *Detection
        tables in the WSA ingest DB.

MAR noted that GPS DR5 QC is up to date wrt all the other surveys.

Hardware:

RPB reported:

"Djoser seems to be fixed and back online at last. Horst has managed to
extract all of the data from the dodgy array onto more stable storage.

Problems with degraded array on Hatshepsut. RAID controller claimed that
5 disks had all failed at once. Called eclipse out to replace the RAID
controller, and only one disk had failed. Replaced this, rebuilt the data
(after some minor niggles) and hatshepsut came back up on Thursday morning
with seemingly no problems."

MSH noted:

"The cluster installation continues, as the Windows Server 2008
installation program is failing to find and utilize the disk controller
drivers. New drivers are being downloaded and tested this week.

The new processing machine, menkaure, is ready for use and is only
awaiting a DNS entry and some other details that need to be done by IT
support."

MSH also asked for a meeting to discuss the plan for testing the cluster
for distributed DB performance.

ACTION: All interested parties to get together Tues 24th at 3pm in
        Crawford Meeting Room #2 for a clustered DB brainstorming.

Software:

RSC has been very busy this week:

CFITSIO bug update:

- Actually only affects certain confidence images from deep stacks for
  reasons presently unknown.
- Example file sent to CFITSIO developers to see what they make of it,
  but I don't expect to hear anything back soon because of the very
  limited extent of the problem.
- If it really is a bug in CFITSIO it may be best not to upgrade to 3.13
  (or 3.1) at this stage, but hold off until a fix has been released, so
  I suggest we try a consistent installation of version 3.06 across the
  system (i.e. fpack, cirdr etc.), which is the version used by the
  latest 64-bit Starlink.
- There's now a note on the Q&A section of our web site in case any of
  our users ever come across the problem (which they shouldn't if we
  don't release deep stack confidence images compressed by CFITSIO 3.1+
  until the bug is fixed).

epydoc problems:

- Probably not caused by Python 2.6 as previously thought and can
  probably be fixed by testing installations of various different version
  combinations of epydoc, docutils and graphviz. However, this isn't a
  high-priority thing to fix, though it would be nice to know what the
  best software version combination is to install across all the systems.

wfcamsrc/exmeta infinite resource consumption bug:

- Finally tracked this down to an infinite loop, as would be expected, in
  some 5-year-old code of Ian's that was incorrectly doing comparisons
  against different data types. It originally worked by fluke because the
  different data types behaved the same on 32-bit systems.
- Similar problems riddled the wfcamsrc code and have now been fixed.

Pairing code segmentation fault:

- Only ever occurs upon memory allocations and deallocations, but the
  exact line of code that causes the problem varies depending on the data
  set. Moreover, this only affects Pairing.multiPair(), and only on
  khafre. Debugging reveals no obvious reasons why this problem should
  exist. The pairing module is the only pure C code embedded in Python to
  have been tested on the 64-bit installation, so it could be due to any
  number of variables, e.g. the 64-bit system, the new version of GCC or
  Python 2.6. Therefore, I suggest we perform a software upgrade to one
  of the 32-bit systems and re-test there to help reduce the number of
  possibilities.
- Since diagnosing this problem could take a very long time, I have
  produced a reimplementation of the Pairing module in Python to progress
  testing. This works; however, speed may well be an issue, and before
  even considering it as a replacement for the C module it will need to
  be thoroughly tested to ensure it produces consistent results and that
  it scales to GPS data volumes.

wfcamsrc/mkmerge bug:

- Having worked around the Pairing problem on the 64-bit system by
  re-writing the code in Python, I was then able to take the CU7 test
  further to the mkmerge stage, where it failed again. This isn't caused
  by the Python implementation of the Pairing code because that was seen
  to work together with mkmerge on the same data set on the old 32-bit
  systems. Having just spotted it I need more time to diagnose the
  problem.

All code changes have been tested and seen to still work on the 32-bit
system installations and everything tested works on the new khafre 64-bit
installation except for CU7.

Will create a release_10 branch to release the software fixes for the new
software/64-bit installation when operations can return to parallel
curation activities and thus make use of khafre again. This will also
allow us to start using the new menkaure server once the systems software
has been installed there. Furthermore, new 64-bit specific code to help
UDS curation can also then be developed.

When the CU7 bug is fixed, or deemed to be 64-bit specific, and the
release_10 branch has been created, the remaining 32-bit systems can also
be upgraded to the new software installation.

The issues associated with the new software/64-bit installations have
been fully noted in the SoftwareUpgrade TWiki article.

ETWS noted that sneferu can be used as a test-bed installation machine
for comparisons in some of the code investigations above.

Survey Data Release:

Various user requests for enhancements in facilities have been received
this week, with MAR providing most of the support.

Non-survey Data Release:

Nothing of note this week.

Astrogrid deployment & Data Analysis services:

Nothing new this week.

Miscellaneous:

NCH noted that a visitor from Queen's Univ Belfast will be with us
midweek next week to gain insight into multi-TB data management and
RDBMSes for PanSTARRS purposes.

=============================================================
Nigel Hambly                        Tel: +44-131-668-8234
Institute for Astronomy             Fax: +44-131-668-8416
School of Physics and Astronomy
University of Edinburgh             Email: nch@roe.ac.uk
Royal Observatory
Blackford Hill
Edinburgh EH9 3HJ

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
=============================================================