From nch@roe.ac.uk Fri Oct 6 17:16:40 2006
Date: Fri, 6 Oct 2006 16:59:13 +0100 (BST)
From: Nigel Hambly
To: WFCAM Science Archive Team -- Eckhard Sutorius, Johann Bryant, Mike Read, Nigel Hambly, Nicholas Cross, Bob Mann, Ross Collins
Cc: CCs for WSA weekly meeting minutes distribution -- Andrew Lawrence, Andy Adamson, Brian Walshe, John Taylor, Jim Emerson, Malcolm Stewart, Mike Irwin, Mark Holliman, Peredur Williams, Stephen Warren
Subject: WFAU Science Archive weekly project meeting, 6 October 2006

Minutes of WFAU VDFS Science Archive meeting: 6th October 2006
-------------------------------------------------------------------------
-------------------------------------------------------------------------

Present: NCH, RSC, MAR, JB, ETWS, MSH, PMW, BCW, AL
Apologies: JPE, JMS, RGM, JDT, NJC

DONM: NB!! 10am, Wednesday 18th October 2006 in the Plate Library (Friday October 13th is an all-day UKIDSS meeting)

Actions discharged this week:
-----------------------------

ACTION: JB to make a full WSA backup around mid-September.
Discharged; see Operations below.

ACTION: RGM to email Ken Rice (IfA contact concerning IBM P690s) about potential WFAU interest in high-end freebies from IBM.
Discharged, as it has probably passed its sell-by date: at the IfA meeting earlier in the week, it transpired that at least one of these machines is heading in the Institute's direction (a kind of "grab now, think about it later" approach...)

Actions partly discharged but continuing:
-----------------------------------------

The following from last week are partly done but continue:

m) rearrange external catalogues onto batch catalogue server - ACTION: JB

p) Release transit survey data as non-survey databases - ACTION: MAR, NCH & JB
Continues: MAR to apply publishing settings and check with the interface; JB to then back up and place online; ETWS to create a new browser site including Transit only.
Calibration release on hold until after recalibration has been applied in the main DB.

Actions carried forward from 29/09/06 meeting:
----------------------------------------------

The following from last week continue:

d) Debug CU7 for deprecated frame sets and deletions on rerun - ACTION: NCH

i) Enhance seaming to use quality bit information - ACTION: RSC

k) Ensure ingest code can cope with 06A missing data quirks and any associated new attributes - ACTION: ETWS

o) Add in a default row for every detector appearing in every detection table (for schema consistency when querying merged sources and individual detections) - ACTION: RSC & NCH

ACTION: NCH to circulate a release schedule for DR2 to all concerned. - CONTINUES; see Survey Release below.

Specific points and new actions:
--------------------------------

Project management:

NCH welcomed newcomer Brian Walshe (VOTech datamining developer), who will be attending these meetings over the next wee while to get the hang of the way things are in the whacky world of very large databases in wide field astronomy.

NCH reported back from the VDMT, where the usual progress charts and narrative were presented by PMW. The main actions to come out of the meeting concerned the October UK VDFS FDR:

ACTION: NCH to read the CASU overview and pipeline paper documents
ACTION: NJC (on behalf of NCH) to read the calibration document
ACTION: PMW and NCH to book their individual travel arrangements in order to arrive Monday 23rd PM for a last minute hymn-sheet check with JPE and CASU in the pub in the evening.
ACTION: PMW, NCH and RSC to book their required individual accommodation in the Arundel House.

PMW noted that he is in the process of preparing the Q3'06 plan of work for approval by the team next week; NCH noted that he intends to delegate UKIDSS quality control to MAR for the next round with SJW. Everyone (including MAR) agreed that it is a good idea to have more than one person (i.e.
the worryingly close to going ga-ga NCH) familiar with the procedures.

WFCAM update:

Nothing new to report this week.

Comments and issues arising from CASU fortnightly minutes:

No new minutes, so no new insults (apologies for any offence caused by last week's rant...)

Networking:

RGM (i.e. Bob) has kindly forwarded an even kinder offer from the Portsmouth group to make available to WFAU their recently copied version of the SDSS-DR5 Catalogue Access database.

ACTION: NCH to forward Helen Xiang's email address to ETWS to set the wheels in motion over an SDSS-DR5 copy.

NCH asked about the status of 2XMMP cross-neighbours; RSC volunteered to hack CU16 to create 5 cross-neighbour tables for UKIDSS and XMM for our chum Mike Watson to check out in advance of any wider release.

ACTION: RSC to make cross-neighbour tables between UKIDSS and 2XMMP

JB noted that he has again emailed UKLight to see if there is any at the end of the tunnel (or should that be fibre?). Anyway, NCH suggested an email nudge to Peter Clarke at NeSC from AL may help to move things along from on high.

ACTION: AL to email Peter Clarke in NeSC to see if a little pressure from above helps to expedite implementation of UKLight at the EUCS end.

WSA Operations:

NCH noted that as an absolute priority we must finish the ingest of 05A v3 so that UKIDSS can preview and give the green light for a (TBC) DR2 release schedule. JB and ETWS noted that they have been sorting out the correction (via a script supplied by CASU) for a few nights in May; NCH suggested that we plough ahead with the June ingest and set the preview going on that first, just to keep the momentum up. ETWS noted that there was a small incompatibility between the Perl-linked CFITSIO at our end that needed sorting out, but that he didn't anticipate any other problems in executing the updates.

ACTION: JB to ingest June 05Av3 prior to May in order that UKIDSS can preview.
NCH noted that MJI and JB had reminded him earlier in the week that CASU are subject to an institute-wide power cut this weekend, so all CASU servers are affected and should be assumed to be unavailable from late afternoon today to sometime on Monday. NCH suggested that if we can get things moving quickly enough, this weekend might be a good time to slurp up SDSS-DR5 from Portsmouth.

JB noted that disk space is getting a little tight. NCH noted that at the July UKIDSS meeting it was reiterated that WFAU needn't retain the individual micro-stepped frames of interleaves once their metadata is ingested and JPEGs made:

ACTION: JB and ETWS to scrub the constituent frames of interleaves for archived UKIDSS data only (where it has not been done already).

Finally, JB and NCH noted that last week's concern about 05A v3 files apparently changing after copying is innocuous: CASU have been applying recalibration updates to the files as part of finalising the calibration in their copies.

Otherwise, JB reported:

"CU1 has been running for most of the week and we have transferred all of 06A version 1 up to the end of June... we are just starting the last month of data now. Work on CU3 and CU4 for 05A version 3 May and June has started.

I have prodded EUCS about the UKLight equipment though have heard nothing back yet, though the loss of JANET earlier this week probably partly accounts for this.

Backups continued as normal, which turned out to be a good thing since Thutmose had Windows corrupt its own system disks - possibly as a result of, or causing, a disk failure as well; work is ongoing to restore this server but it will likely not be fully operational until next week. Slow down problems with Ahmose and occasionally Amenhotep were fixed and Thutmose's problems may have been fixed though the trashed system disk makes it difficult to tell!
UKIDSSDR1(PLUS) has been set up in a more efficient configuration on Thutmose (before the failure) and so should produce results a little faster now.

Space on the pixel servers has been shuffled to maximise space for transfers. Related to this, the investigation of a new NAS box is now in the works."

JB reported the successful backup of the WSA database; NCH suggested that we investigate the efficacy of a differential backup after an appropriate period of ingest DB modifications.

Hardware:

With reference to the disks filling up, PMW enquired about whether we should be looking at buying a new storage brick, and what configuration/specification that should be. JB and NCH suggested that he make enquiries about a NAS block, possibly employing 750 GB SATA disks since these seem to be receiving good reviews.

ACTION: PMW to get advice/quotes on a NAS storage block based on (possibly) bleeding-edge 750 GB disks.

JB noted that archive batch/Astrogrid-DAS catalogue server thutmose is down, having suffered a rather severe system disk failure despite the system volume in fact being a fault-tolerant RAID configuration of several physical disks. WSA users and AG have been informed of this unavoidable downtime in the availability of SDSS-DR3 etc.; the server will likely be down until early next week.

Software:

MAR reported:

"Altered VOTable output format so that it copes with special (HTML) characters.

The mail server switch meant that the WSA and SSA completion emails were not getting sent for a few days this week. After some investigation a stop/start of Tomcat solved the problem, so it looks like something (DNS resolve) was cached in Tomcat.

Investigated why the UKIDSSDR1plus release on thutmose seems slower than on amenhotep. Tests show it doesn't appear to be the underlying hardware but rather how the primary keys in the source tables have ended up being arranged on disk."
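
For readers unfamiliar with the VOTable issue MAR mentions: VOTable is an XML format, so characters such as &, < and > in a cell value have to be escaped before being written out. The following is only a minimal sketch of that kind of escaping, not the WSA interface code; the function names votable_cell and votable_row are hypothetical, and Python's standard xml.sax.saxutils.escape is used purely for illustration.

```python
from xml.sax.saxutils import escape


def votable_cell(value):
    """Wrap one value in a <TD> element, escaping &, < and >.

    Hypothetical helper for illustration; not taken from the WSA code.
    """
    return "<TD>" + escape(str(value)) + "</TD>"


def votable_row(values):
    """Build one <TR> table row from a sequence of cell values."""
    return "<TR>" + "".join(votable_cell(v) for v in values) + "</TR>"


if __name__ == "__main__":
    # A string containing special characters, followed by a number.
    print(votable_row(["J & H < K", 12.5]))
```

Without the escape call, a value such as "J & H < K" would yield a document that XML parsers reject.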
RSC reported:

"Wrote an introduction for the front-page of the on-line software API documentation (http://www.roe.ac.uk/~rsc/wsa/) and improved the documentation in general for the review.

Investigated the issue of large file support for FITS files on 32-bit PCs, and 64-bit support in Linux software in general. The original version of the FITS I/O libraries and Starlink (Spring 2004 release) as installed on our curation servers do not support large files. The new version of CFITSIO (v3.0+, released 02/06) and the Keoe Starlink release (09/06) provide large FITS file support for 32-bit PCs. However, all FITS I/O libraries appear to load the entire FITS file into memory to perform most useful tasks, and so the memory limit on 32-bit PCs becomes the greatest hurdle. Keoe Starlink has a 64-bit version that has been successfully tested on thor, a 64-bit Linux PC running Open SUSE 10.1. This allows us to overcome the memory hurdle, at least up to the currently available memory limit on 64-bit PCs. I am now investigating Python/PyFITS 64-bit support. My progress in this investigation is being logged on the TWiki at WFAU.LargeFileSupport (in WFAU->WSA->SoftwareNotes).

I've also written a couple of TWiki articles in WFAU->WSA->GeneralOps reminding everybody of the procedures for defining new neighbour tables and modifying curation table schemas, which are in WFAU.CreatingNewNeighbourTables and WFAU.ModifyingCurationTableSchemas respectively. The former article would greatly benefit from some input from anybody who knows the full details of adding a new external catalogue to our DBMS."

Survey Data Release:

NCH has had an email exchange with the UKIDSS GPS survey head and a conflab with SJW concerning cross-talk correction.
Apparently, the GPS head is only concerned about cross-talk in 05A data, so there is no requirement (thank the Good Dude) to reprocess all GPS data in 05B and 06A with cross-talk switched off; but there is still some confusion as to how 05A v3 is unacceptable while 05B v1 (which is the same pipeline) is acceptable. Perhaps things will be clearer when all of the 05A v3 data has been checked. Final checks on 05A v3 have to be completed before agreement can be reached on what is in UKIDSS DR2 (see above).

Non-survey Data Release:

Nothing to report this week.

Astrogrid deployment:

MSH reported:

"Watten is now on the SRIF network, and DSA installation will begin as soon as the latest DSA version is released (which contains support for large file transfers). Once the new DSA is installed, srif112 will be wiped and rebuilt to be used for AG code development and data mining purposes."

Miscellaneous:

NCH noted that there is a NeSC workshop on "The Closed World of DBs Meets the Open World of the Semantic Web" (!) next week on the 12th/13th. The programme looks largely theoretical and no-one has time to attend on those days anyway, but the visiting speaker has written a few books on, e.g., temporal data and the relational model, which may be of interest in the schema design for synoptic surveys.

ETWS noted that he and JB had attended the Digital Curation Centre lecture on preservation of digital data, and to everyone's intense satisfaction, he reported that we are already doing everything that was suggested as "best practice" in the general area of digital curation.