From nch@roe.ac.uk Mon Oct 2 16:47:00 2006
Date: Mon, 2 Oct 2006 14:28:20 +0100 (BST)
From: Nigel Hambly
To: WFCAM Science Archive Team -- Eckhard Sutorius, Johann Bryant, Mike Read, Nigel Hambly, Nicholas Cross, Bob Mann, Ross Collins
Cc: CCs for WSA weekly meeting minutes distribution -- Andrew Lawrence, Andy Adamson, John Taylor, Jim Emerson, Malcolm Stewart, Mike Irwin, Mark Holliman, Peredur Williams, Stephen Warren
Subject: WFAU VDFS Science Archive weekly meeting minutes, 29 Sep 2006

Minutes of WFAU VDFS Science Archive meeting: 29th September 2006
-------------------------------------------------------------------------
-------------------------------------------------------------------------

Present: NCH, RSC, MAR, JB, ETWS, MSH, NJC, PMW
Apologies: JPE, AL, JMS, RGM, JDT

DONM: 10am, Friday 6th October 2006 in the Plate Library

Actions discharged this week:
-----------------------------

ACTION: NCH to include discussion of VISTA PSP requests for enhanced VDFS archive functionality in VDF-WFA-VSA-002
Discharged.

ACTION: NCH to remember to ask PMW to revoke the rolling Vista Hut meeting room booking for these meetings.
Discharged.

ACTION: MSH to liaise with IT Support over obtaining another allocation of IP addresses for SRIF networked machines.
Discharged; helpdesk ticket in so the ball is in IT Support's court

ACTION: NCH to try to put something into the DB design document concerning this.
Discharged (by NJC!); a note has been put in the current database design document along the lines of the suggestion.

NCH: VDF-WFA-VSA-002 Science Requirements Analysis
PMW: VDF-WFA-VSA-003 Management and Planning
NCH: VDF-WFA-VSA-004 Interface Control
JB:  VDF-WFA-VSA-006 Hardware/OS/DBMS Design
NJC: VDF-WFA-VSA-007 Database Design
MAR: VDF-WFA-VSA-008 User Interface
RSC: VDF-WFA-VSA-009 Software Architecture Design, incl. local data flow
RGM: VDF-WFA-VSA-010 VO Enablement

ACTION: ALL to draft their respective documents (see above) by the 1st September.
Discharged; docs released after this meeting (actually on the 30th at midday, owing to some last-minute hitches...). Well done everybody.

Actions partly discharged but continuing:
-----------------------------------------

The following from last week are partly done but continue:

m) rearrange external catalogues onto batch catalogue server - ACTION: JB
p) Release calibration and transit survey data as non-survey databases - ACTION: MAR, NCH & JB
   Continues; some final issues related to the size of the (temporary) neighbour tables.

Actions carried forward from 29/09/06 meeting:
----------------------------------------------

The following from last week continue:

d) Debug CU7 for deprecated frame sets and deletions on rerun - ACTION: NCH
i) Enhance seaming to use quality bit information - ACTION: RSC (on hold until after h)
k) Ensure ingest code can cope with 06A missing data quirks and any associated new attributes - ACTION: ETWS
o) Add in a default row for every detector appearing in every detection table (for schema consistency when querying merged sources and individual detections) - ACTION: RSC & NCH

- all are on hold until after the documentation for the review has been prepared.

ACTION: RGM to email Ken Rice (IfA contact concerning IBM P690s) about potential WFAU interest in high-end freebies from IBM.
- CONTINUES

ACTION: NCH to circulate a release schedule for DR2 to all concerned.
- CONTINUES

ACTION: JB to make a full WSA backup around mid-September.
- CONTINUES

Specific points and new actions:
--------------------------------

Project management:

PMW noted the VDMT telecon on the afternoon of the 3rd October at 2pm. The team filled in the usual progress charts during the meeting; PMW will write the usual narrative and distribute to VDMT prior to the telecon.

NCH noted that a UKIDSS meeting has been scheduled for the 13th October in Nottingham; the major (in fact, only?) agenda item will be the renewal cases for UKIDSS continuation after the 2 year plan.
NCH will be attending by telecon.

The team again discussed who should attend the VDFS UK FDR in Cambridge on the 24/25 October, finally agreeing that PMW, NCH and RSC will definitely be there, with the possibility of MAR should the panel feed back questions on the user interface side of things.

WFCAM update:

Nothing new to report this week.

Comments and issues arising from CASU fortnightly minutes:

The team noted the minutes of the meeting of 19th Sept. There were several points to note:

1) RGMcM expressed concern over the time-lag between transfer of data released as ok-to-copy and ingest so that flat-file access is available for a given night's data. At an ad hoc face-to-face at IoA at the beginning of the week with NCH, the same issue was raised in what can only be described as a full and frank exchange of views during which RGMcM all but accused WFAU of incompetence, claiming that we have overly complicated a simple task. To explain things from the WFAU perspective, let's break the task down into (a) transfer (measured by the time-lag between the ok-to-copy and successfully-copied timestamps) and (b) ingest into the archive:

a) Transfer: in principle, WFAU agrees that there is no reason why, under conditions of steady state and full automation, the time-lag cannot achieve the goal of being of order a day. RGMcM has plotted the time-lag figures for all WFCAM nights processed, and we note that there is a rapid decline in the time-lag over the past year; moreover the most recent transfers are logged on the CASU pages as transferred within one week. So the overall trend is going in the right direction, and we are confident we can continue to reduce this worrying time-lag.
In our defence, we would like to point out that the dataflow system is proving difficult to automate because of the need for constant manual intervention when things go wrong; in the case of transfers, we are proceeding carefully since the overriding concern is in producing survey data products of the highest quality. As soon as there is any hint of a bug anywhere in the system we tend to suspend transfer ops and investigate in great detail to make sure we understand where the problem is before continuing. This has prevented a lot of problems in the past and helps to keep retransfer of large amounts of data to a minimum for CASU.

b) Ingest: we remind everyone that before flat-file access can become active, the metadata for the processed data need to be ingested into the database so that the web applications can query what's available and present this to users. Once again, in principle and assuming steady state and full automation with minimal manual intervention, there is no reason why the time-lag between successfully-copied and data being activated for flat-file access cannot be down to a few days. The problem over the past year has been reprocessing. We ingest data in observation order, so 05A v3 is going in now and 06A v1 is on hold until completion of ingest of the former (this is required by UKIDSS for example, because they are checking 05A v3 to make sure it addresses the concerns over processing v2; please note that ingestion of 06A will not "take place in December" - rather it will start as soon as 05A v3 has gone in and is expected to be complete in December provided no new problems arise). Furthermore, broken files requiring manual intervention are again being investigated fully to ensure that all bugs are caught and that data quality is maintained. Several times in the past, we have saved a lot of time by proceeding in this cautious manner.

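For illustration only, the transfer time-lag defined in (a) above (the interval between the ok-to-copy and successfully-copied timestamps) could be computed per night along the following lines. This is a minimal sketch, not the script used at CASU or WFAU: the function name, the night labels and the timestamps are all made up for the example.

```python
from datetime import datetime

def transfer_lag_days(ok_to_copy, copied):
    """Return the lag in days between two ISO 8601 timestamp strings."""
    delta = datetime.fromisoformat(copied) - datetime.fromisoformat(ok_to_copy)
    return delta.total_seconds() / 86400.0

# Hypothetical nights: (ok-to-copy, successfully-copied) timestamps.
nights = {
    "20060601": ("2006-09-01T09:00:00", "2006-09-06T21:00:00"),
    "20060602": ("2006-09-04T12:00:00", "2006-09-05T12:00:00"),
}

# Lag per night, in days.
lags = {night: transfer_lag_days(*stamps) for night, stamps in nights.items()}

for night, lag in sorted(lags.items()):
    print(f"{night}: {lag:.1f} days")
```

The same function would serve for the ingest lag in (b) if given the successfully-copied timestamp and the time at which flat-file access was activated.
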
WFAU are in no way criticising CASU for anything in any of the above; we recognise the difficulties inherent in automating everything in this situation, and we appreciate CASU's efforts in creating ever more robust solutions to processing wrinkles that, let's face it, affect a (decreasingly) tiny proportion of the data. It is inevitable, however, that the majority of time is spent chasing and fixing the minority of data near the instigation of the project, and we would appreciate acknowledgement of this and perhaps a little more respect from a certain individual down south.

See below for items concerning UKLight and UKIDSS checking of v3 data.

Networking:

JB has heard nothing from the UKLighters as to what's going on, and will nudge them to find out. Download speed for 06A is currently hovering at the (marginal) rate of around 6 MB/s, despite using a newly optimised, reduced thread number hyper-scp.

WSA Operations:

JB has raised a concern over 05A v3 files apparently changing after being released as ok-to-copy, transferred and flagged as successfully copied, and subsequently moved. Maybe this is innocuous, but with reference to the previous items on efficiency in the CASU-WFAU interface, we would like to know asap if there is nothing to worry about so that we can carry on with confidence (i.e. please can CASU answer email enquiries asap; we appreciate, however, that the end of last week was a little frantic, what with documentation deadlines etc.)

The question of ingest DB backups was again discussed, during which NCH expressed sympathy over scheduling these during normal operations, but reiterated that disaster recovery is a concern. NCH asked JB to get a full backup asap, and then experiment with a differential backup a week or two later to see what the operational issues are from such real-world experiments.

Hardware:

NCH noted that the archive ingest server ahmose is performing quite badly at the moment for interactive use; this needs to be investigated and the situation rectified asap in case it is a symptom of a wider malaise.

Software:

MAR reported:
"Started working on the re-organisation and re-packaging of Java code into a better defined structure. Dealt with several user requests, one of which wanted to be able to view UKIDSS images from TOPCAT via activation. Managed to get a working solution via appending a column to the SQL query. It's a bit of a kludge but might be useful."

NJC reported:
"I have mainly been working on documentation, but I have also been working on reCalibration, which is almost complete - it now checks through all the multiframes for a particular programme and compares the zeropoints in the Multiframe and MultiframeDetection to the FITS files. If there are differences it updates them and moves the old ZPs to PreviousMFZP and PreviousMFDZP. Then it updates the detection table and source table. I still need to do the source table colours and nightly ZP. I have also edited CU 13 and 14 to handle stack files as well as leavstack files."

RSC reported:
"Completed the SAD, filling in the gaps with the help of comments from ETWS and NJC. Performed first-stage upgrade of CU19 to the OO framework design (it now makes use of the CuSession class features), which is tested and working. Full upgrade path is outlined in Misc.RossDocumentation on TWiki. Started developing a method for testing CU19, which I'm documenting on the WFAU.SoftwareTesting TWiki page. Various other improvements to the Python code, and following a very successful extreme programming session with NJC (to thrash out the design of some new Utilities.py algorithms), spotted and fixed a bug in one of the pre-existing algorithms. Completed on-line software API documentation for the Python-embedded C++ modules, obsoleting xheader module along the way."

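As a rough illustration of the zeropoint check in NJC's reCalibration report above (compare the stored zeropoint with the FITS value, and where they differ keep the old value as the "previous" one before updating), a sketch might look like the following. It is not the WSA code: the function name, the dictionary keys and the tolerance are assumptions made for the example, and plain in-memory dictionaries stand in for the database tables and FITS headers.

```python
def recalibrate(db_rows, fits_zeropoints, tol=1e-4):
    """Update rows whose zeropoint differs from the FITS value.

    db_rows         -- list of dicts with keys 'multiframe_id', 'zp', 'previous_zp'
                       (hypothetical stand-in for the archive table)
    fits_zeropoints -- dict mapping multiframe_id to the zeropoint in the FITS file
    tol             -- assumed tolerance below which values count as equal

    Returns the list of multiframe IDs that were changed.
    """
    changed = []
    for row in db_rows:
        new_zp = fits_zeropoints.get(row["multiframe_id"])
        if new_zp is None:
            continue  # no FITS value available for this multiframe
        if abs(new_zp - row["zp"]) > tol:
            row["previous_zp"] = row["zp"]  # keep the old ZP
            row["zp"] = new_zp              # adopt the FITS ZP
            changed.append(row["multiframe_id"])
    return changed

# Made-up example data.
rows = [
    {"multiframe_id": 1, "zp": 22.80, "previous_zp": None},
    {"multiframe_id": 2, "zp": 23.10, "previous_zp": None},
]
fits = {1: 22.80, 2: 23.25}

changed = recalibrate(rows, fits)
print("Multiframes needing detection/source table updates:", changed)
```

The returned list is what a later step would use to decide which detection and source table entries to update, as the report describes.
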
Survey Data Release:

NCH noted that UKIDSS have checked April 05A processing v3, with the only concerns coming from the GPS over the cross-talk corrections going bananas in fields of high, bright stellar density. This was discussed with MJI earlier in the week, where all agreed that it was strange since no such worries had been expressed for DR1 05B v1 data (where the crosstalk correction was as for 05A v3). The GPS head has actually floated the idea of retaining processed v2 for the UKIDSS DR2; does this mean there will be a (frankly insane) requirement to reprocess GPS 05B v1 with the crosstalk correction switched off, and similarly switch it off for current processing of 06A GPS data, reprocessing what has been done to date?! NCH will attempt to clarify as part of his action to pin down what UKIDSS DR2 will actually contain and what the schedule for this will be (see continuing actions above).

Non-survey Data Release:

Nothing new this week, other than we note that enquiries as to the availability of 06A PI data are reaching a small crescendo prior to the current round of telescope application deadlines, but that the priority of ingesting 05A v3 unfortunately means a few non-survey PIs are necessarily disappointed ...

Astrogrid deployment:

MSH reported:
"Here's the page covering the current setup of AstroGrid at ROE -
http://apache.roe.ac.uk/twiki/bin/view/WFAU/AstroGridAtRoe
It has a description of what is on each server, and a diagram that details that information as well. AstroGrid setup at ROE is nearly complete, and will utilize four servers to provide a complete set of AstroGrid services. Two of the servers will house the main AG components such as registries, community, JES, and a file store/manager, and the other two servers will be used for providing DSA access to WFAU's databases. One of the DSA servers will be used for production level services while the other will be used for code development, service testing, and data mining."

Miscellaneous:

Nothing else this week.