GridPP PMB Meeting 575

GridPP PMB Meeting 575 (28.09.15)
Present: David Britton (Chair), Tony Doyle, Roger Jones, Pete Gronbech, Tony Cass, Gareth Smith, Andrew McNab, Andrew Sansum, Jeremy Coles, Dave Colling, Steve Lloyd, Claire Devereux, Pete Clarke (Minutes – Louisa Campbell)

Apologies: Dave Kelsey

1. New Metrics (SL)
SL summarised his suggestions for new and simplified metrics. The main issues are how to weight disk and CPU, and whether to use wall-clock or CPU-time (or both) to account for CPU delivered. DB pointed out that GridPP5 funds would be distributed roughly 43% to disk and 57% to CPU, so a 50/50 split seems an appropriate weighting. There was an extended discussion on wall-clock vs CPU-time, on remote access to data, and on accounting with CLOUD/VAC. In the end, there was general agreement that methods 4 (100% wall-clock) and 5 (100% CPU-time) should be retained. Method 4 should be used, with method 5 as a cross-check that sites are not gaming the metrics by buying slow, cheap hardware (to maximise wall-clock time). This gets round the difficulties of accounting with CPU-time alone, which might not be fully accurate or available. DB stated the proviso that sites with Monte Carlo should not be used as a benchmark against sites without; if like-for-like sites are underperforming then questions can be raised. It was noted that we could account for disk and CPU separately and insert the correct figures (i.e. 43%/57%) at the end, rather than averaging 50/50. It was reiterated that disk should be considered GridPP disk, not local. Remote access to disk by smaller sites should not be throttled and local managers should not be granted artificial priority.
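
As an illustration of the weighting question only, here is a minimal sketch of how a 50/50 split and a 43%/57% split might differ for a single site. It assumes each metric is expressed as the site's fraction of the GridPP total; the function name `combined_share` and the site figures are hypothetical and were not part of the discussion.

```python
def combined_share(disk_share, cpu_share, disk_weight=0.5):
    """Weighted combination of a site's disk and CPU shares.

    disk_share and cpu_share are fractions of the GridPP total (0..1).
    disk_weight is the weight given to disk; CPU gets the remainder.
    """
    if not 0.0 <= disk_weight <= 1.0:
        raise ValueError("disk_weight must lie between 0 and 1")
    return disk_weight * disk_share + (1.0 - disk_weight) * cpu_share


# Hypothetical site: 10% of GridPP disk and 6% of delivered wall-clock CPU.
disk_share, cpu_share = 0.10, 0.06

even_split = combined_share(disk_share, cpu_share, disk_weight=0.5)
funding_split = combined_share(disk_share, cpu_share, disk_weight=0.43)

print(f"50/50 weighting: {even_split:.4f}")
print(f"43/57 weighting: {funding_split:.4f}")
```

Keeping the disk and CPU shares separate until the final step, as noted above, means the weights can be changed later without recomputing the underlying accounting.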

2. Community Meeting (DB and PC)
The Community Meeting took place last Thursday and Friday (24-25 Sept 2015). DB presented on Thursday on GridPP, the status of GridPP5 and the future; PC presented on Friday on how GridPP assists other experiments. Both presentations were well received and no major concerns were raised, but the over-running of the programme meant that PC received no questions. It was agreed that this was a good opportunity to present to the STFC staff who were in the room. Joe's presentation also went reasonably well. Paul Newman, chair of PPAF, noted the experiments did not have the opportunity to provide sufficient feedback on cutting costs. There were a number of Heads of Groups present at the talks.
DB noted that he had finalised details of ALICE support in GridPP5 in discussions with Sarah Verth. This was effectively 0.5 FTE at Birmingham. In this context, he had also finalised the balance of ATLAS effort at Birmingham and Sheffield in discussions with Paul Newman and Davide Costanzo. At Sheffield, the Grid post will become joint GridPP-LZ with the balance changing over the four years.

DB noted that GridPP36 has now moved dates to 11-13 April to accommodate the EGI meeting from 6-8 April as discussed at last week's PMB. CD noted that EGI has now turned into a much larger event than previously planned.

ACTION 575.1
DB will circulate a note to all GridPP group members advising of the change in dates for GridPP36 to 11-13th April 2016.

4. Standing Items
SI-0 Bi-Weekly Report from Technical Group

DC reported on the meeting last Friday on the Tier-2 evolution process. AMcN mentioned plans for the members to try things out and this suggestion was well received. Storage was discussed – Ian Nelson pointed out security information and will attempt to track this. Networking was also discussed, including the dashboard updates. DB enquired about HSF information and AMcN advised the working groups had produced 3 draft technical notes (machine subs process and specs which have proved very useful). The drafts will emerge later this week. It was noted that there was not much to report for August/September as things slowed down after Okinawa, but activity should speed up from here on.

SI-1 Dissemination Report
SL Reports:

###GridPP website 2.0 – ready for comments

The first full iteration of the GridPP Website 2.0 is now ready for comments. It can be found here:

Please take a look and let us know what you think. A few notes:

* Andrew McNab (AM) has successfully deployed the X.509 plugin to the WordPress site, so user accounts can be created automatically and associated with one’s browser-installed certificate. You’ll see this when you access the page for the first time.

* As a temporary measure, the collaboration functionality (i.e. the old GridSite site) will be moved to a certificate-protected area accessed via the Collaboration tab (this link won’t work yet, obviously).

* Your attention is drawn to the Case Studies section:

This section will be populated as we go along, but this will give a feeling for the sort of thing we can show potential partners.

* The site has been designed with responsive functionality in mind, i.e. the layout will change depending on whether it’s accessed via a desktop browser, tablet, smartphone, etc.

* The News Items back catalogue will be ported over in due course (and will lead to a few “where are they now?” items, no doubt!).

* Given this is the public-facing website, much of the content has been written accordingly. If anyone has thoughts on where more detail may be more appropriate, please point us in the direction of a relevant publication or document. For example, “Processing LHC data in the UK” was particularly useful for the infrastructure information.

Huge thanks again to AM for setting up the VM and the WordPress site so I could implement the new site design.

Further discussion during the meeting noted that this should be considered a transitional website that will ultimately become front-facing. Acknowledging GridPP at the bottom is crucial for recognising efforts. Further thought needs to be given to the transition and suggestions should be fed to Tom. For example, minutes could be ordered like blog posts (with dates etc. instead of pages), or media could be uploaded in connection with blogs and embedded into pages without the need for HTML.

ACTION 575.2
All members should consider content and structure for the new website and feed suggestions to Tom.

SI-2 ATLAS Weekly Review and Plans
RJ Reports:

Discussions on Tier-2s have increased, on the same model that ATLAS is embracing as it attempts to create unpledged and disc-less sites. There has been a lot of activity around networking since the new computing coordinator has been in place.

SI-3 CMS Weekly Review and Plans
No report due to DC leaving the meeting.

SI-4 LHCb Weekly Review and Plans
PC Reports:

There is a higher-than-normal transfer failure rate from RAL – this is not yet on the radar but is being flagged as it may develop into an issue. There is some concern that CBFs are not being propagated correctly and some Tier-1 jobs fail, but the reasons are not yet clear. This may also develop into an issue over the coming week.

SI-5 Production Manager’s report
JC Reports:

Some operational updates from the last week:

1. Following his update to us at GridPP35, Tony Price informed me at the end of last week that he now has jobs running on GridPP nodes for PRaVDA. He wishes to thank everyone who has provided help and support.

2. I have been asked about supporting DEAP3600. This is a dark matter collaboration which includes users at RHUL, RAL and Sussex. I will start discussions unless there are any objections.

3. Oxford suffered from a major air-conditioning failure at the weekend. The GridPP Nagios service was transferred to the backup at Lancaster. Many sites are still in the ‘alarm’ state on the regional dashboard as the changeover has not yet fully filtered through. Imperial had a power cut on Friday which affected their network – no issues from users of GridPP DIRAC were reported.

4. WLCG operations now has a website. Articles are starting to appear and may be of general interest:

5. A reminder from Ian Bird to the GDB: As discussed at the GDB meeting on September 9, I invite you to send me nominations for the next GDB chairman. As we discussed, the intention would be to have the election at the November GDB meeting.

6. Upon request I received clarification on LIGO and LOFAR usage of resources:

LOFAR: “Using cloud resources to process LOFAR all sky survey imaging data to contribute to the creation of a new deeper low frequency all sky image.”
LIGO: “Storing data from the LIGO observatories and analysing it at various GridPP sites for gravitational wave signals using multiple approaches.”

SI-6 Tier-1 Manager’s Report
GS reports:

– The second step in the upgrade of the Castor Oracle databases to version took place last Tuesday. This was the upgrade of the “Neptune” standby database and the re-establishment of the Dataguard link. (“Neptune” hosts the Atlas and GEN instance stagers.) The next step in this upgrade is the upgrade of the “Pluto” database which hosts the Nameserver as well as the CMS & LHCb stager databases. This will require all of Castor to be down for the day and is scheduled for the 6th October.

– Last week I reported that we had seen some packet loss within our Tier1 network. We are making changes to correct some low level issues – currently some switches are configured to be dual connected but are not. These switches host worker nodes, some of which have been down while we correct this. We will then continue moving connections off the old core switch over the coming weeks.
– On Wednesday (30th September) the link from our main router pair into the RAL core will be upgraded from a resilient pair of 20Gbit connections to a resilient pair of 40Gbit connections. (The site “network at risk” periods are now on Wednesdays rather than Tuesdays).
– Just for the record there was a short break in the primary OPN connection on Friday between 14:07 and 14:14. We did not notice anything operationally but we can see some traffic over the backup link for this short time.

– I reported some problems with the production FTS3 server last week. Since then a workaround to the memory leak introduced with the new version has been supplied. This, along with a reduction in the numbers of transfers queued, has enabled the service to return to normal operation.

Forthcoming Work:
– One of the quarterly UPS/Generator load tests takes place next week (7th October).

The Invitations To Tender for the capacity purchases have not yet gone out. These have been – and are being – worked on.

SI-7 LCG Management Board Report of Issues
DB stated that, as there has been no MB meeting, there is nothing to report.


There will be no PMB on 5 October due to several members being away on other business.

There will be no PMB on 19 October due to several members attending the e2e meeting.

Next PMB meeting scheduled for 12 October.

ACTIONS AS OF 28.09.15
571.6 Any PMB members who have not already done so must now submit their quarterly reports. Ongoing.
574.1 AMcN to undertake various tests on Jira and discuss at a future PMB soon. Ongoing.
574.2 On CMS T1 efficiency discrepancies – DC reports CMS are running multicore pilots on single core jobs, whereas ATLAS are doing this correctly, with higher efficiency. Ongoing.
574.8 DB to obtain information from PC about conclusion of MB discussion on Memory Items for the Future and share with PMB members. Ongoing.
575.1 DB will circulate a note to all GridPP group members advising of the change in dates for GridPP36 to 11-13th April 2016.
575.2 All members should consider content and structure for the new website and feed suggestions to Tom.