GridPP PMB Meeting 632 (08/05/17)
Present: Dave Britton (Chair), Pete Clarke, David Colling, Tony Doyle, Pete Gronbech, Roger Jones, Steve Lloyd, Andrew Sansum, Gareth Smith, Louisa Campbell (Minutes).

Apologies: Tony Cass, Jeremy Coles, Dave Kelsey, Andrew McNab.

1. Quarterly Reports
Relevant PMB members have received an email from PG advising the reports need to be submitted within the next few days in order to prepare the OC docs.

2. OC Docs
PG circulated last year’s report with sections marked appropriately for PMB members to complete:

DB will write the introduction.
PC will write Wider Context (this is not likely to have a great deal of content at the present time, though Ian Collier is due to speak on this matter very soon. It is likely to concentrate on positive aspects such as the LSST DESC meeting on 16 May. An open invitation was extended to the PMB – Peter Love and Alessandra will attend. This is an opportunity to highlight the stresses involved in providing resources).
PG will write the GridPP5 Status from Quarterly Reports.
PG will write the Risk Register.
AS will write the Tier-1 section (AS updated his recent email summarising progress on Ceph and issues around tape hardware; the Castor situation will be updated, and the challenges surrounding estimated hardware pricing should be flagged up to the OC for comment).
JC will write the Deployment Status section.
Experiment reports – RJ will write the ATLAS section.
DC will write the CMS section.
AM will write the LHCb section.
DC and JC will write the Other VOs section.
SL will write the Impact & Dissemination section (SL has inserted a section on Tom Whyntie’s resignation and other paragraphs).

ACTION 632.1: DB will work on the Introduction of OC doc.
ACTION 632.2: DB and PC will work on Wider Context section of OC doc.
ACTION 632.3: PG will work on GridPP5 status and Risk Register of OC doc.
ACTION 632.4: AS will work on Tier-1 section of OC doc.
ACTION 632.5: JC will work on Deployment Status of OC doc.
ACTION 632.6: RJ will work on ATLAS section of OC doc.
ACTION 632.7: DC will work on CMS section of OC doc.
ACTION 632.8: AM will work on LHCb section of OC doc.
ACTION 632.9: JC and DC will work on Other VOs section of OC doc.
ACTION 632.10: SL will work on Impact and Dissemination section of OC doc.

a) PC advised of a planned Janet meeting in London on 15 June, to which it would be useful to have input. This is in response to RJ’s communications with David Salmon – though the date is not ideal as RJ and most ATLAS people are out of the country. DB & PG may attend as they will be in London for the OC meeting.
b) Experiment efficiencies at RAL are low: 67% for ALICE and ATLAS, CMS and others are around 10% below normal. GS will investigate and report back to the PMB next week.
ACTION 632.11: GS will report next week on low efficiencies experienced at RAL in April.

4. Standing Items

SI-0 Bi-Weekly Report from Technical Group (DC)
No meeting last week and nothing to report.

SI-1 Dissemination Report (SL)
SL reported that an advert for Tom Whyntie’s replacement is now in place – HR has in the first instance advertised internally and this will be advertised externally very soon.

SI-2 ATLAS Weekly Review and Plans (RJ)
Nothing of significance to report.

SI-3 CMS Weekly Review and Plans (DC)
Nothing of significance to report.

SI-4 LHCb Weekly Review and Plans (PC)
Nothing of significance to report.

SI-5 Production Manager’s report (JC)
Not attending – no report submitted.

SI-6 Tier-1 Manager’s Report (GS)
• A week ago on Friday (28th April) there was a problem with the UPS for building R89. The UPS switched itself into “bypass” mode – which effectively means we have no UPS. We have run in this way since then. The cause was overheating of internal capacitors which failed. At the moment costs are being gathered for a way forward. In the current situation the diesel generator cannot be used either. DB reiterated that issues surrounding UPS are of concern and will be monitored.
• The replacement chillers are working OK. The reduction in power brought about by their replacement means their costs are now expected to be recouped in around 5 years.

• The Castor team have pushed ahead preparing to upgrade to Castor 2.1.16 as a way out of our current problems. This will put us on the same Castor version as CERN are using. The plan is to update the central components (the nameserver) and the LHCb stager tomorrow (Tuesday 9th May) – although we await final results from stress tests ahead of that. If all goes well the ATLAS and CMS stagers will be upgraded on Thursday (11th).
• There has been a particularly severe problem with one of the Castor disk servers (GDSS818) for LHCb. It has failed twice in the last week. The first time the RAID card was replaced. After the second failure staff attended on site yesterday (Sunday) afternoon to recover the RAID array. There were some further problems but the server was put back in service read-only shortly before this meeting.

• We had a problem with the Argus server from Tuesday to Wednesday last week. This affected CMS (production and tests) for around 20 hours. Work is going on to put the Argus servers behind load balancers, which would reduce the likelihood of failures due to problems with Argus.

SI-7 LCG Management Board Report of Issues (DB)
The last meeting was cancelled. The next meeting is scheduled for Tuesday 16 May 2017.

SI-8 External Contexts (PC)
Nothing of significance to report.

NEW ACTION: LC will check 16-18th April for GridPP40 at Durham (beginning of the week – CHEP is later in the week, and we will wait to see when IOP is scheduled). Ongoing.

630.1: AS and PG will commence planning and modelling for OSC documents and couple to plans and decisions on Tier-2 funding (2019-20). Done.

630.2: DB and PG will continue to work on metrics and funding strategies at the macro level. Ongoing.

630.3: DB will tweak his metrics and funding model based on CPU. Ongoing.

631.1: PG will create a summary spreadsheet of the 2016 Experiment Review figures to extract important figures for the OC. Ongoing.
631.2: ALL to work on OC documents for submission by end May. Ongoing.

631.3: DB will announce GridPP39 on UKHEPGRID. Done.

