GridPP PMB Meeting 659

GridPP PMB Meeting 659 (05.02.18)
Present: Pete Gronbech (Chair), Dave Britton, Pete Clarke, Jeremy Coles, Tony Doyle, Andrew McNab, Andrew Sansum, Gareth Smith, Louisa Campbell (Minutes).

Apologies: Tony Cass, David Colling, Roger Jones, Dave Kelsey, Steve Lloyd.

1. OSC Documents
Some contributions have now been received, including the Introduction (DB) with some aspects needing updating highlighted in red. These await some input from AS on the Tape plan and staffing ramp down – AS sent a partial draft of text which he will progress this afternoon and will refer to other documents for these specific elements (DB will then update the Introduction).
A draft of Wider Context has been submitted (PC) – this awaits updating with input from AS and Ian Collier and with input from Claire.
GridPP5 status (PG is updating with project reporting and depending on Q4 reports being received – LHCb, CMS, Operations, Security and Tier1 are awaited and need to be submitted asap to be incorporated into the OSC report).
Risk register is almost complete (PG will finish today).
Tier1 status – AS is working on this.
Deployment – JC is working on this and awaits some reports to complete (PG has provided firm timetable).
ATLAS – RJ is working on this.
CMS – DC is working on this.
LHCb – AM is working on diagrams (almost complete).

Other VOs – JC is working on this.

Other documents (financial – PG is working on – spreadsheet and financial document have previously not been formally submitted but prepared in case required).

DB requested Q4 reports to be submitted asap.

2. GridPP40 Agenda Items
PG has run through the minutes of the Technical meeting and pasted the link. He has placed some suggested topics into timeslots:

Day 1 – starts at 9am Tuesday 9 April
Session 1 – GridPP5 Status (PG to chair) – Status update and plans (DB) and Technical implications of UKT0 (AM & PC to chair). Expectations need to be managed here on the funding being sought etc. (we are very supportive, and the benefit is that it would reduce pressure on non-LHC users, but it does not replace anything we are already funded to do).
Session 2 – Accounting inefficiency (chair TBC) – Adrian Coveney at RAL volunteered to speak; DC will cover CMS efficiencies and there is space for one more talk.
Session 3 – Storage (AM and DC to chair) – PG is awaiting information on what could go in there (Jens and HEPSYSMAN meeting covered several interesting topics that would work well in this section).
Session 4 – Network and Security (DK to chair) – LHC1 experience from a couple of places, eg LHC and RAL, but there are no confirmed speakers as yet. David Crooks has suggested several talks in the Security section so this session is shaping up well.
Day 2 is developing.
Session 5 – Operations (JC to chair) – Upgrading to SL7 and open to suggestions for the remaining talks. There are no reports yet from LHC experiments.
Session 6 – (RJ to chair) – open to suggestions.
RJ, DC and AS should confirm if there should be elements covered here, perhaps even status report. PG will email them.
PG and DB asked for suggestions for other topics to be included.

3. Tier-1 Resource underspend & staffing
AS summarised: he was recently asked by the Finance team to make a case to carry forward some underspent GridPP resource at RAL, and this would need to be highlighted at the OSC meeting. After discussion with DK it was felt that a case and justification could be made to request that we carry forward 1.5 FTE from FY17 to FY18. The case would rest on the challenges of recruiting in this Financial Year, with staffing 3.0 FTE down; because of this shortage of effort, some planned activities have not progressed. DK is making the case to the PPD finance team, who will then present it to Tony Medland early this week. AS and DK will discuss further, and AS will update the Tier1 section of the OSC documents regarding this request and how we plan to use the carried-forward effort if successful.


5. Standing Items

SI-0 Bi-Weekly Report from Technical Group (DC)
DC not attending and no report submitted.

SI-1 ATLAS Weekly Review and Plans (RJ)
RJ not attending and no report submitted.

SI-2 CMS Weekly Review and Plans (DC)
DC not attending and no report submitted.

SI-3 LHCb Weekly Review and Plans (PC)
Nothing significant to report.

SI-4 Production Manager’s report (JC)
Nothing for discussion but some items for information:

1. The security team remains busy chasing up patches at sites. EGI are now ticketing sites for Meltdown/Spectre issues (latest information is in )

2. Steve’s & Duncan’s view on actual GridPP CPU/storage IPv6 enablement is captured in (green cells under capacities imply fully enabled). The underlying situation is more positive, with all but 3 sites now having IPv6 addresses as recorded in .

3. Sites have 1 month to upgrade their Argus versions to one (UMD 4.6) that fixes Policy Administration Point permission problems.

4. There is an ongoing UK discussion about the value of ROD tickets and the underlying EGI monitoring tests. The value of the latter has decreased over time, but the need for active ticketing in a time of reduced manpower remains.

SI-5 Tier-1 Manager’s Report (GS)
A brief report covering the last week.

• ATLAS have moved their FTS transfers from our “test” FTS service to the Production one. This change was made (on 30th January) at our request and effectively consolidates us onto a single production FTS service. However, there were problems over the weekend when disk space on the FTS servers ran out. ATLAS submitted a GGUS alarm ticket yesterday (Sunday). The problem was resolved by the on-call team.
• I forgot to mention in last week’s report: during the week prior there was a problem, overnight Wed/Thu (24/25 Jan), with the system that runs the tape library control software. Staff were called late Wednesday evening but were unable to get the system up then. Overnight we were unable to mount tapes, effectively blocking tape access (although writes to the disk buffers in front of tape, plus reads of any data in those buffers, carried on). The fault on the server was resolved Thursday morning and normal tape service resumed.
• We have noticed disproportionately large numbers of ALICE jobs running on the batch farm, blocking out other VOs. Steps have been taken to correct this.
• ALICE, CMS and LHCb VOBOXes are now dual stack IPv4/6.
• The Hyper-K VO has been enabled on the batch farm.
• We await further updates regarding an ongoing problem with one of the BMS (Building Management Systems) in the R89 machine room. This has an intermittent fault.

DB enquired whether we could have been quicker to pick up on the FTS servers running out of disk space, and whether we could have planned ahead for it. GS confirmed this will be discussed further; the problem was caused by the logging level being set too high as part of putting ATLAS onto this service.

SI-6 LCG Management Board Report of Issues (DB)
There was no meeting and nothing to report.

SI-7 External Contexts (PC)
DC and PC were involved in BAES document writing – this is in good shape and will soon be submitted. If successful this will unlock £16M which will be extremely useful for wider aspirations.

CMS meeting coming up – DB may be able to catch up with Anthony and accompany him to the EUT0 meeting after the HNSciCloud meeting, which PC will join via video. Tony has stepped down as Chair; Volker Beckmann is now Chair, and we continue to contribute to ensure STFC has a presence.

PC summarised the Balance of Programmes computing review panel (discussed last week). The panel members’ names were public: 3 astronomers and 2 DiRAC people but no Particle Physics scientists (AS is there as an STFC person and it would be inappropriate for him to be there to represent all sectors). AM and Sinead Farrington are now on the panel to represent Particle Physics. DB has suggested we prepare a briefing note to highlight GridPP’s contribution over the last 17 years; he will report to the PMB once this has been worked up further.

REVIEW OF ACTIONS
644.3: AS to put together a starting plan for staff ramp-down. (Update: a draft will be produced in January). Ongoing.
644.4: AS will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY?) Ongoing.
OSC documents MUST be done and submitted to PG this week.
649.2: PC will write Wider Context of OSC documents. Ongoing.
649.4: GS and AS will write the Tier1 Status section of OSC documents. Ongoing.
649.5: JC will write Deployment Status section of OSC documents with input from PG. Ongoing.
649.6: RJ, DC and AM will write LHC section of User Reports in OSC documents. Ongoing.
649.7: JC will write Other Experiments section of User Reports in OSC documents with input from DC and PG. Ongoing.
655.2: AS to prepare a report on failure of the generator to come up after a recent issue. Done.
655.3: PG to consider the agenda and date for Tier1 review and include disaster recovery plans. (UPDATE: appropriate dates are being considered with AS). Ongoing.
656.1: DK will report before the end of February on any actions GridPP should take to comply with GDPR. Ongoing.
656.2: DC will report on CPU efficiencies. Ongoing.
656.4: DB will contact external contacts to invite them to attend and/or contribute to GridPP40. Done.
657.2: DC to report on the CMS taskforce. Ongoing.
657.3: PG will provide PC with documents and diagrams relating to the management structure. Done.
658.1: AS will discuss with CERN and the Netherlands participants if and where we may contribute to Data Lakes. (Update: Alistair is taking this forward with CERN and AS has arranged with Jens for the storage team to support this activity – concluded we can meet requirements by providing Cloud VMs with storage combined with a small % of an FTE). Done.
658.2: DB will discuss with PC and Tony Medland then draft an email raising concerns about the lack of Particle Physicists on the Balance of Programme Computing Review panel. Done.
658.3: AS and GS will update their OSC document sections and specifically address the actions raised by the OSC last time. Done.

ACTIONS AS OF 05.02.18
644.3: AS to put together a starting plan for staff ramp-down. (Update: a draft will be produced in January). Ongoing.
644.4: AS will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY?) Ongoing.
OSC documents MUST be done and submitted to PG this week.
649.2: PC will write Wider Context of OSC documents. Ongoing.
649.4: GS and AS will write the Tier1 Status section of OSC documents. Ongoing.
649.5: JC will write Deployment Status section of OSC documents with input from PG. Ongoing.
649.6: RJ, DC and AM will write LHC section of User Reports in OSC documents. Ongoing.
649.7: JC will write Other Experiments section of User Reports in OSC documents with input from DC and PG. Ongoing.
655.3: PG to consider the agenda and date for Tier1 review and include disaster recovery plans. (UPDATE: appropriate dates are being considered with AS). Ongoing.
656.1: DK will report before the end of February on any actions GridPP should take to comply with GDPR. Ongoing.
656.2: DC will report on CPU efficiencies. Ongoing.
657.2: DC to report on the CMS taskforce. Ongoing.