GridPP PMB Meeting 705

GridPP PMB Meeting 705 (15.04.19)
Present: Dave Britton (Chair), Pete Clarke, David Colling, Alastair Dewhurst, Tony Doyle, Pete Gronbech, Steve Lloyd, Gareth Roy, Andrew Sansum, Louisa Campbell (Minutes).

Apologies: Tony Cass, Jon Hays, Roger Jones, Dave Kelsey, Andrew McNab.

1) GridPP6 Response to the Panel
DB has been advised that the deadline for submission of OSC documents has changed to 2 May, and the Chair of the GridPP6 panel has confirmed that the response document can be delayed by a few days if absolutely necessary.

Q1 on using new technology and cost-effective methods – AD is completing the relevant sections flagged in red for v2 and will then circulate them to the rest of the PMB for comment.

Q2 on financial savings made through GridPP5 and anticipated for GridPP6 – the PMB noted two potential interpretations of this question and DB sought clarification from the panel Chair. DB circulated a draft response which now needs input from others: AD will address it first, followed by PC (who has already contributed on Ligo and Logo). There was some discussion on which aspects to cover.

Q3 on 25% more Tier-2 hardware for ATLAS and whether it can be refunded. DB drafted a near-final version this morning and circulated it to others for comment.

Q4 on the reduction of Tier-2 capability. DB circulated a draft which needs input from RJ or DC to provide a more robust response.

Q5 on the scientific justification for requesting FEC for each post. As discussed last week, this is interpreted as requiring a global justification rather than one on a post-by-post basis. Some text from PC's original draft will be integrated. A list of publications, contributions to working groups, etc. could be drawn together to support this response.

Q6 on the rationale for removing capability rather than travel or academic overheads. A response will be drafted based on DK's text, which DB will then circulate.

Q7 on the division between Tier-1 and Tier-2 not being clearly explained. AD will circulate his proposed response and DB will edit it as necessary.

Q8 on it not being explained how the request would meet UK pledges at Tier-1 and Tier-2. A clear statement is needed on meeting the pledges and on how the compute required relates to the request; the tables make this very clear, so it could be drawn out explicitly in the answer. Members noted that the second part of the question is not entirely clear.

Q9 on electricity costs. PC has drafted a response that needs to be developed further, as it is challenging to provide an accurate model. DC can provide costs for Slough as a starting point. GR could give a better estimate based on the power draws on site, using Glasgow as an example (he has recent figures provided by the Director of IT).

ACTION 705.1: RJ and DC will work up text for Q4 of the panel response.

2) Q4/18 reports outstanding (11/4)

  1. CMS – DC emailed this morning and is now progressing this urgently.
  2. Tier-1 – AD is dealing with this urgently.

3) Oversight Committee inputs
PG circulated the documents and DB has added some plots and text; they are now ready for contributions from other members.

  • Introduction – DB
  • Status
    • Wider context – PC will update slightly.
    • GridPP6 proposal is fine
    • PG will update figures to cover the next 2 quarters
  • Risk Register – the content for the GridPP6 version is in place; GR will review and amend the GridPP5 version and then circulate it for discussion at next week’s PMB.
  • Tier-1 Status – AD is working on this and has asked for input from others.
  • Deployment Status – JC traditionally undertook this section, and how it is handled will need to be considered going forward. Matt and Kashif have been running the weekly meetings in JC’s absence. Formal discussions should be held regarding a replacement for JC’s role and how to deal with this section in future OSC reports. This section brings together various issues, so its relevance needs to be reconsidered; the main points from operations, or issues flagged at PMBs that ultimately require action, could be included (e.g. by AD and RJ).
  • User Reports (ATLAS, CMS, LHCb, Other) – RJ, DC and AM will contribute to this section. JC previously undertook Other VOs; JH could take this on in discussion with Duncan, who provides input (PC will also discuss this with JH).
  • Outreach – SL completes this section.

ACTION 705.2: DB and PG will consider how to deal with the Deployment Status section of the OSC documents.

4) Technopolis
PC has not had a response to his latest email and has sent the text back to Charlotte. He is satisfied with the text but welcomes contributions from others. Charlotte Glass is keen to take forward industrial engagement and invited any members to contact her in this regard (perhaps someone from the Tier-1; AS and AD will discuss).

5) GridPP42
AD mentioned that XMA have been positive about sponsorship but this has not yet progressed. He is making enquiries as to whether STFC can pay for the meal and has requested £575 for alcohol (one large glass per delegate). Everything else is in hand.

Agenda for GridPP42 – there have been a couple of minor alterations. AD can make arrangements to transport people from the station, including group transport noted on a Google Doc and booked through STFC projects.

6) Standing Items

SI-0 Bi-Weekly Report from Technical Group (DC)
A meeting was held on Friday regarding the future, following Sam’s thoughts on the DPM community suggesting that DPM development would not keep up. AD gave a presentation on Ceph and how it could be used for Grid storage, and Sam spoke about Glasgow’s plan for the machine room and testing. Mark Slater also spoke about EOS and XCache. It was agreed to hold 3 further meetings: XCache, Glasgow’s experience in setting up, and one other.

SI-1 ATLAS Weekly Review and Plans (RJ)
Not in attendance. No report submitted.

SI-2 CMS Weekly Review and Plans (DC)
Not in attendance. No report submitted.

SI-3 LHCb Weekly Review and Plans (PC)
Not in attendance. No report submitted.

SI-4 Production Manager’s report (JC)
No report submitted.

SI-5 Tier-1 Manager’s Report (AD)
– We are seeing high outbound packet loss over IPv6. Investigations are on hold as central networking do not have the relevant expertise (Philip Garrad) available until after Easter.

– High CMS job failure rates. There are ongoing issues with metadata spread across large files. CMS job slots have been temporarily limited.

– On Friday 5th April, gdss700 (LHCb) had a double drive failure and needed to be removed from production. Further problems were found and, while we were able to return it to production briefly, we were unable to copy all the files off; 1482 files were lost.

– On Wednesday 10th April, gdss811 (LHCb) had a failure of the disk running the operating system. This generation of hardware has OS disks that are very inconveniently located (glued to the underside of the motherboard!). It had not yet been returned to production as of the morning of the 15th.

– On Thursday 11th April, an unknown issue caused a significant fraction of Docker containers (running jobs) to restart.

SI-6 LCG Management Board Report of Issues (DB)
No management board meeting last week.

SI-7 External Contexts (PC)
F2F IRIS meeting – PC noted that it was a very successful and positive meeting. Charlotte attended and, in a very supportive presentation, suggested that IRIS was “one of our successes”. She also gave Susan Morrell’s talk on infrastructure. AM spoke about how to converge the operations process, i.e. an operational framework for IRIS. There will be further discussions in the coming weeks on joining up GridPP and IRIS operations, starting with the security framework. This was a general discussion session rather than a detailed decision-making one. It would be useful to have David Crooks come along and talk about security. One further topic, relevant to AII and WLCG, was mentioned: the AII is very important for WLCG and UKRI activities and needs discussion. PC is working on clarifying the HW requirements for IRIS in 2019, which will be discussed at the Delivery Board. Several GridPP team members were in attendance. Daniella spoke about Dirac and AM spoke about Rucio; AS and Duncan may organise a data management workshop soon.

ACTIONS AS OF 15.04.19
644.4: AD will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY? AD will now progress. 08/10/18 – Leicester are producing a PO for tapes and will send to AD to produce an invoice). Ongoing.

702.1: DC to identify an LZ presentation for GridPP42. Ongoing.

702.3: GR to update Table 20 to bring numbers in line with the returned JeS forms. Ongoing.

702.5: PC to draft a set of milestones for WP4. Ongoing.

702.6: PC & DB to add some additional text to bring things together (WP1c). Ongoing.

703.1: DC to provide figures for WP2 numbers. Ongoing.

703.2: AD will contact Darren (Tier-1), Tim (ATLAS) and Katie (CMS) for Q4 reports. Ongoing.

704.1: ALL should discuss with experiment reps to develop response for question 1 of the GridPP6 Response to the Panel.

704.2: AD to develop a response relating to WP4 work (question 1) of the GridPP6 Response to the Panel.

704.3: DB, RJ, AD and DC will work on question 3 of the GridPP6 Response to the Panel.

704.4: ALL should review and contribute to the GridPP6 Response to the Panel where appropriate.

704.5: DB will write a report for the OSC.

705.1: RJ and DC will work up text for Q4 of the panel response.

705.2: DB and PG will consider how to deal with the Deployment Status section of the OSC documents.