GridPP PMB Meeting 818

Present: David Britton (Chair), Peter Clarke, David Colling, Davide Costanzo, Alistair Dewhurst, Tony Doyle, Katy Ellis, Peter Gronbech, Jonathan Hays, Roger Jones, David Kelsey, Steve Lloyd, Andrew McNab, Sam Skipsey (Minutes), Jill Sambrook (Minutes)

Apologies: Tony Cass, Andrew Sansum

Items

  1. OSC Meeting update [DB]

The documents for the OSC were submitted on Thursday 6th May. The OSC had got in touch to move the submission date by a few days, so everything was completed and submitted on time. Thanks to all for their help in providing the information necessary to complete the report.

The plan for the OSC meeting will be to have a slightly shorter presentation than normal, but then there will be more time for questions and to address previous actions.

Dave may be in touch with a few individuals shortly for some presentation slides.

  2. Possible network meeting [PC]

The group agreed to have an initial discussion at the Operations meeting on the 24th of May to help prepare for a network meeting later in the summer.

RJ and JH said they could potentially get information from their networking team (and could also provide some data themselves), and AM could provide Glasgow stats. All groups agreed to provide traffic flow plots from their own internal monitoring prior to the meeting on the 24th. This information will be used at the Ops meeting to help frame the meeting later in the summer.

SS and PC to talk to Matt D and sketch out the agenda for a larger meeting over the summer.

AOCB:

Future Meetings, proposal:

May 9, 16, 30

June 6, 13, 27

July 11, 25

Aug 8, 22 and F2F Aug 30th in Ambleside

Dave/Sam/Jill will be in touch with RJ soon to start discussions and planning for the F2F meeting in Ambleside.

AM raised a query about the procedure for producing the minutes of the PMB.

DB confirmed this issue should hopefully now be resolved. JS and SS will produce and approve the minutes and, from now on, these should appear on the CERN site on the Monday morning when the agenda is circulated.

Standing Items

SI-0 Tier-1 Manager’s weekly report & Technical Meetings [AD]

Report provided

– Technical Meetings

None last week.  Network meeting planned but needs to happen on Tuesday if Pete Clarke is to attend.  We need to arrange for Rucio/Dirac meetings to restart.  I am arranging a pre-GDB in June on Kubernetes.

– Tier-1 Operations

RAL Tier-1 joined the LHCONE at 16:00 on Wednesday 4th May.  Currently it is a single node and we are checking the routing from other sites (in particular other Tier-1s).  It is now much easier to add more nodes.  We are making sure our new IPv6 allocation is correctly configured before we add more.

Antares is continuing to work well:

  • A major upgrade is planned for the 25th May which should resolve some known issues (FTS transfer bug and MGM “hanging”).
  • We are engaging with non-LHC VOs such as Dune, T2K and SNO+ to ensure they can use the instance properly.
  • Continued testing from VO liaisons is helping us optimise performance, in particular recalls.

ATLAS and LHCb have reported issues deleting files from Echo.  They appear to be very different issues.  ATLAS are seeing an acceptable deletion rate with occasional spikes in failures, probably as a result of load on the gateways.  LHCb are seeing an average deletion rate of 1 file every 17 seconds, which is clearly not sufficient and is likely a result of something being misconfigured or broken.

In the last few weeks the WebDAV servers have been stable and tests are looking good.  There are a variety of tickets around WebDAV that are slowly being resolved.  Failed transfers in the last week have primarily been a result of problems at other sites (Prague and MPPMU).

We have some test results from running the Vector-read code in production.  We should have some more details later this week, hopefully at Wednesday’s Liaison meeting.

DB to get in touch with AD to see if this should be added to the agenda of the PMB for next week’s meeting. AD will update soon.

SI-1 ATLAS Weekly Review and Plans [DCos]
Not much to add.

After the installation of AirCon at QMUL there was a drop in capacity, but this has now gone back to normal.

DAVS working fine at Glasgow.

Durham jobs are limited to low-IO workloads. Need to see how this is working.

SI-2 CMS Weekly Review and Plans [DC/KE]

KE confirmed it has been a pretty good week for Tier 1 CMS. The job performance was really good and other issues are mostly gone.

Still looking for improvements, for example from LHCONE. The number of running jobs is a little low, but still above pledge.

ARC-CE runs are sometimes timing out before completion, so keeping an eye on this.

The group received 240TB of data from JINR, which was a good test for Antares. A tape feature was not working properly. A Rucio developer is going to fix this issue, but unfortunately the fix was not in place when the 240TB was transferred this week.

Antares and CMS Rucio updates are coming soon, which should help with staging issues.

KE has volunteered to use the XRootD monitoring framework Shoveler for some testing, which is of great interest to CMS. She will do the test install this week and give feedback.

KE has been asked to give a talk to a lay audience at LHC Paperfest later this month. Dave might have some information and images he can share, and will take a look.

SI-3 LHCb Weekly Review and Plans [AM]
Tier 1 – WebDAV enabled as primary protocol this morning.

Tier 2 everything ok.

The team's attention is focused on the upcoming LHCb2022 conference (online from 16th to 20th May 2022).

SI-4 Operations Meeting Report [SS,PG,PC]

– Minutes 03-05-22

SS flagged that a number of VOs in use are not on the GridPP approved VOs list. At some point we should formalise this by going through them and ticking the appropriate box, although all VOs in this case have been “unofficially” approved previously.

SI-5 LCG Management Board Report of Issues [DB]

– Environmental Impact of WLCG

– Update on WLCG Privacy Notices

If any research groups approach us looking for resources, we cannot make promises. Any decisions will need to be co-ordinated by the WLCG.

SI-6 External Contexts (e.g. NGI/EGI) [PC/JH]
No update

Actions:

782.4 DCos – Investigate VAC migration plan for Birmingham. [Ongoing – raised at UK Cloud Support meeting. No formal request as yet.] – follow up

787.3 DB – Request 2022 spreadsheet from Philip Jackson and provide to DC (IRIS) [DB+PC Jackson will redo MoU] [ongoing] have definitive numbers. DONE

793.1 PC – Follow up on NFL with relevant sites (PC to write letters) DONE

795.2 DB – Resource categorisation (GridPP, IRIS, Institute) [on-going] DONE

800.1 SS – Clear differentiation of metrics (GridPP / non GridPP split). (OSC MEETING) DONE

800.2 PC – LSST slides (OSC MEETING) DONE

800.3 SS – Status report on Oxford performance and storage (OSC MEETING) – SS send to DB

800.4 DB – Tier-2 hardware model (OSC MEETING) DONE

800.5 AD – Arrange in person DIRAC/Rucio meeting at IC (Jan22). [on-going] – AD plans to re-establish virtual meetings and then set up an in person meeting. AD and DC to try and arrange a date.

807.1 SS/AD – Impact discussion to be included in the Durham agenda. DONE

807.2 AD – Impact feedback required for OSC report. – DONE

808.1 PC/SS – investigate if sites can remove Cardiff stashcache from list of sites, and see if resolves issue DONE

808.2 – PC/JH – Determine if there is interest in getting more experience on StashCache (Done? Edinburgh developing a StashCache?) DONE

812.1 AD – Gantt chart for Tier-1 delivery (re OSC Action 4) DONE

812.2 JH, DCos – Draft contribution on efficiency of infrastructure and code, inc HL working Group (re OSC Action 7) DONE

812.3 AS, AD – Draft letter for PC to send to STFC management from PMB re network capacity

Network response? DONE

817.1 AD to investigate the percentage usage by ALICE of their TAPE Pledge DONE

818.1 – SS/PC Operations meeting on the 24th to focus on networking discussion

818.2 – SS and PC to talk to Matt D and sketch out agenda for larger network meeting over the summer

818.3 – Dave/Sam/Jill to get in touch with RJ soon to start discussion and plans around F2F meeting in Ambleside

818.4 – JS and SS to produce the PMB minutes and add them to the CERN website every Monday morning when the agenda is circulated to the PMB

818.5 – DB/SS Group to formalise VOs to be added to the approved list.

818.6 – PG/AD to pencil in the 25th of May as a potential date for a T1 resource meeting. The 1st of June is the plan B date.

AD to create an agenda.