GridPP PMB Meeting 654

GridPP PMB Meeting 654 (11.12.17)
Present: Dave Britton (Chair), Tony Cass, Pete Clarke, Pete Gronbech, Roger Jones, Steve Lloyd, Andrew McNab, Andrew Sansum, Gareth Smith, Louisa Campbell (Minutes).

Apologies: Jeremy Coles, David Colling, Tony Doyle, Dave Kelsey.

1. CMS request for new resources
DC was not present. AS confirmed CMS are not yet using their allocation of CPU or disk, but there is no headroom to allocate additional tape to CMS. AS recently calculated annual trend rates, and there seemed to be sufficient tape until the end of GridPP5, but growth rates have since increased. The CMS growth rate is sporadic, making it challenging to identify trends. ATLAS and LHCb will fall just short, and this should balance out until the end of the year to meet the MoU, but it is not possible to make more tape available to CMS.

2. QCD calculations on Grid
PG was contacted by Mark Sutton (Sussex) requesting use of GridPP resources to carry out QCD calculations, possibly under the phenomenology VO, GridPP or ATLAS. PC reported that he had recently met Mark, who mentioned difficulties filling out the form for a Dirac HPC resource request; Sussex had not advised him that GridPP was available for such work. PC suggested he submit a request. Mark is an RA who works on the ATLAS Trigger and on APPLfast. It was agreed he should be using GridPP for this work and should be enabled to undertake it. PG has forwarded the email to the PMB and will telephone Mark to discuss the way forward and arrange access to GridPP.
ACTION 654.1: PG will telephone Mark Sutton to arrange for him to have access to GridPP.

3. CPU Efficiencies (ATLAS efficiency lower than normal?)
GS circulated efficiencies for November. The ATLAS efficiency has dropped and RJ is investigating. PG noted the format of the email differs from normal, but GS confirmed that the same code as usual was used to generate the numbers.
ACTION 654.2: RJ will investigate ATLAS CPU efficiencies and report to PMB.

4. Disk deployment (ECHO usage vs allocation?)

DB raised Echo deployment with regard to disk use and resource allocation – the data does not define what is being used. He enquired whether LHCb is using the 1.5PB allocated; GS confirmed it is not and that only ATLAS is using its allocation – CMS has c. 2PB available. These figures were confirmed in an email from AS this morning. PG circulated a pdf of usage of the kind that Andrew Lahiff used to prepare.

5. UK HEP GPU Workshop proposal
It has been agreed that we should support this at QMU; DK has confirmed we can cover travel costs. AM noted a pre-GDB on GPU is planned for February – the workshop should, therefore, be arranged for around 13 February. AM will follow this up with Chris.

a) EUCLID are expected to undertake big simulation campaigns shortly (June 2018) and advised they require 30,000 cores for 2 weeks. PC noted this is the type of activity that we should be doing for shared resources – i.e. between GridPP, extra monies and RAL, as well as staff availability – and that we should be able to allocate around 10,000-15,000 over 4 weeks. PC has advised EUCLID they should write up a formal request to allow us to investigate and begin planning. DB confirmed 5,000 would be manageable across the sites. UKT0 is only procuring 3,000 by end March, which may be challenging to deploy by June. This should be discussed at the UKT0 meeting. The reliability of their timeframe needs to be confirmed. It was agreed that GridPP is willing to try to contribute but will need to know well in advance, and would coordinate several thousand cores across the UK, with UKT0 dealing with the remainder.

b) DB suggested email exchanges should be undertaken on when to spend the remainder of the Tier2 funds as Tony Medland has asked for confirmation on how much Capital is required for next year.

7. Standing Items

SI-0 Bi-Weekly Report from Technical Group (DC)
There was a technical meeting this week but DC and AM were not present and no report was submitted.

SI-1 ATLAS Weekly Review and Plans (RJ)
Nothing significant to report.

SI-2 CMS Weekly Review and Plans (DC)
DC was not present and no report was submitted.

SI-3 LHCb Weekly Review and Plans (PC)
Dirac development meeting – Chris outlined from LHCb how we see the task being broken down into 6 Girotasks and this appears to be in hand. Chris should update this at the Joint meeting.

SI-4 Production Manager’s report (JC)
JC was not present and no report was submitted.

SI-5 Tier-1 Manager’s Report (GS)
The quarterly report has been prepared and will be forwarded to PG immediately after today’s meeting.

• On Wednesday (6th Dec) all the SRM systems (except LHCb – which had already been done) were successfully upgraded to the latest version (2.1.16-18).
• Three disk servers (old ones from 2012) have been added to the LHCb Disk-only space in Castor to alleviate problems of this area being too full.

• The maximum number of gridftp connections to each Echo gateway has been increased to 200 (from 100).
• Echo is running normally. Background scrubbing is going on. This is flushing out bad disks – and the rate at which it finds these is expected to drop over the next week or two. The plan is to run like this through the holiday period.

• EGI will withdraw support for the WMS from the start of 2018. Our WMS service will be stopped on this timescale.

• There was a problem of high packet loss for traffic to/from the Tier1 that passed through the RAL core network (and firewall) on Monday (4th). The problem started at midnight and was fixed around 15:30.

• Following the failure of the generator to start during the power outage of a couple of weeks ago, a faulty emergency power-off switch was found and has been replaced. Plans are being made for a generator load test – hopefully on Wednesday (13th Dec).

• Following problems with the updated UK CA certificate in the IGTF 1.88 rollout, we had updated and then rolled back. This had left us with some issues in our configuration/deployment system (Quattor/Aquilon) – but those were resolved quickly. We made a plan to roll forward again tomorrow (12th Dec) – and that is still the plan.

Christmas Plans:
• We will follow the same pattern as in previous years. The on-call team will be in place as usual. Some additional checks will be made by those on-call. RAL is closed after Friday afternoon 22nd December and will re-open on Tuesday 2nd January.

Staffing: There are recruitments running for Security Officer and Tier1 Manager – Security Officer is close to release and Tier1 advert closed last week with interviews planned in the New Year. Database manager has been selected and is close to starting. A contract for HW is now in place (awaiting approval).

ACTION 654.3: AS will prepare a brief report summarising and updating the current staffing situation.

SI-6 LCG Management Board Report of Issues (DB)
There has been no MB.

SI-7 External Contexts (PC)
Nothing to report.

644.2: PG and AS will document plans and costings for the remainder of GridPP5 taking account of the Oracle tape issues experienced. (Update: a draft will be produced before Christmas). Ongoing.
644.3: AS will put together a starting plan for staff ramp-down. (Update: a draft will be produced before Christmas). Ongoing.
644.4: AS will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY?) Ongoing.
647.1: PC will update Data Management Plan. Done.
647.2: DB will circulate link for Data Management Plan once agreed. Ongoing.
649.1: DB will write Introduction of OS documents. Ongoing.
649.2: PC will write Wider Context of OS documents. Ongoing.
649.3: PG will schedule a discussion of the Risk Register at a PMB meeting in December then update this in the OS documents. Ongoing.
649.4: GS and AS will write the Tier1 Status section of OS documents. Ongoing.
649.5: JC will write Deployment Status section of OS documents with input from PG. Ongoing.
649.6: RJ, DC and AS will write LHC section of User Reports in OS documents. Ongoing.
649.7: JC will write Other Experiments section of User Reports in OS documents with input from DC and PG. Ongoing.
653.1: DB will write to VO managers requesting they edit VO cards to acknowledge the use of “GridPP resources across the UK”. Done.
653.2: AS and AM to schedule a meeting between now and Christmas to push forward a meeting with relevant parties to discuss Echo. Done.

ACTIONS AS OF 11.12.17

644.2: PG and AS will document plans and costings for the remainder of GridPP5 taking account of the Oracle tape issues experienced. (Update: a draft will be produced before Christmas). Ongoing.
644.3: AS will put together a starting plan for staff ramp-down. (Update: a draft will be produced before Christmas). Ongoing.
644.4: AS will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY?) Ongoing.
647.2: DB will circulate link for Data Management Plan once agreed. Ongoing.
649.1: DB will write Introduction of OS documents. Ongoing.
649.2: PC will write Wider Context of OS documents. Ongoing.
649.3: PG will schedule a discussion of the Risk Register at a PMB meeting in December then update this in the OS documents. Ongoing.
649.4: GS and AS will write the Tier1 Status section of OS documents. Ongoing.
649.5: JC will write Deployment Status section of OS documents with input from PG. Ongoing.
649.6: RJ, DC and AS will write LHC section of User Reports in OS documents. Ongoing.
649.7: JC will write Other Experiments section of User Reports in OS documents with input from DC and PG. Ongoing.
654.1: PG will telephone Mark Sutton to arrange for him to have access to GridPP.
654.2: RJ will investigate ATLAS CPU efficiencies and report to PMB.
654.3: AS will prepare a report summarising and updating the current staffing situation.

654.4: SL to follow up with EGI to confirm our suggestion for GridPP recognition by Biomed has been actioned.

Schedule of PMB Meetings

18th December – next PMB meeting
25th December – No PMB meeting
1st January – No PMB meeting
8th January – No PMB meeting – clash with Cloud workshop
15th January – PMB meeting