Present: Dave Britton (Chair), Tony Cass, Jeremy Coles, David Colling, Pete Gronbech, Roger Jones, Steve Lloyd, Andrew McNab, Andrew Sansum, Gareth Smith, Louisa Campbell (Minutes).
Apologies: Pete Clarke, Tony Doyle, Dave Kelsey.
1. ResearchFish
PC has worked with Ian Fuller to resolve issues with linking grants correctly on ResearchFish. GridPP grants and sub-grants are now linked, and Ian is working on further links to the experiment grants. PC will shortly provide an update. PG is checking that all papers are correctly linked to grants and ready for the mid-March deadline; all PMB members will check their entries well before the deadline.
2. Tier1 reaction to CERN CTA move and news about SL8500 roadmap
AS circulated emails relating to tape roadmap changes, which have evolved several times. Twelve months ago the model predicted T10KE drives and extra media this year. More recent information indicates that T10KE drives will not become available, as the enterprise line of drives will not be developed further. This eventuality was modelled in 2014 when the GridPP5 proposal was being developed, and it appears feasible to reach 2020 on T10KD drives alone, though at somewhat higher cost. For GridPP planning there are two scenarios: 1) Oracle announces that LTO-8 drives will be compatible with its existing robot technology, in which case we may move from T10KD to LTO-8; or 2) Oracle announces that it is no longer developing robotics, in which case we will continue to expand on T10KD. If Oracle robots become obsolete, a solution will be required in 2020. TC mentioned that Fermilab has had some discussions with Oracle and that Oracle's tape VP is visiting next Monday; RAL may be okay, but it is not yet clear whether LTO would meet our requirements.
There was some discussion of the relative benefits of LTO and enterprise drives and of the cost of media. LTO is a competitive technology and its media fits in the robots, which can host LTO drives and media. LTO media has a shorter lifespan, but there is a suggestion that LTO technology increasingly has similar properties to enterprise technology, which may have led to some duplication. LTO-7 capacity lags behind that of current enterprise media, e.g. T10KE. We have recently capitalised tape media and may need to purchase more, the cost of which will be affected by changes in exchange rates. AS will undertake more modelling of tape planning, taking account of experiment requirements. The LHCC reviewers of LCG will discuss this tomorrow. The cost difference between the E-series and D-series scenarios up to 2020 was not a concern, but it is not yet clear what the financial implications will be after 2020 (i.e. after GridPP5).
CTA – no further work has yet been undertaken on this. There will be a meeting with CERN and the Castor team relatively soon. Staffing will not be available to cover this towards the end of GridPP5. Consideration needs to be given to the requirement for tape 2-3 years from now; this will depend on the direction the industry takes and on whether Oracle continues in this field. IBM have confirmed that they do not intend to make any changes.
ACTION 624.1: AS will rework tape modelling taking account of recent changes.
3. Data protection policy (WLCG-MB)
DK was unable to attend today’s PMB – this will be discussed next week.
4. GridPP38 agenda
PG has been drafting an agenda:
DB opening, then the Tier1, followed by the major experiments' requirements in the next session. Session 2 – two strands are required here, covering the period between now and the end of GridPP5, i.e. what the experiments want and what they envisage their infrastructure will look like, so that we can evolve to fit that. PG will send pointers to the relevant contributors, as we need to understand the trajectories. Titles will be adjusted thereafter as required.
Session 3 (Thu) – large sites and their plans going forward, and perhaps recent purchases. One issue for the discussion session is how we monitor and reward sites: as the model evolves there will be constraints and issues. This will be discussed at the F2F, but it requires some contributions from the sites in order to reach consensus and agree an effective model going forward. In the current climate, this session may become an opportunity for a resource discussion, starting with a presentation explaining the current guidance and taking account of different site requirements.
Session 4 (Fri) – non-LHC requirements and mid-size site reports, then technology and tools in Session 5, possibly containers and storage solutions that can be used at small or medium sites. We should consider shortening the site talks and expanding on the Session 5 theme, or moving the technology and tools discussion into Session 4 and the site reports into Session 5.
It would be very helpful to have a survey of the sites, such as a table outlining what each site is working on now and its future plans, i.e. dedicated resources, or shared resources that cannot be changed. PG and AM will set up a SurveyMonkey survey to capture this data.
ACTION 624.2: PG will firm up the agenda and alternative topics in Sessions 4 and 5.
ACTION 624.3: PG and AM will work up a SurveyMonkey for sites to outline what they are currently working on and future plans.
a) CDT announcement
UCL was awarded the CDT; the bid did not request a GridPP support letter. STFC signed a letter of support and is awaiting information; there may be some additional funds available to support (initially) a single year of a second CDT.
b) Scientific Computing Strategy Forum last Wednesday at CERN – c. 15 attendees plus 20 on video, approx. one per country and one from each of the four LHC experiments. DB represented the UK and AM represented LHCb. DB shared the open agenda with the talks attached. There was a talk by Eckhard and a more detailed one by Ian. Open minutes will also be posted on the Indico page. There were various supporting statements and agreement with the high-level view presented, but a feeling that a more detailed view needs to be prepared. Eckhard would like to restrict the forum to a maximum of c. 40 participants. From the UK perspective, PC would like to participate given his role on UKTO and the Science Board and his close work with Susan Morell, Tony Medland and others. DB would also like to participate but will discuss this separately with PC. CERN will be giving an informal briefing to Funding Agencies (i.e. Tony Medland) in March. A second meeting will follow in May, date TBC.
c) DB attended a Scotgrid meeting at Durham last week and they are keen to host the next GridPP collaboration meeting at Durham.
6. Standing Items
SI-0 Bi-Weekly Report from Technical Group (DC)
A technical meeting took place on Friday. It discussed purchases and briefly ran through some storage aspects, regarding caching, that DC raised at the last PMB.
SI-1 Dissemination Report (SL)
Nothing to report.
SI-2 ATLAS Weekly Review and Plans (RJ)
Observation – a UK deputy computing coordinator will shortly take up post and should perhaps be invited to present at the Autumn GridPP meeting.
SI-3 CMS Weekly Review and Plans (DC)
Nothing to report.
SI-4 LHCb Weekly Review and Plans (PC)
Nothing to report.
SI-5 Production Manager’s report (JC)
1) There was a planned disconnection of CERN AFS last week. As expected it brought several issues to the surface that sites will need to address.
2) Team changes have led to GridPP keydocs (https://www.gridpp.ac.uk/php/KeyDocs.php) getting out of date. A review was started last week with reminders being sent out to owners. The area already looks better!
3) A follow-up security communications challenge took place and has shown that the lessons learned in the original challenge have been acted upon.
4) VAC is now installed and running at Birmingham for LHCb.
5) An issue with GridPP DIRAC led to a loss of files affecting the skatelescope VO.
6) SNO+ is using an increasing amount of the T2 “other VO” disk space, and a discussion has started about limiting usage through space tokens.
7) The next HEPSYSMAN meeting will take place from 13th to 15th June at RAL.
SI-6 Tier-1 Manager’s Report (GS)
General: We have continued applying patches for CVE-2016-7117. Work is ongoing to upgrade the remaining systems from SL5, which reaches end of life at the end of March.
– Following the upgrade to Castor 2.1.15 most of the issues have been resolved. Summary:
– We were failing CMS xroot redirection tests. This was fixed by changing the priority of the xroot access to CMSDisk in Castor.
– There is a problem with a database resource (number of cursors) becoming exhausted. This has affected more than one of the instances. Investigations are ongoing; Castor 2.1.16 includes a bug fix in this area.
– We are managing memory leaks seen in the transfer manager component.
– We still see some timeout test failures in SAM tests for CMS.
– We are planning the upgrade of the SRMs.
– CMS found a bug in their SRM test, which they have fixed. This was behind the request for a recalculation of our availability for January. The recalculation has been done and the resulting availability figure is 100% (it was 90% before the recalculation).
– Last week a major Ceph version upgrade (to “kraken”) was carried out transparently and went well.
– IPv6 was enabled on one of the core network switches on Wednesday 8th. The next step in the planned enabling of IPv6 access is to allow IPv6 on the Tier1 OPN router this Wednesday (22nd Feb).
SI-7 LCG Management Board Report of Issues (DB)
DB did not attend due to travel issues.
SI-8 External Contexts (PC)
Nothing to report.
REVIEW OF ACTIONS
616.3: DB and SL will discuss how best to progress replacement of TW’s role. (Update: DB has reviewed and is now awaiting admin.) Ongoing.
620.1: DB to contact DK re the procedure for dealing with a security incident and the media. (Update: DK had devised an interim statement which involved TW as dissemination officer, and TW is no longer in post. There is no prescriptive full response, as this would depend on the circumstances and would probably involve an emergency PMB and communication with the relevant PR representatives.) DK will send the statement to the PMB in case it is required in future; the spokesman would be SL as head of the board or DB as project leader. Ongoing.
622.1: DB and PG will work on an agenda for GridPP38 and run this past DC for comment/input. Done.
623.1: PG will test ResearchFish and upload the latest papers for other members to inherit into their CG. Done.
623.2: PC will email Ian Fuller to mention ongoing issues on ResearchFish from last year. Done.
623.3: DB and AS will discuss how best to summarise the Tier1 review. (Update: a brief summary will be written up and presented.) Ongoing.
623.4: GS will upload talks from the Tier1 review to the Agenda. Ongoing.
623.5: PG will put the deadlines for OSC reports onto the F2F agenda. Done.
623.6: RJ will investigate ATLAS efficiency being 5-6% lower than usual. (Update: historically there are occasional dips, but no cause is clear.) Done.
ACTIONS AS OF 20.02.17
616.3: DB and SL will discuss how best to progress replacement of TW’s role. (Update: DB has reviewed and is now awaiting admin.) Ongoing.
620.1: DB to contact DK re the procedure for dealing with a security incident and the media. (Update: DK had devised an interim statement which involved TW as dissemination officer, and TW is no longer in post. There is no prescriptive full response, as this would depend on the circumstances and would probably involve an emergency PMB and communication with the relevant PR representatives.) DK will send the statement to the PMB in case it is required in future; the spokesman would be SL as head of the board or DB as project leader. Ongoing.
623.3: DB and AS will discuss how best to summarise the Tier1 review. (Update: a brief summary will be written up and presented.) Ongoing.
623.4: GS will upload talks from the Tier1 review to the Agenda. Ongoing.
624.1: AS will rework tape modelling taking account of recent changes.
624.2: PG will firm up the agenda and alternative topics in Sessions 4 and 5.
624.3: PG and AM will work up a SurveyMonkey survey for sites to outline what they are currently working on and their future plans.
DB will be unavailable next week – PG will chair the PMB if the meeting goes ahead.