To meet the needs of its users, GridPP offers a number of services via its infrastructure. Up-to-date technical information can be found on the GridPP Wiki; a summary of each service is given below.
-
Compute
At the beginning of 2012, the GridPP sites could provide around 30,000 logical CPUs to their users, equivalent to 292,000 HEPSPEC06. HEPSPEC06 is a performance benchmark developed by the High Energy Physics community. Further details of how the individual GridPP sites perform against it can be found on the wiki.
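As a rough illustration of what these two figures imply, the short sketch below divides the quoted HEPSPEC06 total by the quoted number of logical CPUs. The function name is purely illustrative, and because the CPU count is given only as "around 30,000" the result is an approximation rather than a measured per-core score.

```python
# Figures quoted in the text for the beginning of 2012.
LOGICAL_CPUS = 30_000        # "around 30,000", so only approximate
TOTAL_HEPSPEC06 = 292_000


def average_hs06_per_cpu(total_hs06: float, logical_cpus: int) -> float:
    """Return the mean HEPSPEC06 score per logical CPU (illustrative helper)."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return total_hs06 / logical_cpus


if __name__ == "__main__":
    mean_score = average_hs06_per_cpu(TOTAL_HEPSPEC06, LOGICAL_CPUS)
    print(f"{mean_score:.1f} HEPSPEC06 per logical CPU on average")
```

Individual sites will sit above or below this average depending on the age and type of their hardware.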
-
Middleware
The computing resources at sites are made available to the Worldwide LHC Computing Grid (WLCG) experiments via standard interfaces provided by Grid middleware, which is typically installed on machines running Scientific Linux. GridPP sites have relied on the gLite software provided by the EGEE project and are now gradually transitioning to the newer middleware provided by the EGI and EMI projects.
-
Storage
At the beginning of 2012, the GridPP sites provided a total of 29 PB of disk-based storage for their users. There are four main storage systems in use in GridPP: the Disk Pool Manager, dCache, BeStMan and CASTOR. Large data transfers for the experiments (tens of terabytes) are scheduled and managed through a central File Transfer Service (FTS), which for the UK is based at RAL.
You can find out more about GridPP storage on the wiki.
-
Networking
GridPP has a very active networking working group. Recent activity has focussed on IPv6 readiness. This has involved working closely with JANET, who have hailed GridPP as “[one of] the most advanced organisations on this matter”.
You can find out more about GridPP’s networking efforts on the wiki.
-
Monitoring
Several measures of performance are monitored for all sites. At one level, generic tests are applied automatically (via a Nagios-based service) and run every few hours to examine the reliability of services that are published as available. These tests check, for example, that jobs can be submitted and that submitted jobs run, can access storage resources and complete successfully. Coupled with the reliability is a measure of the availability, which takes account of whether or not a service expected at the site is in scheduled or unscheduled maintenance.
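The difference between the two measures can be made concrete with a small sketch. It assumes one commonly used convention, which is not necessarily the exact formula applied to GridPP sites: availability is the fraction of all time that a service was up, while reliability excludes scheduled downtime from the denominator. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SitePeriod:
    """Hours accounted for one service over a reporting period (hypothetical type)."""
    total_hours: float
    scheduled_down_hours: float
    unscheduled_down_hours: float

    @property
    def up_hours(self) -> float:
        return self.total_hours - self.scheduled_down_hours - self.unscheduled_down_hours


def availability(period: SitePeriod) -> float:
    """Assumed convention: uptime as a fraction of the whole period."""
    if period.total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return period.up_hours / period.total_hours


def reliability(period: SitePeriod) -> float:
    """Assumed convention: uptime as a fraction of the time the service was expected up."""
    expected_hours = period.total_hours - period.scheduled_down_hours
    if expected_hours <= 0:
        raise ValueError("no time outside scheduled downtime")
    return period.up_hours / expected_hours


if __name__ == "__main__":
    # Made-up example: a 30-day month with 24 h scheduled and 12 h unscheduled downtime.
    month = SitePeriod(total_hours=720, scheduled_down_hours=24, unscheduled_down_hours=12)
    print(f"availability = {availability(month):.3f}")
    print(f"reliability  = {reliability(month):.3f}")
```

Under this convention a site is not penalised in its reliability figure for maintenance it announced in advance, whereas both kinds of downtime reduce its availability.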
Tests and checks in this category are used by a distributed team of operations staff, drawn from several of the university sites, who work to a rota; they are on duty during working hours to follow up on and escalate observed problems. There is a continual background of individual site electrical issues, network changes, air-conditioning failures and machine room maintenance works contributing to scheduled and unscheduled downtimes across GridPP sites. This situation has been easily manageable and has not caused any major problems or concern. Resilience is a key strength of the distributed Tier-2 structure.
You can find out more about GridPP’s monitoring efforts on the wiki.
-
Workload Management
Different experiments use different systems to manage their Grid jobs. The DIRAC solution was developed as a workflow management and data management system for the LHCb experiment but is now used by many smaller Virtual Organisations (VOs). DIRAC consists of many cooperating services and lightweight agents delivering the complex data processing and simulation workloads to the worldwide Grid fabric.
DIRAC was the first system in particle physics to use the pilot agent paradigm, in which a so-called pilot job is submitted to a Grid worker node before any real batch job. The pilot job checks that the local environment is correctly configured and has the necessary resources before pulling the real workload onto the node. This has been demonstrated to be essential for achieving high success rates and high efficiency across a heterogeneous Grid.
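The pilot idea can be sketched in a few lines. The code below is a toy model under stated assumptions and does not use the DIRAC API: `Node`, `Job`, `environment_ok` and `run_pilot` are hypothetical names, and the environment check is reduced to a software flag and a free-disk figure.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Optional


@dataclass
class Node:
    """A worker node as seen by the pilot (hypothetical model)."""
    name: str
    free_disk_gb: float
    has_software: bool


@dataclass
class Job:
    """A real workload waiting in the central queue (hypothetical model)."""
    job_id: int
    min_disk_gb: float
    payload: Callable[[], str]


def environment_ok(node: Node) -> bool:
    """Assumed checks: required software is present and some disk is free."""
    return node.has_software and node.free_disk_gb > 0


def run_pilot(node: Node, queue: Deque[Job]) -> Optional[str]:
    """Validate the node, then pull and run the first job that fits on it.

    Returns the payload's result, or None if nothing was pulled.
    """
    if not environment_ok(node):
        return None  # the real workload never reaches a misconfigured node
    for job in list(queue):  # iterate over a copy so the queue can be modified
        if job.min_disk_gb <= node.free_disk_gb:
            queue.remove(job)
            return job.payload()
    return None


if __name__ == "__main__":
    queue: Deque[Job] = deque([
        Job(1, 500.0, lambda: "reconstruction done"),
        Job(2, 10.0, lambda: "simulation done"),
    ])
    print(run_pilot(Node("wn01", 50.0, True), queue))
```

Because the job is matched to the node only after the pilot has validated it, a badly configured node costs one cheap pilot rather than a failed physics job.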
The Ganga software suite was developed to provide a simpler interface to enable physicists to submit their jobs across the Grid and to handle the complete life-cycle of each job.
-
Security
There are a large number of security activities within GridPP and within the wider Grid community. Most of these centre on the concept of the Grid certificate, which is based on the X.509 standard.
You can find information about the GridPP security policy and the GridPP incident handling procedure on the GridPP Security wiki page.