CERN and Petabytes of Data

April 6, 2018

Table of Contents

  1. Introduction
  2. Museums
  3. The Tour
  4. IT infrastructure
  5. CERN Open Data
  6. Conclusion

Introduction

I have had a childlike fascination with CERN (Conseil Européen pour la Recherche Nucléaire) ever since I heard about the World Wide Web (WWW), which started as a project there, led by Tim Berners-Lee with Robert Cailliau. CERN sits on the France-Switzerland border in the canton of Geneva. It has about 2500 employees (scientists, technical and non-technical staff) and tens of thousands of users worldwide. How does a small organization like this generate almost 50 petabytes of data a year? What exactly do they do? To answer these questions, CERN offers a public tour of its premises, accompanied by a physicist. Appointments for the tour can be scheduled online. You can also show up early without an appointment and join a waiting list for one of the group tours scheduled that day.

This is how big the Large Hadron Collider (LHC) is.

Image 1. The tunnel has a circumference of 26.7 kilometers. The water source nearby is Lake Geneva.

Museums

Before the tour started, I wandered into a museum called Microcosm, which was an interactive exhibition presenting the work of the CERN particle physics laboratory and the LHC. Some images from this museum are shown below.

Image 2. Entering a mock LHC tunnel, with people on screens explaining things interactively

Image 3. An old IBM 3090 mainframe in the museum

Image 4. A simulation of a particle accelerating to high speeds

Then I went to the Globe of Science and Innovation, which has futuristic, interactive, dimly lit rooms that explain physics to youngsters.

Image 5. Outside

Image 6. Inside

The Tour

I was in a group with about fifteen other enthusiasts, and we were assigned a physicist with several years of experience working at CERN. The walking tour started in a room with a projector, where the physicist presented a few slides on what was to come. A short four-minute video about the whole campus was followed by a quick Q&A about the logistics of the two-hour walking tour. The physicist's only request was that we try not to get lost on the campus during the tour. He also said every nook and cranny of the premises was open to photography and that CERN doesn't keep any secrets from the public.

The physicist had secure access to a building and took us there. This was the ATLAS control room, where I saw scientists and analysts gathering real-time data. We went to a room upstairs, where we were given 3D glasses and headphones to watch a video explaining how particle collisions happen, how much energy is emitted and what particles are produced from these collisions. Note that visitors cannot go into the LHC tunnel itself (only a few repair technicians can), as the accelerator's magnets are kept at an extremely low temperature: around minus 271 degrees Celsius, maintained with superfluid liquid helium.

Image 7. Watching a 3D video of the particle acceleration

IT infrastructure

There was a brief presentation by the physicist on how much data CERN generates and stores, and it was mind-blowing. As I mentioned earlier, CERN generates about 50 petabytes of data every year. I was instantly curious about the IT infrastructure behind it. I have since found (from a CERN presentation) that they use around 100,000 cores, a couple of hundred petabytes of HDDs, 100 petabytes of tape and a couple of hundred terabytes of memory, all of which consume around 6 megawatts of power. It also looks like they are running an open source distribution of OpenStack with a Ceph backend for the storage.
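
To put the 50 petabytes a year in perspective, here is a rough back-of-the-envelope sketch in Python. It is only an illustration under my own assumptions: decimal units (1 PB = 10^15 bytes), a 365-day year, and data spread evenly over the year, which real accelerator runs are not. The function names are made up for this example.

```python
# Back-of-the-envelope: what sustained rate does ~50 PB per year imply?
# Assumptions (mine, not CERN's): 1 PB = 10**15 bytes, a 365-day year,
# and data arriving evenly all year round.

PETABYTE = 10**15                      # bytes, decimal definition
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def average_rate_gb_per_s(petabytes_per_year: float) -> float:
    """Return the average data rate in gigabytes per second."""
    bytes_per_year = petabytes_per_year * PETABYTE
    return bytes_per_year / SECONDS_PER_YEAR / 10**9


def years_to_fill(capacity_pb: float, petabytes_per_year: float) -> float:
    """Return how many years a given capacity lasts at a yearly data volume."""
    return capacity_pb / petabytes_per_year


if __name__ == "__main__":
    print(f"Average rate: {average_rate_gb_per_s(50):.2f} GB/s")
    print(f"100 PB of tape lasts {years_to_fill(100, 50):.1f} years")
```

Under these assumptions the average works out to roughly 1.6 GB every second, around the clock, and 100 petabytes of tape would hold about two years of new data.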

Image 8. The physicist explains the ATLAS site

Image 9. More information about the IT infrastructure (15 PB of data from LHC experiments alone)

There was a brief Q&A period, and I asked which software is used at ATLAS. There were several other questions, ranging from "Why Switzerland?" to "Is this really safe?"

CERN Open Data

The petabytes of data I mentioned above are accessible to anyone in the world. The datasets, the open source software used, experiment results and documentation can be accessed here: http://opendata.cern.ch

Conclusion

Scientists across the globe collaborate, use CERN's infrastructure and make exciting discoveries. The physicist reassured our group that there was no risk of dangerous, unforeseen consequences: the overall energy released by the collision of protons, or even heavier nuclei, is fairly low, and it is the energy density that is very high. The by-products of the collisions are of particular interest to many scientists.
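
To get a feel for how low the overall energy of a single collision is, here is a small conversion sketch in Python. The 13 TeV figure is my own assumption (it is the proton-proton collision energy the LHC was running at in 2018) and was not quoted on the tour; the function name is made up for this example.

```python
# Convert a collision energy from teraelectronvolts (TeV) to joules.
# Assumption: 13 TeV per proton-proton collision (LHC, 2018); this number
# is not from the tour.

ELECTRON_VOLT_IN_JOULES = 1.602176634e-19  # exact, by SI definition
TEV_IN_EV = 1e12                           # 1 TeV = 10**12 eV


def tev_to_joules(energy_tev: float) -> float:
    """Return the given energy in joules."""
    return energy_tev * TEV_IN_EV * ELECTRON_VOLT_IN_JOULES


if __name__ == "__main__":
    print(f"13 TeV = {tev_to_joules(13):.2e} J")
```

That comes to about two millionths of a joule per collision, a tiny amount on everyday scales. What makes it remarkable is that it is concentrated in a region smaller than an atomic nucleus.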

Another fact the physicist shared was that in Switzerland it is legal to dig under houses below a certain depth (to construct tunnels such as the LHC), whereas in places like the USA it would not be, as anyone who owns a house there owns everything below it.

For the next part of the tour, we crossed the street and headed to the "older" CERN, where facilities used up to the 1990s to detect new particles were explained in a planetarium-style presentation. The room was dark, and spotlights picked out parts of the equipment as the concepts were explained. There was also a brief overview of the history behind the creation of CERN.

Image 10. A section of an old synchrotron used for particle acceleration until the early 1990s

While crossing the street, I asked whether CERN tries to generate any revenue from its infrastructure. The physicist confirmed that CERN is still a non-profit research organization, but said he didn't know much about its finances. In my limited searches, I haven't been able to find official funding or revenue information yet. If you know where to find it, please let me know.