We propose several methods for speeding up the processing of particle physics data on clusters of PCs. We present a new way of indexing and retrieving data in a high-dimensional space that uses two levels of catalogues to enable efficient data preselection. We propose several scheduling policies for parallelizing data-intensive particle physics applications on clusters of PCs. We show that intra-job parallelization, caching data on the cluster node disks, and reordering incoming jobs drastically improve the performance of a simple batch-oriented scheduling policy. In addition, we propose the concepts of delayed scheduling and adaptive delayed scheduling, in which the deliberate inclusion of a delay improves the disk cache access rate and enables better utilisation of the cluster. We build theoretical models for the different scheduling policies and compare them in detail with the results of the cluster simulations. We study the improvements obtained by pipelining data I/O operations and data processing operations, with respect to both tertiary storage I/O and disk I/O. Pipelining improves performance by approximately 30%. Using the parallelization framework developed at EPFL, we describe a possible implementation of the proposed access policies within the context of the LHCb experiment at CERN. A first prototype has been implemented, and the proposed scheduling policies can easily be plugged into it.
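To illustrate why pipelining data I/O and data processing can shorten a job, the following minimal sketch compares a strictly sequential execution with a two-stage pipeline in which the next chunk is loaded while the current one is processed. It is not the model used in this work: the function names, the per-chunk timings, and the assumption that loaded chunks can always be buffered until the processor is free are all hypothetical and chosen only for illustration.

```python
def serial_time(io_times, cpu_times):
    """Total time when every chunk is loaded and then processed, one after another."""
    return sum(io_times) + sum(cpu_times)


def pipelined_time(io_times, cpu_times):
    """Total time when loading the next chunk overlaps processing the current one.

    Assumption (hypothetical): loads run back to back and loaded chunks are
    buffered without limit until the processing stage is free.
    """
    io_done = 0.0   # time at which the current chunk has finished loading
    cpu_done = 0.0  # time at which the previous chunk has finished processing
    for io, cpu in zip(io_times, cpu_times):
        io_done += io
        # Processing starts once the data is available and the processor is idle.
        cpu_done = max(cpu_done, io_done) + cpu
    return cpu_done


# Illustrative timings only (arbitrary units), not measurements from this work.
io_times = [1, 1, 1, 1]
cpu_times = [3, 3, 3, 3]
```

With these illustrative timings, `serial_time(io_times, cpu_times)` gives 16 and `pipelined_time(io_times, cpu_times)` gives 13. The gain depends on how balanced the I/O and processing times are: when one stage dominates, only the shorter stage can be hidden behind it.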