Wed, 14.10.2015 9:30 - Thu, 15.10.2015 15:30
Registration deadline: Sun, 11.10.2015 23:30
VŠB - Technical University Ostrava, IT4Innovations building, room 207
Shadi Ibrahim (Inria Rennes Research Center, France)


Data volumes keep growing across a wide spectrum of applications, from traditional database applications and scientific simulations to emerging applications such as Web 2.0 and online social networks. To cope with this added weight of Big Data, we have recently witnessed a paradigm shift in the way data is processed, brought about by the MapReduce model. First promoted by Google, MapReduce has become, thanks to the popularity of its open-source implementation Hadoop, the de facto programming paradigm for Big Data processing in large-scale data centers and clouds.

The goal of this tutorial is to serve as a first step towards exploring the Hadoop platform and to provide a short introduction to working with big data in Hadoop. An overview of Big Data will be presented, including definitions, the sources of Big Data, and the main challenges it introduces. We will then present the MapReduce programming model as an important programming model for Big Data processing in the Cloud. The Hadoop ecosystem and some of the major Hadoop features will then be discussed. Finally, we will discuss several approaches and methods used to optimise the performance of Hadoop in the Cloud.
Several hands-on sessions will be provided to study the operation of the Hadoop platform along with the implementation of MapReduce applications.
This course is a substitute for the Hadoop session, which could not be held during the PRACE Winter School 2015.

About the tutor

Dr. Shadi Ibrahim is a permanent Inria research scientist within the KerData research team. He obtained his Ph.D. in Computer Science from Huazhong University of Science and Technology in Wuhan, China, in 2011. His research interests are in cloud computing, big data management, data-intensive computing, high performance computing, virtualization technology, and file and storage systems. He has published several research papers in recognized big data and cloud computing research journals and conferences, including several papers on optimizing and improving Hadoop MapReduce performance in the cloud and one book chapter on the MapReduce framework.

Preliminary schedule

Wednesday, October 14, 2015
10:00-11:30 An introduction to Big Data
11:30-13:00 lunch break
13:00-14:30 Big Data processing in the Cloud: The MapReduce programming model
14:30-15:00 coffee break
15:00-16:30 Hadoop ecosystem: An overview
16:30-17:00 coffee break
17:00-18:00 Practical session on deploying Hadoop


Thursday, October 15, 2015
09:00-10:30 Hadoop: Optimizations and open issues
10:30-11:00 coffee break
11:00-12:45 Practical session on using and configuring Hadoop
12:45-14:00 lunch break

Practical session on developing MapReduce applications


Prerequisites

This tutorial assumes some experience with the Linux command line. Programming skills in Java are a plus. A laptop is needed to participate in the exercises.

NEW: For the practical session environment, we are going to use an Ubuntu virtual machine, to be installed on participants' laptops, preferably before the event. Please download it from

To run this virtual machine, install the free VMware Workstation Player. The link:


Registration is obligatory; use the registration form here. Registration closes at the deadline given above or when the course capacity is exhausted, whichever comes first.


The event is provided free of charge for the participants.


30 attendees


  • NEW: For training environment preparation, please see the Prerequisites section above.
  • See the page on transport and accommodation (in Czech) for how to get to the campus of VŠB - Technical University Ostrava and to the new IT4Innovations building.
  • Participants without an IT4Innovations card should arrive early enough to settle the formalities of obtaining an entry permit.
  • System documentation is available at http://support.it4i.cz/docs.