HyperLoom: A Platform for Executing Scientific Pipelines in Distributed Environments

Date: 
Mon, 06/04/2018 - 9:30am
Registration deadline: 
Fri, 06/01/2018 - 11:45pm
Venue: 
VŠB - Technical University Ostrava, IT4Innovations building, room 207
Tutor: 
Vojtěch Cima, Stanislav Böhm (IT4Innovations)
Level: 
beginners-intermediate
Language: 
English

Annotation

Real-world applications often encompass end-to-end data processing pipelines composed of a large number (millions) of interconnected computational tasks of various granularity. We introduce HyperLoom as a platform for defining and executing such pipelines in distributed environments using a Python API.

Scientific pipelines, such as those in machine learning, are composed of multiple data processing tasks. HyperLoom users can easily define dependencies between computational tasks and create a pipeline, which can then be executed on HPC systems. The high-performance core of HyperLoom dynamically orchestrates the tasks over the available resources while respecting task requirements. The entire system was designed to have minimal overhead and to deal efficiently with varying computational times of the tasks. HyperLoom can execute pipelines that contain basic built-in tasks, user-defined Python tasks, tasks wrapping third-party applications, or combinations of those.
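
To make the idea of a pipeline of dependent tasks concrete, here is a minimal sketch in plain Python. It does not use the HyperLoom API: the function `run_pipeline`, the task names, and the dictionary layout are hypothetical and only illustrate the concept of declaring dependencies and letting a scheduler run whatever is ready. It uses the standard library only (Python 3.9 or newer) and runs on a local thread pool rather than on distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_pipeline(tasks, max_workers=4):
    """Run a task graph and return a dict mapping task name to result.

    `tasks` maps a task name to a pair (function, list of input task names).
    This layout is an illustrative assumption, not HyperLoom's data model.
    """
    # Build the dependency graph: each task points at the tasks it needs.
    graph = {name: set(deps) for name, (_, deps) in tasks.items()}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # All tasks whose inputs are already computed can run together.
            ready = sorter.get_ready()
            futures = {}
            for name in ready:
                func, deps = tasks[name]
                inputs = [results[dep] for dep in deps]
                futures[name] = pool.submit(func, *inputs)
            # Wait for this batch, then unlock the tasks that depend on it.
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)
    return results


# A tiny pipeline: one source, two independent transforms, one merge.
pipeline = {
    "load": (lambda: [1, 2, 3, 4], []),
    "square": (lambda xs: [x * x for x in xs], ["load"]),
    "double": (lambda xs: [2 * x for x in xs], ["load"]),
    "merge": (lambda a, b: sum(a) + sum(b), ["square", "double"]),
}

results = run_pipeline(pipeline)
print(results["merge"])
```

In this sketch the `square` and `double` tasks are independent, so they are submitted in the same batch and may run concurrently, while `merge` waits for both. A real scheduler for HPC systems would additionally account for resource requirements and data placement, which this example leaves out.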

This course will introduce HyperLoom and its possible uses in HPC environments, based on our experience with deploying HyperLoom at the IT4Innovations national supercomputing center.

Purpose of the course (benefits for the attendees)

Attendees will learn the key concepts of HyperLoom, its architecture, and usage explained through practical examples.

About the tutors

Stanislav Böhm is a computer science researcher at the Advanced Data Analysis and Simulations Lab at IT4Innovations and at the Institute of Formal and Applied Linguistics at Charles University. He is interested in distributed systems and verification problems. He received his Ph.D. in 2014. He is the main author and team leader of the following software projects related to HPC: HyperLoom (framework for distributed pipelines), Aislinn (verification tool for MPI programs), Kaira (high-level development environment for MPI programs), and Haydi (combinatorial framework).

Vojtěch Cima is a research assistant and Ph.D. student at the Advanced Data Analysis and Simulations Lab at IT4Innovations, where he actively participates in national and European research projects focusing on workload distribution and machine learning.

Preliminary agenda

09:30 - 10:00 Registration
10:00 - 11:30 Introduction, overview, getting started
11:30 - 12:30 Lunch break
12:30 - 14:00 Hands-on session (distributed parameter search, model cross-validation, ...)
14:00 - 14:30 Coffee break
14:30 - 16:00 Bring your own problem, discussion

Prerequisites

  • laptop with an SSH client
  • basic knowledge of Python
  • basic knowledge of Linux

Registration

Registration is obligatory; the registration form is here. Registration closes at the (extended) deadline given above or when the course capacity is exhausted, whichever comes first.

Capacity and Fees

30 participants. The event is provided free of charge.

Practicalities

  • See the links below for how to get to the campus of VŠB - Technical University Ostrava and to the IT4Innovations building.
  • Documentation for IT4Innovations' computer systems is available at https://docs.it4i.cz/.