tutorial codes overview
A CODES simulation consists of three main parts: a workload, a network model, and the CODES core which acts as a sort of glue, interfacing the two other parts.
The workload consists of many processes/ranks. These ranks can be considered tasks on compute nodes. Each rank in the workload creates traffic "events" in the simulation, specifying the time at which each event is to happen and the rank to which it is to be sent. Each rank in the workload is represented by an LP (usually named nw-lp).
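To make the idea of timestamped traffic events concrete, here is a minimal sketch in Python. It is not CODES or ROSS code: the `(timestamp, source_rank, destination_rank)` tuple layout and the `process_in_time_order` helper are assumptions made purely for illustration. It shows events being handled in timestamp order regardless of the order in which they were created.

```python
import heapq
from collections import Counter

# Hypothetical event layout: (timestamp, source_rank, destination_rank).
# This is an illustration only, not a CODES data structure.
def process_in_time_order(events):
    """Pop events by ascending timestamp and count deliveries per rank."""
    queue = list(events)
    heapq.heapify(queue)          # min-heap keyed on the timestamp
    order = []
    received = Counter()
    while queue:
        timestamp, src, dst = heapq.heappop(queue)
        order.append((timestamp, src, dst))
        received[dst] += 1        # the destination rank "receives" the event
    return order, received

# Three ranks create events out of timestamp order.
events = [(5.0, 0, 2), (1.5, 1, 0), (3.0, 2, 0)]
order, received = process_in_time_order(events)
```

A real PDES engine does far more than this (parallel execution, rollbacks, LP state), but the core contract is the same: each event carries a time and a destination.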
There are three main types of workloads supported by vanilla CODES:
- Synthetic
- Traces
- Online
Synthetic workloads are very simple. Each rank generates a number of messages to send to other ranks. Once all ranks have sent and received all of their messages, the simulation ends. One can write a synthetic workload "program" that creates essentially any arbitrary pattern of communication. There is one catch to writing your own synthetic workload: it requires some familiarity with discrete event simulation. If you intend to use the optimistic execution offered by ROSS/CODES, you will need to write not only the forward event handlers but also the reverse event handlers.
Core CODES supports DUMPI traces to supply traffic to the network model. These are essentially files that contain the communication patterns utilized by real-world HPC applications. We aren't simulating the actual execution of these applications but we are seeing how their communication plays out on the network model.
A new feature of CODES is the utilization of online workloads. These are a special type of synthetic workload that may actually do some work, which makes the resulting communication patterns more realistic than what is possible with a plain synthetic workload. Core CODES currently supports Intel's Scalable Workload Models (SWMs). Online workloads based on the CoNCePTuaL DSL are in the process of being brought into mainline core operation.
While we have LPs that represent tasks that create traffic, we still need something for this traffic to be put onto. The network model in CODES is what we call the Terminals (sometimes referred to as Compute Nodes) and Routers (sometimes referred to as Switches) that make up the interconnect system. While the terminals technically aren't a part of the interconnect, it has proven beneficial to keep them distinct from the nw-lp workload LPs, as this gives us greater flexibility. It lets us, if we wanted to, have multiple tasks per terminal, or have terminal behavior or data collection that is unique to a particular interconnect model.
Each CODES simulation execution command is usually accompanied by a simulation configuration file. This file contains all the information that CODES needs to instantiate the simulation. This includes a general CODES PDES simulation configuration specifying the number of each LP type in addition to model-specific configurations.
Let's apply what we've learned so far and map out, on paper, a simple CODES simulation.
Let's say we're going to develop and simulate an All-to-All switch network with 4 switches and 1 terminal attached to each switch. Each terminal has a single simulated workload task on it, and it's just going to run a simple synthetic workload where each workload LP sends 10 messages to random other workload LPs before the simulation is concluded. So that's three LP types that we must implement, for a total of 12 LPs (4 switch LPs, 4 terminal LPs, 4 workload LPs).
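The LP count and the links of this toy network can be checked with a short Python sketch. Everything here is hypothetical and for paper-mapping purposes only: the names (`switch-0`, `terminal-0-0`, `nw-lp-0`) and the `build_toy_network` helper are not CODES identifiers, and this is not how CODES instantiates LPs.

```python
from itertools import combinations

# Parameters of the toy network described above.
NUM_SWITCHES = 4
TERMINALS_PER_SWITCH = 1
TASKS_PER_TERMINAL = 1

def build_toy_network():
    """Enumerate the LPs and links of the all-to-all toy network."""
    switches = [f"switch-{s}" for s in range(NUM_SWITCHES)]
    terminals = [
        f"terminal-{s}-{t}"
        for s in range(NUM_SWITCHES)
        for t in range(TERMINALS_PER_SWITCH)
    ]
    workloads = [f"nw-lp-{i}" for i in range(len(terminals) * TASKS_PER_TERMINAL)]
    # All-to-all: every pair of switches shares one link.
    switch_links = list(combinations(switches, 2))
    # Each terminal connects only to the switch it is attached to.
    terminal_links = [
        (f"terminal-{s}-{t}", f"switch-{s}")
        for s in range(NUM_SWITCHES)
        for t in range(TERMINALS_PER_SWITCH)
    ]
    return {
        "switches": switches,
        "terminals": terminals,
        "workloads": workloads,
        "switch_links": switch_links,
        "terminal_links": terminal_links,
    }

net = build_toy_network()
total_lps = len(net["switches"]) + len(net["terminals"]) + len(net["workloads"])
```

With the values above, `total_lps` is 12, and the network has 6 switch-to-switch links plus 4 terminal-to-switch links.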
We would have 4 nw-lp workload LPs that create traffic to send to other workload LPs over the network defined above. We'd have to write forward and reverse handlers for this LP type. The forward handler would create an event, send it to a random destination workload LP, and then increment its message counter. Once the message counter is >= 10, the LP stops sending. The reverse handler would decrement the message counter whenever a send is rolled back, so that each workload LP still sends a net of 10 messages. This would be implemented in a synthetic workload generator and would be similar to other synthetic generators found in src/network-workloads/.
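The forward/reverse pairing described here can be sketched in Python. This is a conceptual illustration, not the ROSS or CODES handler API: `WorkloadState`, `GenerateEvent`, `forward_generate`, and `reverse_generate` are hypothetical names, and saving the RNG state inside the event is simply one way for this sketch to undo a random draw.

```python
import random
from dataclasses import dataclass, field

NUM_RANKS = 4
MSGS_PER_RANK = 10

@dataclass
class WorkloadState:
    """Per-LP state of one workload rank (hypothetical layout)."""
    rank: int
    msgs_sent: int = 0
    rng: random.Random = field(default_factory=random.Random)

@dataclass
class GenerateEvent:
    """Scratch space the forward handler fills so the reverse can undo it."""
    saved_rng_state: object = None
    dest: int = -1
    sent: bool = False

def forward_generate(state, event):
    """Send one message to a random other rank, unless the quota is reached."""
    if state.msgs_sent >= MSGS_PER_RANK:
        event.sent = False            # nothing changed, nothing to undo later
        return
    event.saved_rng_state = state.rng.getstate()
    others = [r for r in range(NUM_RANKS) if r != state.rank]
    event.dest = state.rng.choice(others)
    state.msgs_sent += 1
    event.sent = True

def reverse_generate(state, event):
    """Undo exactly what forward_generate did for this event."""
    if not event.sent:
        return
    state.msgs_sent -= 1
    state.rng.setstate(event.saved_rng_state)
```

The property to aim for is that running the forward handler, then the reverse handler, leaves the LP state exactly as it was before. Here that includes the random number stream, so a re-executed event picks the same destination again.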
The other two LP types, switches and terminals, would be implemented as their own network model like those found in src/networks/model-net/. The behavior for each switch and terminal LP will be encoded into this model.
We would have 4 terminal LPs that generate packets on the network at the order of their respective workload LPs. We'd need forward event handlers (and reverse handlers!) for what happens when a generation event is ordered as well as what happens when a packet is received from a router on the network. We'll also need 4 switch LPs that receive traffic from their attached terminals as well as from other switches on the network and forward it on toward each packet's destination.
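As a paper exercise, the forwarding decision a switch makes in this toy network can be sketched in Python. The sketch assumes terminal i is attached to switch i (one terminal per switch) and that every switch has a direct link to every other switch; `attached_switch`, `switch_next_hop`, and `trace_path` are hypothetical helpers, not part of model-net.

```python
NUM_SWITCHES = 4

def attached_switch(terminal_id):
    """Assumption of this sketch: terminal i hangs off switch i."""
    return terminal_id

def switch_next_hop(switch_id, dest_terminal):
    """Decide where a switch sends a packet bound for dest_terminal."""
    dest_switch = attached_switch(dest_terminal)
    if dest_switch == switch_id:
        # The destination terminal is attached here: deliver it.
        return ("terminal", dest_terminal)
    # All-to-all network: the destination's switch is one hop away.
    return ("switch", dest_switch)

def trace_path(src_terminal, dest_terminal):
    """List every LP a packet visits from source to destination terminal."""
    path = [("terminal", src_terminal)]
    hop = ("switch", attached_switch(src_terminal))
    while hop[0] == "switch":
        path.append(hop)
        hop = switch_next_hop(hop[1], dest_terminal)
    path.append(hop)
    return path

path = trace_path(0, 2)
# path == [("terminal", 0), ("switch", 0), ("switch", 2), ("terminal", 2)]
```

Because the switches are fully connected, a packet between two different terminals crosses exactly two switches. In the actual model, each of these hops would be an event handled by the corresponding LP, with a matching reverse handler.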
Once the synthetic workload generator and the model defining the network's behavior are written, we need a configuration file that CODES reads in order to create the network to our specifications. We now have the three main parts required to run a standard CODES simulation: the workload, the model, and the configuration.
We will revisit this toy model and actually develop it in another tutorial.