One of the essential steps during embedded software development for multi-core platforms is the mapping from elements of the software to elements of the hardware, e.g. Tasks to Schedulers, Labels to Memories etc. This is usually a non-trivial task, as the number of possible combinations grows rapidly once either the software or the hardware becomes complex. The purpose of APP4MC's OpenMapping Plugin is to determine such a mapping and store it in a Mapping Model, which will contain the allocations of elements of the Software Model to elements of the Hardware Model.
The conceptual implementation of OpenMapping Plugin is shown in the following Figure.
As shown at the top of this figure, the plugin requires several models to operate. The models for Software, Hardware and Constraints are mandatory, while the Property Constraints Model is optional.
Using the OpenMapping Plugin, the user is able to choose between different mapping strategies. Currently, these strategies are split into two categories: heuristic methods and Integer Linear Programming (ILP) based methods. Unlike ILP based methods, heuristic methods, such as the Heuristic Data Flow Graph (DFG) load balancing, create a mapping directly.
ILP based methods, on the other hand, first need to generate an ILP model of the mapping problem according to the selected mapping strategy, e.g. ILP based load balancing or energy aware mapping. Once the ILP model has been created, it is solved by a mathematical solver. Currently, the OpenMapping Plugin uses the open source project Oj!Algo (see footnote 1) for this purpose. Furthermore, the user can activate an optional MPS generator, which will generate an MPS file containing the ILP problem. This file may be used to solve the ILP problem with external (e.g. commercial) solvers, which tend to be more efficient in solving larger models compared to open source Java implementations.
Once a mapping has been determined, it is displayed within the Eclipse console and the following output models are generated:
- Mapping Model: allocations of Tasks to Schedulers
- OS Model: Schedulers for each Core
- Stimuli Model: Stimuli for the resp. Runnable and Task activations
The following subsections give a short introduction to the different algorithm implementations of the OpenMapping Plugin. Section Task generation describes the task generation method which is used to convert process prototypes into tasks. It is meant to be used by mapping algorithms which do not feature task generation themselves. Sections Mapping Strategy 1: Heuristic DFG load balancing and Mapping Strategy 2: ILP based load balancing describe a heuristic and a mathematical load balancing approach for mapping tasks to cores. Finally, a more complex method for energy efficient task mapping with its own task creation algorithm is outlined in section Mapping Strategy 3: Minimizing Energy Consumption.
The task generation method in the OpenMapping Plugin is a pragmatic way to create tasks for mapping algorithms which require Tasks, i.e. which are not designed to agglomerate Runnables into Tasks on their own. This step takes the ProcessPrototypes generated by the partitioning plugin (see Chapter Partitioning) and transforms them into Tasks. Furthermore, it creates the Stimuli Model, which contains the activation elements for the Tasks, i.e. PeriodicStimulus elements. An overview of the transformed elements and their sources as well as destinations is shown in Table 1.
| Source Model | Source Element | Target Model | Target Element |
|---|---|---|---|
| SW | ProcessPrototype | SW | Task |
| SW | PeriodicActivation | Stimuli | PeriodicStimulus |
| SW | TaskRunnableCall (within ProcessPrototype) | SW | TaskRunnableCall (within Task) |

Table 1
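The transformation summarized in Table 1 can be illustrated with a minimal sketch. The classes below are simplified stand-ins and not the AMALTHEA model API; the field names (`period_ms`, `runnable_calls`) and the naming scheme `Task_...` / `Stimulus_...` are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessPrototype:
    name: str
    period_ms: float                 # assumed to come from its PeriodicActivation
    runnable_calls: list = field(default_factory=list)


@dataclass
class PeriodicStimulus:
    name: str
    period_ms: float


@dataclass
class Task:
    name: str
    stimulus: PeriodicStimulus
    runnable_calls: list


def generate_tasks(prototypes):
    """Turn each ProcessPrototype into a Task plus a matching PeriodicStimulus."""
    tasks, stimuli = [], []
    for proto in prototypes:
        # PeriodicActivation (SW) becomes a PeriodicStimulus (Stimuli model).
        stimulus = PeriodicStimulus(f"Stimulus_{proto.name}", proto.period_ms)
        stimuli.append(stimulus)
        # The runnable calls are copied from the prototype into the new Task.
        tasks.append(Task(f"Task_{proto.name}", stimulus, list(proto.runnable_calls)))
    return tasks, stimuli


protos = [ProcessPrototype("PP0", 10.0, ["R1", "R2"]),
          ProcessPrototype("PP1", 20.0, ["R3"])]
tasks, stimuli = generate_tasks(protos)
```

Each prototype yields exactly one Task, so the partitioning result fully determines the task set in this sketch.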
The Heuristic Data Flow Graph (DFG) load balancing algorithm aims at achieving an equal utilization of a hardware platform's cores for DFG based software models.
The first step in this algorithm is to determine the most complex Task (usually representing the critical path) and allocate it to the best fitting core of the hardware platform. Next, the runtime of each remaining Task is estimated for every Core within the System, and the Task is allocated to the Core that yields the smallest increase of the longest overall runtime among all cores.
One of the major benefits of this algorithm is its very low runtime. The information which is processed by this mapping strategy and, as such, has to be present in the input models, is shown in Table 2.
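The greedy idea behind this strategy can be sketched as follows. This is not the plugin's implementation: the function name, the use of the instruction count as a measure of task complexity, and the tie-breaking rule are assumptions, and the input numbers are hypothetical.

```python
def heuristic_load_balancing(task_instructions, core_ips):
    """Greedy mapping: handle the heaviest task first, then place every task
    on the core where the longest per-core runtime grows the least."""
    load = {core: 0.0 for core in core_ips}      # accumulated runtime per core
    mapping = {}
    # Assumption: "most complex" is approximated by the instruction count.
    ordered = sorted(task_instructions, key=task_instructions.get, reverse=True)
    for task in ordered:
        def cost(core):
            new_load = load[core] + task_instructions[task] / core_ips[core]
            others = max((l for c, l in load.items() if c != core), default=0.0)
            # Primary key: resulting longest runtime; tie-break: load of this core.
            return (max(new_load, others), new_load)
        best = min(core_ips, key=cost)
        load[best] += task_instructions[task] / core_ips[best]
        mapping[task] = best
    return mapping, load


# Hypothetical input: instructions per task, instructions per second per core.
tasks = {"T1": 8e6, "T2": 4e6, "T3": 3e6, "T4": 1e6}
cores = {"C0": 1e6, "C1": 1e6}
mapping, load = heuristic_load_balancing(tasks, cores)
```

Because every task is placed exactly once and each placement only inspects the current core loads, the effort grows with the number of tasks times the number of cores, which is consistent with the low runtime mentioned above.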
This section describes a comparatively simple ILP based strategy for allocating tasks to processors while minimizing the total execution time. This method supports multiple processors with the same processing speed (i.e. homogeneous processors) and it does not consider any dependencies between the tasks (e.g. waiting for the results of a predecessor).
Load balancing within this method is achieved by minimizing the highest execution duration $C_{\max}$ of all $m$ processing units with $n$ tasks. The variable $x_{ij}$ is set to 1 if a task $j$ is allocated to processor $i$ and to 0 otherwise. The model guarantees that each task is allocated to exactly one processor and restricts the variables $x_{ij}$ to boolean values. The duration (execution time) of a task $j$ is specified by $p_j$.
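Written out, the model described in this paragraph corresponds to the following formulation. It is reconstructed from the textual description only and may differ in notation from the formulation used by the plugin.

$$
\begin{aligned}
\min\quad & C_{\max} \\
\text{s.t.}\quad & \sum_{i=1}^{m} x_{ij} = 1, && j = 1,\dots,n \\
& \sum_{j=1}^{n} p_j \, x_{ij} \le C_{\max}, && i = 1,\dots,m \\
& x_{ij} \in \{0,1\}, && i = 1,\dots,m,\; j = 1,\dots,n
\end{aligned}
$$

The first constraint allocates each task to exactly one processor, the second bounds the load of every processor by $C_{\max}$, and the third limits $x_{ij}$ to boolean values.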
One of the downsides of this algorithm is caused by the variable $p_j$, which forces an equal processing duration of a task $j$ on all cores. It is, however, possible to extend the method to support heterogeneous processors (in this case: processors with different processing speeds) with a minor modification: replacing $p_j$ with $p_{ij}$, i.e. a separate processing duration of task $j$ for every core $i$, solves this problem.
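For very small instances, the heterogeneous variant can be checked by plain enumeration. The sketch below is not an ILP solver and not part of the plugin; it only evaluates the same objective and allocation constraint exhaustively, with hypothetical durations.

```python
from itertools import product


def min_makespan(p):
    """Exhaustively solve the allocation model for a tiny instance.

    p[i][j] is the duration of task j on processor i.
    Returns (c_max, assignment), where assignment[j] is the processor of task j.
    """
    m, n = len(p), len(p[0])
    best = None
    # Every tuple assigns each task to exactly one processor.
    for assignment in product(range(m), repeat=n):
        loads = [0] * m
        for j, i in enumerate(assignment):
            loads[i] += p[i][j]
        c_max = max(loads)               # highest execution duration of all processors
        if best is None or c_max < best[0]:
            best = (c_max, assignment)
    return best


# Hypothetical durations: processor 1 is twice as fast as processor 0.
p = [[4, 6, 2],
     [2, 3, 1]]
c_max, assignment = min_makespan(p)
```

The enumeration visits m^n allocations, which is exactly why larger problems are handed to an ILP solver instead.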
The minimal amount of information which is required to execute this algorithm is outlined in Table 2.
| Source Model | Element | Description |
|---|---|---|
| HW | Core | A Core represents the target of an allocation. One OS Model with a Scheduler for each Core will be generated. |
| HW | CoreType, Prescaler, Quartz | A core's Prescaler, the referenced Quartz and the CoreType's attribute CyclesPerTick of a Core are used to determine the number of processed Instructions per second. |
| SW | Task | Tasks will be allocated to a Core (via the core's Scheduler). |
| SW | Runnable | Runnables are derived from a task's TaskRunnableCalls; their attribute Instructions is used during the load calculation for each Core. |
| Stimuli | Stimulus (PeriodicStimulus) | The PeriodicStimulus is used to specify a task's activation rate, i.e. the period between its calls. |

Table 2
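How the elements of Table 2 could feed a load calculation is sketched below. The exact formula used by the plugin is not given here, so this is an assumption: the load of a core is taken as the sum of execution time divided by period over its tasks, and the number of instructions per second is passed in directly instead of being derived from Prescaler, Quartz and CyclesPerTick.

```python
def core_utilization(tasks, instructions_per_second):
    """Utilization of one core: sum of (execution time / period) over its tasks.

    tasks: list of (instructions, period_s) tuples, where instructions is the
    sum of the Instructions of all Runnables called by the task and period_s
    is the period of the task's PeriodicStimulus in seconds.
    """
    return sum(instr / instructions_per_second / period_s
               for instr, period_s in tasks)


# Hypothetical numbers: a core processing 100 million instructions per second.
tasks_on_core = [(200_000, 0.010), (500_000, 0.020)]
utilization = core_utilization(tasks_on_core, 100e6)
```

With these numbers the two tasks contribute 0.2 and 0.25, so the core would be utilized to roughly 45 percent.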
This mapping algorithm is based on the work "Task Scheduling and Voltage Selection for Energy Minimization" by Zhang et al., which presents a framework that aims at minimizing the energy consumption of variable voltage processors executing real-time dependent tasks. This method is implemented as a two phase approach which integrates task scheduling (assignment and ordering) and voltage selection.
In the first phase, opportunities for energy minimization are revealed by ordering real-time dependent tasks and assigning them to processors on the respective target platform.
Once the schedule has been created, there will be time frames between the end of one task and the start of another during which the processor is not being utilized (so-called slacks). These slacks are the prerequisite for the second phase, which performs the voltage selection. This phase aims at determining the resp. (optimal) processor voltage for each task execution without violating the constraints, thereby minimizing the total energy consumption of the system. In order to determine these voltages, the task schedule is transformed into a directed acyclic graph (DAG) that is used to model the selection problem as an integer programming (IP) problem. Once the model has been set up, it is optimized by a mathematical solver.
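The trade-off exploited by the voltage selection phase can be illustrated with a deliberately simplified sketch. It is not the IP formulation of Zhang et al. and not the plugin's code: it assumes a single core executing its tasks sequentially, replaces the solver by exhaustive enumeration, and uses the common approximation that the energy per cycle is proportional to the squared voltage. All names and numbers are hypothetical.

```python
from itertools import product


def select_voltages(task_cycles, levels, base_frequency_hz, deadline_s):
    """Pick one voltage level per task so that the energy proxy is minimal
    while the sequential execution still meets the deadline.

    levels: list of (voltage, scale) pairs; scale is the fraction of the base
    frequency that the core reaches at that voltage.
    Returns (energy_proxy, chosen_level_indices) or None if infeasible.
    """
    best = None
    for choice in product(range(len(levels)), repeat=len(task_cycles)):
        time = sum(c / (base_frequency_hz * levels[k][1])
                   for c, k in zip(task_cycles, choice))
        if time > deadline_s:
            continue                      # not enough slack, deadline violated
        # Assumption: energy per cycle is proportional to voltage squared.
        energy = sum(c * levels[k][0] ** 2 for c, k in zip(task_cycles, choice))
        if best is None or energy < best[0]:
            best = (energy, choice)
    return best


levels = [(1.0, 0.5), (1.2, 1.0)]         # hypothetical (voltage, scale) pairs
result = select_voltages([1_000_000, 2_000_000], levels, 100e6, 0.045)
```

In this example only the first task can be slowed down: running it at the lower voltage consumes the available slack, while the second task has to stay at the higher level to meet the deadline.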
This algorithm has been implemented with two constraints:
Table 3 lists the minimal amount of information which has to be present in the input models in order for this mapping strategy to work, as well as the special annotations which are added to the mapping model.
| Source Model | Element | Description |
|---|---|---|
| HW | Core | A Core represents the target of an allocation. One OS Model with a Scheduler for each Core will be generated. |
| HW | CoreType, Prescaler, Quartz | A core's Prescaler, the referenced Quartz and the CoreType's attribute CyclesPerTick of a Core are used to determine the number of processed Instructions per second. The CoreType's contained CustomProperty entries (DoubleValue) whose labels start with EnEf-Volt_{SomeID} and EnEf-Scale_{SomeID} are used to specify the voltage levels, i.e. the performance of a core at a specific voltage. |
| SW | Runnable | Runnables will be distributed to the cores (via the core's Scheduler); their attribute Instructions is used during the load calculation for each Core. |
| SW | Activation (PeriodicActivation) | The PeriodicActivation specifies the recurrence of the Runnable. The lowest recurrence is used to specify the overall deadline of all Runnables, i.e. the maximum amount of time for the sum of all Runnable executions. |
| Constraints | RunnableEntityGroup, RunnableSequencingConstraint | Are used to determine the execution order of the Runnables as well as their interdependencies. |
| Mapping | RunnableAllocation, CustomProperty, LongValue | Specifies the selected voltage level and the number of ExecutionCycles at this voltage level. |

Table 3
Utilization of the OpenMapping Plugin
This section provides information on the utilization of the OpenMapping Plugin, i.e. its configuration (section Configuration and Preferences) and how to generate mappings (section Generating a mapping).
The configuration of the OpenMapping Plugin can be performed through its preferences page. It is integrated into APP4MC and can be accessed through the menu bar under ‘Window’ -> ‘Preferences’ -> ‘AMALTHEA Mapping’. The configurable fields, their types and their descriptions are listed below.
Checking the box 'Enable verbose logging to console' will enable verbose logging to stdout. This may help to identify problems if the mapping plugin should fail to generate a mapping.
The radio buttons under 'Select output location' allow the user to choose the directory into which newly generated files will be placed.
Hint: Note that using this option will NOT update the Project Explorer's folder list once the mapping is finished. Avoid using this option in combination with a target location within the Eclipse workspace.
The radio buttons within 'Select mapping algorithm' allow the user to choose the mapping strategy which should be applied during the mapping process. Currently, there are three valid options:
Hint: The settings described in this section only affect ILP based algorithms!
The section Solver Settings allows the user to configure the solver which is used to solve the ILP problems, to specify the minimal accuracy of the found solution and to activate the MPS file output of the ready-to-solve ILP problem.
Setting the accuracy value to 0.0 instructs the solver to continue until either the final solution reaches the same value as the LP relaxation or another limit (below) has been reached, while 1.0 treats the first feasible solution as optimal.
Valid values: 0.0 – 1.0
Furthermore, it is possible to specify the maximum number of iterations or the maximum time spent on finding an optimal solution.
Setting one of these values to zero will pass the value of INT_MAX to the solver, effectively removing the respective constraint.
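The effect of these settings can be summarized in a small sketch. The helper names are hypothetical and the relative gap used here is the common textbook definition for minimization problems; how the solver computes its accuracy criterion internally is an assumption.

```python
INT_MAX = 2**31 - 1   # largest 32-bit signed integer (Java's Integer.MAX_VALUE)


def effective_limit(configured):
    """A configured limit of zero means 'no limit' and is replaced by INT_MAX."""
    return INT_MAX if configured == 0 else configured


def accept_solution(incumbent, relaxation_bound, tolerance):
    """Accept a feasible solution of a minimization problem once its relative
    gap to the LP relaxation bound does not exceed the tolerance (0.0 to 1.0)."""
    if incumbent == 0:
        return relaxation_bound == 0
    gap = abs(incumbent - relaxation_bound) / abs(incumbent)
    return gap <= tolerance
```

With a tolerance of 0.0 a solution is only accepted when it matches the relaxation bound, whereas a tolerance of 1.0 accepts any feasible solution of a problem with a non-negative bound, which mirrors the behaviour described above.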
Depending on the selected mapping strategy, it may be required to create tasks before running the mapping algorithm. The method 'Create Tasks', which is accessible through the context menu of AMALTHEA Software Model files (right click on *.amxmi and *.amxmi-sw files), transforms partitioned software models into software models with tasks.
The mapping can be performed once input models with the required amount of information are present. Opening the context menu again (right click on *.amxmi and *.amxmi-sw files) and selecting 'Perform Mapping' will open the ‘Perform Mapping GUI’.
The fields within the GUI are described below.
1 Oj!Algorithms, licensed under the MIT license, see: http://ojalgo.org