A hybrid evolutionary approach for heterogeneous

Soft Comput
DOI 10.1007/s00500-008-0356-2
FOCUS
A hybrid evolutionary approach for heterogeneous multiprocessor
scheduling
C. K. Goh · E. J. Teoh · K. C. Tan
© Springer-Verlag 2008
Abstract This article investigates the assignment of tasks
with interdependencies in a heterogeneous multiprocessor
environment; specific to this problem, task execution time
varies depending on the nature of the tasks as well as with
the processing element assigned. The solution to this heterogeneous multiprocessor scheduling problem involves the
optimization of complete task assignments and processing
order between the assigned processors to arrive at a minimum makespan, subject to a precedence constraint. To solve
an NP-hard combinatorial optimization problem, as is typified by this problem, this paper presents a hybrid evolutionary algorithm that incorporates two local search heuristics,
which exploit the intrinsic structure of the solution, as well as
through the use of specialized genetic operators to promote
exploration of the search space. The effectiveness and contribution of the proposed features are subsequently validated
on a set of benchmark problems characterized by different
degrees of communication times, task, and processor heterogeneities. Preliminary results from simulations demonstrate
the effectiveness of the proposed algorithm in finding useful
schedule sets based on the set of new benchmark problems.
Keywords Multiprocessor scheduling · Heterogeneous ·
Hybrid evolutionary algorithm · Local search · Precedence
C. K. Goh (B)
Spintronics, Media and Interface Division,
Data Storage Institute, DSI Building, 5 Engineering Drive 1,
Singapore 117608, Singapore
e-mail: [email protected]
E. J. Teoh · K. C. Tan
Department of Electrical and Computer Engineering,
National University of Singapore, 4 Engineering Drive 3,
Singapore 117576, Singapore
1 Introduction
The multiprocessor scheduling problem is a broad category
of a class of combinatorial optimization problems in which an
originally large problem is broken down into smaller tasks.
These smaller, partitioned tasks must then be assigned to the individual processing units, or processing elements (PEs), of a multiprocessor system to be solved.
Finding optimal schedules in such systems has
been shown to be NP-hard in the general case
(Garey and Johnson 1979; Kasahara and Narita 1984; Lewis
and El-Rewini 1992; Papadimitriou and Yannakakis 1990).
The underlying motivation for this problem is significant, considering the emergence of computer programs with
increasingly higher computational requirements and algorithmic complexity. These factors have necessitated
parallel PEs in a multi-computer environment, which in
turn has increased the need for tasks to be
distributed ‘optimally’ among these individual processing units.
A typical program can usually be decomposed into a set
of smaller tasks, similar to a divide-and-conquer approach.
These smaller tasks almost always have dependencies, and
hence precedence requirements in that the results of another set of tasks are required before a particular task can be
executed. The critical aim of a scheduler is thus to assign partitioned tasks to available processors in a manner such that
(1) the requirements (or constraints) of precedence between
these tasks are met and (2) the resulting overall length of time
required to execute the entire program, the schedule length
or makespan, is minimized (Wu et al. 2004). To complicate
matters, the scheduling of tasks becomes more challenging
when communication delays are accounted for.
A multiprocessor scheduling problem can be categorized into different classes based on the characteristics of the
problem, the tasks to be scheduled, the multiprocessor
system, as well as the availability of a priori information
regarding the processing time (El-Rewini et al. 1994; Kwok
and Ahmad 1997, 1999). Typically, the PEs constituting a
multi-computer environment can be of the same capability
(a homogeneous environment) or of different
capabilities (a heterogeneous environment);
this paper focuses on the latter.
Presently, there are numerous methods and approaches
which have been developed and subsequently applied to the
multiprocessor scheduling problem, typically using a deterministic approach. El-Rewini et al. (1994) provide a fairly
comprehensive taxonomy of how scheduling problems can
be categorized, and highlight the key differences that distinguish one class from the next. Further to this, Kwok and
Ahmad (1997, 1999) present a wide-ranging overview and
classification of scheduling algorithms, particularly focusing
on deterministic and static scheduling problems. Most of the
present techniques are based on heuristics (Kruatrachue and
Lewis 1987; Macey and Zomaya 1998) that are not only
greedy in nature but also capable of solving certain instances
of the scheduling problem efficiently.
With that in mind, the approach proposed here is largely inspired by developments in computational intelligence:
evolutionary algorithms (EAs) are a class of stochastic global optimization techniques that have been gaining significant
attention from researchers in many fields, and they have also been
applied to the heterogeneous multiprocessor scheduling
optimization problem (Ritchie and Levine 2004; Zhong et al.
2004). While EAs are excellent global search algorithms, it
is known that they can take a relatively long time to locate
the local optimum in the region of convergence (Ong et al.
2006). On the other hand, local search heuristics are capable
of locating the optimum quickly but are prone to becoming trapped in local optima. Therefore, EAs are often hybridized with local
search heuristics to maintain a balance between exploration
and exploitation, which is crucial to the success of search
and optimization processes (Burke et al. 2001; Franca et al.
2001; Ishibuchi et al. 2003; Merz and Freisleben 2000; Ong
and Keane 2004; Tang et al. 2007; Zhou et al. 2007). Multiprocessor systems have also been exploited to improve EA
performance (Lim et al. 2007).
This paper attempts to present a new hybrid evolutionary algorithm (HEA) for solving the above heterogeneous
multiprocessor scheduling problem. The proposed algorithm
incorporates two local search operators, based on firstly, list
scheduling and secondly, task duplication; both methods
attempt to exploit the intrinsic structure of the scheduling
problem. Unlike existing evolutionary approaches used to
solve the heterogeneous multiprocessor scheduling problem,
the proposed HEA also implements a variable length chromosome which preserves the precedence relations, a PE schedule
crossover which facilitates the exchange of good schedules
assigned to the individual processors as well as specialized
mutation operators to improve the diversity of the evolving
population.
This paper is organized as follows: Sect. 2 gives an overview of existing works as well as the problem formulation
of the heterogeneous multiprocessor scheduling problem.
Section 3 presents the various features of the proposed HEA
including the local search heuristics and specialized genetic
operators as well as the algorithmic flow. Section 4 presents
the extensive simulation results and analysis of the proposed
algorithm. Conclusions are then drawn in Sect. 5.
2 Background information
2.1 Overview of existing works
Multiprocessor scheduling based on methods motivated by
evolutionary computation has been the focus of
many research works over the last decade. Here we offer a
brief, non-exhaustive overview of similar works that
have motivated our interests and research. Ahmad and Kwok
(1998) proposed a task duplication approach (together with
a review and comparison of some similar algorithms) to
mitigate the expensive overhead of inter-processor communication that is required when executing
dependent tasks on multiple processors. In a similar manner, Baskiyar and Dickinson (2005) address static scheduling of a directed acyclic task graph (DAG) on a heterogeneous, bounded set of distributed processors to minimize the
makespan, also based on a task duplication approach.
Most of the present techniques are based on heuristics that
are capable of solving only certain instances of the scheduling
problem efficiently. However, the scheduling of tasks with
communication overheads and dependencies is gaining
increasing attention from researchers. Here, we investigate
an alternative paradigm, based on biologically inspired algorithms, to efficiently solve the scheduling problem without
the need to apply any restricting assumptions. Aside from the
above, other works in the literature have used EAs to determine task priorities based on list scheduling techniques. The list
scheduling heuristic (LSH) is an approach in which a priority is assigned to each task to be scheduled within a list,
which is then sorted in order of decreasing task priority.
The task with the highest priority in the unscheduled task list
is typically assigned to the first available processor and then
removed from the list. If more than one task is
assigned the same priority level, selection from among the
candidate tasks is typically done randomly. This conventional
approach will be applied in the comparative study conducted
in this paper.
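The list scheduling steps described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the comparative study: the function name `list_schedule`, the dictionary layout and the single processing time per task are assumptions made for the example, and precedence checks and communication delays are deliberately left out.

```python
import random

def list_schedule(priorities, num_pes, proc_time, rng=None):
    """Simplified list scheduling (hypothetical helper, not from the paper).

    priorities: dict task -> priority (higher priority is scheduled first)
    proc_time:  dict task -> processing time, assumed equal on every PE
    Returns a list of (task, pe, start, finish) tuples.
    """
    rng = rng or random.Random(0)
    # Sort by decreasing priority; the random second key breaks ties randomly.
    order = sorted(priorities, key=lambda t: (-priorities[t], rng.random()))
    pe_free = [0.0] * num_pes  # time at which each PE next becomes idle
    schedule = []
    for task in order:
        # Pick the PE that becomes available first.
        pe = min(range(num_pes), key=lambda p: pe_free[p])
        start = pe_free[pe]
        finish = start + proc_time[task]
        pe_free[pe] = finish
        schedule.append((task, pe, start, finish))
    return schedule
```

A full scheduler would additionally delay each task until its predecessors have finished and their data have arrived.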
On the other hand, an alternative approach would be to
use EAs to directly evolve task assignment and order in
processors. Hou et al. (1994) used an EA to evolve candidate solutions, or individuals that in turn consist of multiple
lists, with each list representing the tasks assigned to one
processor; the authors restrict the explorable design space in
order to avoid invalid solutions. However in their proposed
approach, the authors consider only homogeneous multiprocessor systems. Consequently, the crossover operation then
exchanges tasks between corresponding processors from two
different individuals, after which the mutation operator then
exchanges these tasks within a single individual. Overall,
this approach restricts the actions of genetic operators to
ensure the validity of evolved individuals. However, such an
approach would mean that some parts of the search space may
be unreachable by the algorithm. Correa et al. (1999) subsequently claim to improve upon Hou’s original approach to
circumvent this problem, and allow the entire search space
to be explored.
Kwok and Ahmad (1997) proposed a coarse-grained
parallel genetic algorithm (GA) together with a heuristic list
scheduling method, where candidate solutions are vectors of
length n, with n being the number of tasks to be scheduled. The elements of a vector represent the tasks themselves
and the order of the tasks gives the relative task priorities.
A number of order-based crossover operators are presented
and a mutation operator is used to perform random swapping
of tasks.
Dhodi et al. (1995) proposed a “Problem Space Genetic Algorithm” (PSGA) for datapath synthesis. The problem
itself is modified by the EA and subsequently transformed
into solution space by means of a heuristic, thus avoiding
infeasible solutions. Blickle et al. (1996) use an EA to perform allocation and binding on a system level. Scheduling is
achieved in a separate step. The authors use multichromosomal individuals to encode the problem and to subsequently
guide repair heuristics in parallel. Tsuchiya et al. (1998) proposed an approach in which a GA scheduler allows task
duplication where a single task may be assigned to multiple
processors. Alternatively, Zomaya et al. (1999) incorporate
heuristics in the generation of the initial population of an EA
and perform a thorough study of how GA performance varies
with changing parameter settings. Wu et al. (2004) claim that
an EA-based approach achieves good performance on most
of the problems applied. They also suggest that GAs appear
to be the most flexible algorithm for heterogeneous systems because heterogeneous processors make it more difficult for list scheduling algorithms to accurately estimate task
priority.
An alternative approach, motivated by ant colony optimization (ACO), is developed by Ritchie and Levine (2004).
When combined with local and tabu search, the ACO-based
algorithm is able to find shorter schedules on a few benchmark problems. ACO, as the authors also claim, has been
shown to be a successful strategy for problems related to
scheduling jobs in a heterogeneous computing environment.
This approach was only tested in solving a scheduling problem in a static environment for independent jobs.
2.2 Heterogeneous multiprocessor scheduling problem
Technological advancements have led to the development of
large scale parallel and distributed systems for a large range
of applications. However, applications are only able to exploit
parallelism when their parts do not wait for data longer than
necessary. This necessitates appropriate scheduling strategies, which are able to control access to processing resources,
as well as scheduling strategies, which control execution of
these parallel application modules. Thus, it is not surprising
that the focus of research in this area has been on the efficiency and effectiveness of scheduling algorithms. There are
increasing concerns that comparative studies performed are
not adequate to evaluate the true abilities of the algorithms
under test. Addressing the issue of data set generation, Hall
and Posner (2001) presented a set of guidelines on how data
sets should be generated for the evaluation of the various
scheduling algorithms. In order to generate a set of good test
problems, the researcher must consider:
1. the purpose of the experiment,
2. tests performed should be comparable,
3. unintended bias that can skew the test results, and
4. the reproducibility of the generation scheme.
Further, Hall and Posner also state that the generation scheme
should have properties such as variety, practical relevance,
scale and size invariance, regularity, describability, efficiency,
and parsimony. Kwok and Ahmad (1999) presented a suite
of five different benchmark graphs. The proposed sets are
peer set graphs (PSG), random graphs with optimal solutions
using branch-and-bound (RGBOS), random graphs with predetermined optimal schedules (RGPOS), random graphs with
no known optimal schedules (RGNOS), and traced graphs
(TG). RGPOS is probably the most interesting set of task
graphs in the sense that they are generated based on a set
of pre-determined solutions. To our knowledge, this is the
first instance of such a generation scheme for a multiprocessor
scheduling problem with communication delay. RGBOS also
has a set of optimal solutions, which are determined using
the A∗ algorithm (Ahmad and Kwok 1998). The A∗ algorithm is a search heuristic, which incrementally searches all
paths from the starting point until it finds the shortest path
to a goal. PSG is a collection of task graphs used by various
researchers. RGNOS consists of large scale randomly generated task graphs while TG represent real-world applications.
Coll et al. (2002) considered the issue of generating benchmark test sets for heterogeneous systems. The degree of
heterogeneity between different processors is defined by a
processor power ratio (PPR), which represents the relative
speeds between processors. In addition, they considered the
different precedence relationships based on the specific
nature of the task to be processed.
More recently, Davidovic and Crainic (2003) proposed
a set of benchmark problems modeling homogeneous systems with communication delays. Based on the criteria proposed by Hall and Posner (2001), they proposed two sets of
task graphs. Similar to Kwok and Ahmad (1999), one of the
proposed sets is generated based on some pre-determined
desired solution. However, Davidovic and Crainic provide a
much higher degree of control, allowing parameters such as
dependency densities to be changed.
2.2.1 Problem formulation
The multiprocessor scheduling problem can be simply stated
as follows:
Assuming there are n tasks that have to be executed
on m processors—where and when should each task
be executed, such that some performance measure(s)
is (are) optimized?
The task of the scheduling algorithm is to ultimately minimize a given cost function of time. The objective function
used in this paper is defined as:
$$F = \min \max_{i=1,\dots,n} T_f(v_i) \qquad (1)$$
where $T_f(v_i)$ denotes the time for the complete execution of
task $v_i$. The goal of task assignment/mapping is to determine
an assignment of tasks to processors and an order in which
tasks are executed to optimize some performance measures.
Often, the assignment process should aim to minimize the
total cost of executing the programs. An optimal assignment
determines both the allocation (identifying specific processor
to run certain modules) and the schedule (execution order)
of each task.
A task in turn, is a collection of instructions, procedures
or subroutines, possibly together with some data. Each task
is assumed to be immutable. While distributing the tasks to
parallel PEs is not difficult, introducing dependencies between the tasks causes degradation of the overall system performance. There are bindings or linkages between some pairs
of tasks (we call these dependencies) since a procedure in one
task may wish to (1) transfer control to another procedure in
a different task or (2) access data contained/produced in a
different task. It should be noted that these tasks only incur
a communication delay when they are assigned to different
PEs. It is, thus, important to make the assumption that the
cost of executing tasks on different processors and the cost
of the communication delay are known in advance.
As to why deterministic scheduling is considered, we are
inclined to believe that a priori efforts must be devoted to
analyze data from machine manufacturers and accumulate
actual experience from running smaller programs on fewer
processors. Such efforts are fully justified especially if repeated deterministic or production runs of important large programs will be run on large parallel systems where termination
and successful results are expected. In fact, it is precisely for
these production runs that the effort of optimizing assignment
is justified in the first place.
The duration of each task is known as well as precedence
relations among tasks, i.e. which tasks should be completed
before some others can begin. In addition, if dependent tasks
are executed on different processors, data transferring times
or communication delays that are given in advance are also
considered. These latencies also include memory access and
synchronization delays. To further include realism into our
problem model, we also consider a heterogeneous system,
that is a multiprocessor environment consisting of processors with different capabilities. Moreover, we only consider
a non-preemptive system, that is, each PE will complete the
processing of each task that is assigned to it. Essentially, this
means that a PE will not suspend the processing of its current task to take on
another task.
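To make the formulation concrete, the following sketch evaluates the makespan of Eq. (1) for a given assignment and execution order under the assumptions stated above: known processing times, communication delays charged only between different PEs, and non-preemptive execution. The function name `makespan` and the dictionary-based data layout are assumptions made for this illustration and are not taken from the paper.

```python
def makespan(pe_schedules, preds, proc_time, comm_delay):
    """Evaluate the schedule length of a complete assignment (hypothetical helper).

    pe_schedules: dict pe -> list of tasks in execution order
    preds:        dict task -> list of predecessor tasks
    proc_time:    dict (task, pe) -> processing time on that PE
    comm_delay:   dict (pred, task) -> delay, applied only across different PEs
    """
    pe_of = {t: pe for pe, tasks in pe_schedules.items() for t in tasks}
    finish = {}                                   # task -> finish time T_f
    next_idx = {pe: 0 for pe in pe_schedules}     # next position in each PE list
    pe_free = {pe: 0.0 for pe in pe_schedules}    # time each PE becomes idle
    remaining = len(pe_of)
    while remaining:
        progressed = False
        for pe, tasks in pe_schedules.items():
            i = next_idx[pe]
            if i == len(tasks):
                continue
            task = tasks[i]
            parents = preds.get(task, [])
            if any(p not in finish for p in parents):
                continue  # a predecessor has not been scheduled yet
            ready = pe_free[pe]
            for p in parents:
                delay = comm_delay.get((p, task), 0.0) if pe_of[p] != pe else 0.0
                ready = max(ready, finish[p] + delay)
            finish[task] = ready + proc_time[(task, pe)]
            pe_free[pe] = finish[task]
            next_idx[pe] += 1
            remaining -= 1
            progressed = True
        if not progressed:
            raise ValueError("infeasible schedule: precedence deadlock")
    return max(finish.values())
```

The sketch raises an error when no PE can make progress, which is the situation in which a task is queued ahead of a task it depends on.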
2.2.2 Problem generator
In order to verify the efficacy of our proposed approach, a
set of problems is needed for the experimental study. This
is achieved via the construction of a benchmark problem
generator, which produces a representative problem of a certain complexity based upon a set of input parameters. These
test problems are in turn used as the input problem to the
task scheduler. Having said that, there are four key components in a task scheduler: the parallel program of interrelated
tasks, the target machine (model), the generated schedule,
and the performance criterion. Previous works on task scheduling with dependencies usually use a graph representation
for either the tasks of the parallel program and the computer
model, or both. In an actual multiprocessor computing system, particularly those consisting of heterogeneous elements,
the running time of a particular job is not the sole or primary
factor to be considered when scheduling jobs. An equally
important consideration is the time that it takes to migrate
the executables and their associated data from one processor to
the next.
Braun et al. (2001) defined three types of heterogeneity:
task heterogeneity, machine heterogeneity and consistency.
Task heterogeneity is defined as the amount of variance
possible among the execution times of the jobs. Machine
heterogeneity, on the other hand, represents the variation of
the running time of a particular job across the processors.
Lastly, consistency can be categorized as consistent,
inconsistent or semi-consistent. A system is said to be
consistent if, whenever a processor A executes a job C faster than
another processor B, A also executes all other jobs faster
than B. A consistent system can therefore be seen as modeling a heterogeneous system in which the processors differ
only in their processing speed. A semi-consistent system
is made of elements from both consistent and inconsistent
systems.
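The definition of a consistent system can be checked mechanically on a matrix of execution times. The sketch below is an illustration only: the function name `is_consistent` and the row-per-job matrix layout are assumptions, and ties in execution time are treated as "not faster", which the definition above does not address.

```python
def is_consistent(exec_time):
    """Return True if the system is consistent (hypothetical helper).

    exec_time[i][j] is the execution time of job i on processor j.
    """
    num_procs = len(exec_time[0])
    for a in range(num_procs):
        for b in range(num_procs):
            if a == b:
                continue
            # For each job, is processor a strictly faster than processor b?
            faster = [row[a] < row[b] for row in exec_time]
            # Faster on some job but not on every job violates consistency.
            if any(faster) and not all(faster):
                return False
    return True
```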
Higher degrees of machine heterogeneity increase the
complexity of the multiprocessor scheduling problem. This
is because the scheduling algorithm now needs to account for
the variation in the individual processor’s capabilities, in that
certain processors might be more suitable for certain tasks
due to hardware or software configurations and compatibility. The multiprocessor system is made up of m processors
with their own local memories. The system can have various
degrees of heterogeneity and the processors are connected via
bi-directional links of equal capacity. Each processor has an
I/O unit that allows for communication and processing to
be performed simultaneously. We assume that there are no
start-up costs for initiating each task and that input buffers
have infinite capacity.
A convenient representation for the partially ordered set of
tasks is a directed acyclic graph (DAG), which is also known
as a task (dependency) graph, where a directed edge $e(p, i)$
between two tasks $v_p$ and $v_i$ specifies that task $v_p$ must be
completed before $v_i$ can begin. These directed edges in a
DAG correspond to the communication messages as well as
precedence constraints between the tasks. We consider a node
and a task to be equivalent. A task is a set of instructions
that must be executed sequentially on the same processor;
it is considered to be the smallest unit of instructions
that cannot be broken up any further. Node
$v_p$ is a predecessor of node $v_i$ if a directed edge originates
from $v_p$ and ends at $v_i$. In a similar manner, node $v_s$ is a
successor of node $v_i$ if a directed edge originating from $v_i$
and ending at $v_s$ exists. For
any vertex $v$ in the DAG, there is no non-empty directed path
that starts and ends on $v$. DAGs are therefore well suited to our multiprocessor
task scheduling problem, since
it is not meaningful for a vertex to have a path to itself; for
example, if an edge $v_p \to v_i$ indicates that $v_i$ is a part of $v_p$,
such a path would indicate that $v_p$ is a part of itself, which is
impossible.
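The predecessor and successor relations, together with the acyclicity requirement, can be illustrated with a short sketch. The helper names `build_dag` and `topological_order` are assumptions introduced here, and the cycle check uses Kahn's algorithm, a standard technique that the paper itself does not name.

```python
from collections import defaultdict, deque

def build_dag(edges):
    """Map each node to its predecessor and successor sets.

    edges: iterable of (p, i) pairs meaning task p must finish before task i.
    """
    preds, succs = defaultdict(set), defaultdict(set)
    for p, i in edges:
        succs[p].add(i)
        preds[i].add(p)
    return preds, succs

def topological_order(nodes, edges):
    """Return a precedence-respecting order, or raise if a cycle exists."""
    preds, succs = build_dag(edges)
    indegree = {v: len(preds[v]) for v in nodes}
    queue = deque(v for v in nodes if indegree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for s in sorted(succs[v]):
            indegree[s] -= 1
            if indegree[s] == 0:
                queue.append(s)
    if len(order) != len(nodes):
        # Some node never reached indegree zero, so a directed cycle exists.
        raise ValueError("graph contains a cycle")
    return order
```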
The test sets that were artificially generated using our
benchmark problem generator are based on this concept of
DAGs. From a practical viewpoint, actual multiprocessor
systems are immensely complicated combinations of hardware, software and network components and thus it is difficult
to make equitable comparisons of the different approaches
that have been used on various systems. In constructing these
problems artificially, and in a random manner, the input
variables essentially control not only the size, but also the
complexity of the generated test set. Specifically, these
variables are:
1. the number of nodes/tasks,
2. the number of processors available,
3. the degree of network connectivity,
4. the communication-to-computation ratio (CCR): the average communication cost divided by the average computation cost
   in a multiprocessor system. A DAG with a low CCR can be
   considered a computation-intensive application; on
   the other hand, if the CCR is high, it is a communication-intensive application,
5. the mean processing time: the average processing time
   for all the available processors,
6. the variance of processing time: how large the spread
   of processing time is between the available processors,
7. the degree of heterogeneity: how widely the
   capabilities of the processors differ, i.e. processors have
   different execution times on the same task,
8. the degree of precedence/dependency relationship: how
   many predecessor tasks must be completed before a
   particular task can be executed.

Table 1 Description of inputs to task generator

Parameter   Description                           Values
CCR         Communication-to-computation ratio    {0.5, 1, 1.5, 2}
Meanproc    Mean processing time                  {10}
h_pe        Variance of processing time           {0.25, 0.5, 0.75}
h_t         Degree of heterogeneity               {0.25, 0.5, 0.75}
d_pe        Width of DAG                          {0.5}
d_t         Degree of dependency                  {0.25, 0.5, 0.75}
n           Number of processors                  {15}
m           Number of tasks                       {100}
Having said that, the generator produces different test sets
for a given set of input parameters. For the same set of parameters, different task problems are generated due to randomness.
The input parameters to the generator are shown in Table 1,
together with the associated range of values. The variance in
the processing times of the different tasks comes from $h_t$,
i.e. the mean processing time of the ith task is given by
$$T_{\mathrm{mproc}}(v_i) = \mathrm{meanproc} + h_t \cdot \mathrm{meanproc} \cdot U(-1, 1) \qquad (2)$$
where $U(-1, 1)$ denotes a random number sampled from a
uniform distribution. As mentioned before, each task may
have different execution times on different processors. The
actual processing time of the ith task on the jth processor is
thus given by
$$T_{\mathrm{proc}}(v_i, pe_j) = T_{\mathrm{mproc}}(v_i) + h_{pe} \cdot \mathrm{meanproc} \cdot U(-1, 1) \qquad (3)$$
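Equations (2) and (3) translate directly into a two-level sampling procedure, sketched below. The function and parameter names are assumptions chosen for the example, and the seeding scheme is ours; the sketch covers only the processing times, not the DAG structure or the communication costs produced by the generator.

```python
import random

def generate_processing_times(num_tasks, num_pes, meanproc, h_t, h_pe, seed=0):
    """Sample a num_tasks x num_pes matrix of processing times (hypothetical helper)."""
    rng = random.Random(seed)
    times = []
    for _ in range(num_tasks):
        # Eq. (2): task-level mean processing time.
        t_mean = meanproc + h_t * meanproc * rng.uniform(-1, 1)
        # Eq. (3): perturb the task mean independently for every PE.
        row = [t_mean + h_pe * meanproc * rng.uniform(-1, 1)
               for _ in range(num_pes)]
        times.append(row)
    return times
```

With meanproc = 10 and h_t = h_pe = 0.25, every sampled time lies between 5 and 15. When h_t + h_pe reaches 1, the two equations as written allow values arbitrarily close to zero; how the original generator treats that case is not stated in the text.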
Table 2 Generated test sets

Test set  CCR  Meanproc  h_pe  h_t   d_pe  d_t   n   m
T1        0.5  10        0.25  0.25  0.5   0.5   15  100
T2        1    10        0.25  0.25  0.5   0.5   15  100
T3        1.5  10        0.25  0.25  0.5   0.5   15  100
T4        2    10        0.25  0.25  0.5   0.5   15  100
T5        1    10        0.25  0.25  0.5   0.25  15  100
T6        1    10        0.25  0.25  0.5   0.75  15  100
T7        1    10        0.5   0.25  0.5   0.5   15  100
T8        1    10        0.75  0.25  0.5   0.5   15  100
T9        1    10        0.25  0.5   0.5   0.5   15  100
T10       1    10        0.25  0.75  0.5   0.5   15  100
Using these inputs for the benchmark problem generator, sets
of random DAGs were constructed to be used as the test bed
problems in our experimental study. For our simulation study,
ten test sets were generated using various combinations of the
above input parameters, and are listed in Table 2.
While the standard multiprocessor scheduling problem is
itself an NP-hard problem, additional factors such as communication delays and heterogeneity increase the complexity
of the problem. Hence, due to the sheer number of potential
solutions in the search space, scheduling becomes a complex task without the use of an effective search algorithm.
These sets are classified in terms of the possible difficulties.
Each test set consists of different test problems with different degrees of heterogeneity and dependencies. Here, we
consider a total of ten test sets generated in this study, which
differ in terms of degree of heterogeneity, density, and CCR.
A higher CCR value penalizes dependencies which require
transmission or passing of messages from one processor to
the next, making inter-processor communication less desirable. The variance of processing time and degree
of heterogeneity affect the individual processing capabilities of each processor, thus making ‘slower’ processors less
likely to be assigned tasks, and biasing the utility of ‘faster’
processors. Lastly, the degree of dependency affects the
makespan in that each processor would have
to ‘wait’ for the tasks it depends on to finish execution.
3 Hybrid evolutionary algorithm
This section presents the HEA specifically designed to solve
the heterogeneous multiprocessor scheduling problem by
means of specialized genetic and local search operators. The
procedure for generating the initial population is presented
in Sect. 3.1 while Sect. 3.2 describes the structure of the
variable-length chromosome used to encode the task schedule
in the HEA. Sections 3.3 and 3.4 describe the specialized
crossover and mutation operators used to explore the search
space, respectively. Two local search heuristics that exploit
the intrinsic structures of a heterogeneous multiprocessor
scheduling problem solution are presented in Sect. 3.5.
Finally, the algorithmic flow of the HEA is presented in
Sect. 3.6.
3.1 Initialization
The initial population is built using a random LSH, which
ensures that the precedence relationships among the tasks
are preserved. The initialization process starts with the assignment of a priority to each task to be scheduled. In this paper,
the priority of the ith task is simply the sum of the number
of its parent tasks and their priorities, as given below
$$PrT_i = |P_i| + \sum_{j \in P_i} PrT_j \qquad (4)$$
where $P_i$ is the set of parent tasks of the ith task.
The list of tasks is then sorted in order of increasing
priority. This priority list is also used during the genetic processes to maintain the precedence requirements. Instead of
assigning the tasks to the earliest available PE, the lowest
priority task is assigned to the PEs randomly. The rationale
is to provide the initial population with a wider range of
diversity to start with.
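The priority rule of Eq. (4) and the random assignment step can be sketched as follows. The names `task_priorities` and `random_initial_individual`, and the dictionary representation of an individual, are assumptions made for this illustration. Because every task has at least one more unit of priority than each of its parents, sorting by increasing priority places every task after all of its ancestors.

```python
import random

def task_priorities(parents):
    """Compute Eq. (4); parents maps every task to a list of its parent tasks."""
    memo = {}

    def pr(task):
        if task not in memo:
            memo[task] = len(parents[task]) + sum(pr(p) for p in parents[task])
        return memo[task]

    return {t: pr(t) for t in parents}

def random_initial_individual(parents, pes, rng):
    """Assign tasks, in increasing priority order, to randomly chosen PEs."""
    prio = task_priorities(parents)
    order = sorted(parents, key=lambda t: prio[t])
    individual = {}  # pe -> list of tasks in execution order
    for task in order:
        pe = rng.choice(pes)
        individual.setdefault(pe, []).append(task)
    return individual
```

For a diamond-shaped graph in which task 1 feeds tasks 2 and 3, and both feed task 4, the priorities are 0, 1, 1 and 4 respectively.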
3.2 Variable PE chromosome
Evolutionary algorithms operate on a set of encoded parameters to explore the solution space, providing researchers
with the flexibility to design an appropriate representation
that fulfills some criteria such as ease of implementation
or exploitation of the problem structure. For simplicity, the
chromosome is often represented as a fixed-structure and the
embedded variables are usually assumed to be independent
and context insensitive. As mentioned before, the precedence
relations among the tasks must be satisfied in the heterogeneous multiprocessor scheduling problem. In Braun et al.
(2001) and Ritchie and Levine (2004), the chromosome is an
n-dimensional array denoting the n tasks to be allocated and
the encoded variable in each element represents the PE scheduled to execute the associated task. While such an encoding scheme is simple to implement, it does not consider the
order in which the various tasks are processed and the evolved schedules will not satisfy the precedence constraints. On
the other hand, Wu et al. (2004) considered a representation
which encodes task-processor pairs and the order in which
the pairs appear in the chromosome determines the order in
which the tasks will be performed on each processor.
Fig. 1 Illustration of (a) the variable length chromosome and (b) the associated schedule
This paper adopts a variable length chromosome, which is
illustrated in Fig. 1. In contrast to the works mentioned above, this
encoding scheme does not enforce a fixed number of PEs, i.e.
the length of the chromosome varies with the actual number
of PEs utilized. For each of these PEs, there is an associated
list of the tasks assigned to it, arranged in their order of execution. Each
such task list will henceforth be referred to as a PE schedule. When
a task is scheduled to run before its predecessor tasks, which
have been assigned to other PEs, the only problem is the long
idle time incurred while waiting for all the predecessor tasks
to be completed. On the other hand, if a task is scheduled to
run before one of its predecessor tasks on the same PE, then
the task can never be completed. The overall schedule
is therefore infeasible only if a task is scheduled to be executed before
a predecessor task within the same PE. It follows that a feasible overall schedule can be maintained simply by ensuring the
feasibility of each PE schedule. The precedence relations for
the tasks executed on a PE can be easily preserved in the proposed scheme by maintaining the order of priority calculated
at the beginning of the optimization process.
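As a minimal sketch of this encoding, assuming a dictionary keyed by PE index, each PE schedule can be kept as an ordered task list and re-sorted by the precomputed priority. The names `Chromosome`, `repair` and `is_pe_feasible`, as well as the toy precedence data, are illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical representation: PE index -> ordered list of task ids.
# Only PEs that are actually used appear as keys, so the chromosome
# length varies from one solution to another.
Chromosome = Dict[int, List[int]]


def repair(chromosome: Chromosome, priority: Dict[int, int]) -> Chromosome:
    """Sort every PE schedule by ascending priority value and drop empty PEs.

    `priority` is assumed to be a topological ranking computed once before
    the search starts: every task has a larger value than its predecessors.
    """
    return {
        pe: sorted(tasks, key=priority.__getitem__)
        for pe, tasks in chromosome.items()
        if tasks
    }


def is_pe_feasible(tasks: List[int], preds: Dict[int, List[int]]) -> bool:
    """True if no task appears before one of its predecessors in `tasks`."""
    position = {t: i for i, t in enumerate(tasks)}
    return all(
        position[p] < position[t]
        for t in tasks
        for p in preds.get(t, [])
        if p in position
    )


# Toy instance (assumed data): task 1 precedes 2 and 3, task 2 precedes 4.
preds = {2: [1], 3: [1], 4: [2]}
priority = {1: 0, 2: 1, 3: 2, 4: 3}
child = repair({1: [4, 1, 2], 3: [3], 5: []}, priority)
```

Because a single global priority list is used for every PE, sorting each list independently is enough to remove within-PE precedence violations in this sketch.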
3.3 PE schedule crossover
The crossover operations applied by most EAs to solve the heterogeneous multiprocessor scheduling problem generally
involve the swapping of random segments of tasks or processes between chromosomes, which does not preserve the
quality of the different PE schedules. Descriptions of a
number of order-based crossovers for combinatorial problems can also be found in Davis (1991) and Eiben and Smith
(2003). However, these crossover operators are not applicable due to the unique structure of the proposed variable
length chromosome.
The proposed PE schedule crossover is motivated by the
fact that the makespan of the multiprocessor schedule is
dependent on the fitness of the constituent PE schedules.
Since the chromosome encodes a separate list of tasks for
each PE, it is intuitive to design a crossover which allows
good PE schedules to be shared with other chromosomes in
the evolving population. The operation of the crossover is
illustrated in Fig. 2. In the PE schedule crossover, a random
PE schedule from each parent is selected for crossover. In the
case where one of the selected chromosomes has only one
PE schedule, only a schedule associated with a different PE
is selected and inserted from the other parent. The selected
PE schedule of one parent is either inserted into the
other chromosome as a new schedule or, if that particular PE is already present, replaces its original schedule. Duplicated
tasks are deleted, while missing tasks are randomly inserted
into the other original PE schedules. The new PE schedule
remains intact. To ensure the feasibility of the chromosomes after
the crossover, the priority list computed at the beginning of
the evolutionary process is used to sort the tasks assigned to
each PE in ascending order.
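The steps of the crossover can be sketched for one direction of the exchange as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the chromosome is assumed to be a dictionary from PE index to task list, the function name `pe_schedule_crossover` is hypothetical, and the fallback used when no other PE schedule exists is an assumption that the text does not cover.

```python
import random
from typing import Dict, List

Chromosome = Dict[int, List[int]]  # hypothetical: PE index -> ordered task ids


def pe_schedule_crossover(receiver: Chromosome, donor: Chromosome,
                          priority: Dict[int, int],
                          rng: random.Random) -> Chromosome:
    """Build one child from `receiver` plus one PE schedule of `donor`."""
    # Select a random PE schedule of the donor. If the receiver has only one
    # PE schedule, prefer a donor schedule that belongs to a different PE.
    candidates = list(donor)
    if len(receiver) == 1:
        candidates = [pe for pe in candidates if pe not in receiver] or candidates
    pe = rng.choice(candidates)
    donated = list(donor[pe])
    donated_set = set(donated)

    # Delete duplicated tasks from the receiver's other PE schedules.
    others = {p: [t for t in tasks if t not in donated_set]
              for p, tasks in receiver.items() if p != pe}

    # Tasks lost by replacing the receiver's own schedule for `pe` are
    # re-inserted at random into the other original PE schedules.
    for task in receiver.get(pe, []):
        if task in donated_set:
            continue
        if others:
            others[rng.choice(list(others))].append(task)
        else:
            donated.append(task)  # assumption: no other schedule to use

    child = dict(others)
    child[pe] = donated
    # Sort each PE schedule by priority to restore feasibility.
    return {p: sorted(tasks, key=priority.__getitem__)
            for p, tasks in child.items() if tasks}
```

Calling the function twice with the roles of the two parents swapped yields the two children described above.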
3.4 Specialized mutation
This paper applies three different specialized mutation operators to improve the diversity of the evolving population. For
every chromosome undergoing the mutation process, only
one particular mutation operator is applied, as shown by the
pseudocode in Fig. 3. The main functionalities of the three
mutation operators are summarized in Table 3. Similar to the
PE schedule crossover, each PE schedule is sorted based on
the priority list at the end of the mutation operation.
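A compact sketch of the mutation step, following the operator descriptions in Table 3 and the dispatch logic of Fig. 3, is given here. Several details are assumptions made for illustration and are marked in the comments: a single exchange of tail segments is performed in the partial exchange, the merged list stays on one of the two PEs, and PEs are assumed to be numbered from 1 to `num_pes`. All function names are hypothetical.

```python
import random
from typing import Dict, List

Chromosome = Dict[int, List[int]]  # hypothetical: PE index -> ordered task ids


def partial_exchange(c: Chromosome, rng: random.Random) -> None:
    # Assumption: one exchange only, swapping the tails after random cuts.
    a, b = rng.sample(sorted(c), 2)
    i, j = rng.randrange(len(c[a]) + 1), rng.randrange(len(c[b]) + 1)
    c[a], c[b] = c[a][:i] + c[b][j:], c[b][:j] + c[a][i:]


def schedule_merge(c: Chromosome) -> None:
    # Concatenate the two PE schedules with the fewest tasks.
    a, b = sorted(c, key=lambda pe: len(c[pe]))[:2]
    c[a].extend(c.pop(b))  # assumption: the merged list stays on PE `a`


def partial_split(c: Chromosome, rng: random.Random, num_pes: int) -> None:
    longest = max(c, key=lambda pe: len(c[pe]))
    if len(c[longest]) < 2:
        return
    cut = rng.randrange(1, len(c[longest]))
    c[longest], segment = c[longest][:cut], c[longest][cut:]
    idle = [pe for pe in range(1, num_pes + 1) if pe not in c]
    targets = [rng.choice(idle)]
    rest = [pe for pe in c if pe != longest]
    if rest:
        targets.append(min(rest, key=lambda pe: len(c[pe])))
    c.setdefault(rng.choice(targets), []).extend(segment)


def mutate(c: Chromosome, priority: Dict[int, int], num_pes: int,
           rate: float, rng: random.Random) -> Chromosome:
    child = {pe: list(tasks) for pe, tasks in c.items()}  # leave parent intact
    if rng.random() < rate:
        op = rng.choice(("exchange", "merge", "split"))
        if op == "exchange" and len(child) > 1:
            partial_exchange(child, rng)
        elif op == "merge" and len(child) > 1:
            schedule_merge(child)
        elif op == "split" and len(child) < num_pes:
            partial_split(child, rng, num_pes)
    # Sort tasks by priority and drop PE schedules that became empty.
    return {pe: sorted(tasks, key=priority.__getitem__)
            for pe, tasks in child.items() if tasks}
```

As in Fig. 3, an operator whose precondition fails is simply skipped, so the chromosome is only re-sorted in that case.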
Fig. 2 Illustration of the PE schedule crossover for the various steps: (a) selection of a random PE schedule, (b) swapping of the selected PE schedules,
and (c) deletion of duplicates and random insertion of missing tasks to form the child chromosomes
Mutation Operation
  IF rand < mutation rate
    Select one mutation operator with equal probability
    IF Partial Exchange AND No. of PE Schedules > 1
      Perform Partial Exchange
    ELSEIF Schedule Merge AND No. of PE Schedules > 1
      Perform Schedule Merge
    ELSEIF Partial Split AND No. of PE Schedules < |PE|
      Perform Partial Split
    END
  END
  Sort tasks based on priority
END
Fig. 3 Pseudocode of the mutation operation
3.5 Local search
3.5.1 Partial list scheduling
The quality of a multiprocessor schedule is determined by
the completion time of its last task. The idea of partial
list scheduling (PLS) is to split up the workload among the
PEs with the best and worst completion times in order to improve the
makespan. The first step in this heuristic is to select the appropriate PEs, from which all tasks are extracted and placed in a
list. A PE is selected if it meets either of two criteria: its
completion time is greater than the upper
quartile, or its completion time is lower than the lower quartile, of the PE completion times. In the next step, the extracted
tasks are sorted based on their priorities determined at the
start of the evolutionary process. The tasks, in the order of
their priorities, are then assigned to the best possible processor, i.e. the one which allows the earliest start time considering inter-task communication (ITC). The new solution is
compared against the original and the better of the two
is retained.
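The PLS steps can be sketched as follows. This is a simplified illustration under several assumptions that are not stated in the paper: tasks are timed in global priority order, `exec_time[t][pe]` and `comm[(parent, task)]` are hypothetical lookup tables, pooled tasks may only move among the selected PEs, and quartiles are computed with the default method of `statistics.quantiles`.

```python
import statistics
from typing import Dict, List

Chromosome = Dict[int, List[int]]  # hypothetical: PE index -> ordered task ids


def list_schedule(where, free, options, priority, exec_time, preds, comm):
    """Time tasks in priority order. Tasks in `free` choose the PE in
    `options` with the earliest start time; other tasks keep their PE."""
    where = dict(where)
    finish: Dict[int, float] = {}
    pe_free = {pe: 0.0 for pe in sorted(set(where.values()) | set(options))}
    for t in sorted(where, key=priority.__getitem__):
        def start_on(pe):
            # Parent data arrives after the ITC delay unless it is local.
            ready = max((finish[p] + (comm.get((p, t), 0.0)
                                      if where[p] != pe else 0.0)
                         for p in preds.get(t, [])), default=0.0)
            return max(ready, pe_free[pe])
        pe = min(options, key=start_on) if t in free else where[t]
        where[t] = pe
        finish[t] = start_on(pe) + exec_time[t][pe]
        pe_free[pe] = finish[t]
    return where, pe_free


def makespan(c, priority, exec_time, preds, comm):
    where = {t: pe for pe, tasks in c.items() for t in tasks}
    return max(list_schedule(where, set(), [], priority,
                             exec_time, preds, comm)[1].values())


def partial_list_scheduling(c, priority, exec_time, preds, comm):
    where = {t: pe for pe, tasks in c.items() for t in tasks}
    _, done = list_schedule(where, set(), [], priority, exec_time, preds, comm)
    if len(done) < 2:
        return c
    q1, _, q3 = statistics.quantiles(done.values(), n=4)
    chosen = [pe for pe, d in done.items() if d > q3 or d < q1]
    if len(chosen) < 2:
        return c
    pool = {t for pe in chosen for t in c[pe]}
    new_where, new_done = list_schedule(where, pool, chosen, priority,
                                        exec_time, preds, comm)
    if max(new_done.values()) >= max(done.values()):
        return c  # keep the original when the candidate is not better
    child: Chromosome = {}
    for t in sorted(new_where, key=priority.__getitem__):
        child.setdefault(new_where[t], []).append(t)
    return child


# Toy instance (assumed data): ten independent unit-time tasks on four PEs.
priority = {t: t for t in range(1, 11)}
exec_time = {t: {pe: 1.0 for pe in range(1, 5)} for t in range(1, 11)}
start = {1: [1, 2, 3, 4, 5], 2: [6, 7], 3: [8, 9], 4: [10]}
improved = partial_list_scheduling(start, priority, exec_time, {}, {})
```

In the toy instance the most and least loaded PEs are selected and their tasks are rebalanced between them, while the schedules of the remaining PEs are left unchanged.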
3.5.2 Duplication scheduling
In multiprocessor scheduling with task interdependencies,
some PEs will be idle during various time slots because some
tasks require data from parent tasks which are assigned to
other processors. The idea of duplicating tasks in these idle
time slots is to reduce the waiting and ITC delays incurred,
and hence the makespan. The pseudocode of the duplication
scheduling (DS) heuristic is shown in Fig. 4.
The task duplication procedure is conducted iteratively
for every task, in the order of its execution, on each PE. The
heuristic first determines the idle time, which is the difference
between the actual and the earliest possible start time of the task.
It then attempts to duplicate the parent tasks, in the order of
their contribution to the delay, until the idle time is used up.
The new solution is compared against the original and
the better of the two is retained.
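The inner loop of the DS heuristic, which decides which parents fit into one idle slot, can be sketched as below. The function name `parents_to_duplicate` and its arguments are hypothetical, and the sketch only selects the parents: re-timing the schedule and comparing the new solution against the original are left out.

```python
from typing import Dict, List


def parents_to_duplicate(idle_time: float, parents: List[str],
                         finish: Dict[str, float],
                         exec_on_pe: Dict[str, float]) -> List[str]:
    """Pick parents to copy into the idle slot before a task.

    Parents are visited in descending order of completion time, i.e. the
    parent that delays the task the most is considered first. Duplication
    stops at the first parent that no longer fits into the remaining slot.
    """
    chosen: List[str] = []
    for parent in sorted(parents, key=finish.__getitem__, reverse=True):
        t_exe = exec_on_pe[parent]  # execution time on the waiting PE
        if t_exe < idle_time:
            chosen.append(parent)
            idle_time -= t_exe
        else:
            break
    return chosen


# Toy data (assumed): an idle slot of 10 time units and three parents.
picked = parents_to_duplicate(
    10.0,
    ["A", "B", "C"],
    finish={"A": 30.0, "B": 25.0, "C": 20.0},
    exec_on_pe={"A": 4.0, "B": 5.0, "C": 3.0},
)
```

In the toy data, A and B fit into the slot (4 + 5 time units), after which only one time unit remains and C is rejected.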
Table 3 Description of the mutation operation
Operator          Description
Partial exchange  The partial exchange operation involves a number of partial schedule exchanges. For each exchange, two PE schedules are
                  randomly chosen and a segment of each selected schedule is then randomly selected and exchanged. In addition, a mechanism
                  is in place such that no PE schedule is selected twice in a particular partial exchange operation
Schedule merge    This operation concatenates the two PE schedules with the least number of tasks in the chromosome. Intuitively, this operation
                  is not applicable to solutions with only one PE schedule
Partial split     This operation searches for the PE schedule with the largest number of tasks and breaks the schedule into two at a random
                  point. The upper segment of the divided schedule is then assigned, at random, either to an idle PE or to the
                  PE schedule with the least number of tasks
Duplication Scheduling Local Search
  FOR All PE Schedules
    FOR All Tasks in PE Schedule
      Compute Tidle before task execution
      Determine parents of task
      Sort parents in descending order of completion time
      FOR Parents
        Determine Texe required if duplicated
        IF Execution time < Tidle
          Duplicate parent
          Update Tidle: Tidle = Tidle - Texe
        ELSE
          Break
        END
      END
    END
  END
  Sort tasks based on priority
  Evaluate new solution
  IF new solution is better than old solution
    Replace old solution
  END
Fig. 4 Pseudocode of the duplication scheduling local search
Fig. 5 Flowchart of HEA
3.6 Algorithmic flow
The algorithmic flow of the HEA is shown in Fig. 5. The
optimization process begins with the initialization of the
population based on the procedure described in Sect. 3.1.
After the initial evolving population is formed, all the chromosomes are evaluated and ranked according to their final
execution time in the population. Following the ranking process, an archive population is updated. In this paper, an
archive is applied to store the best solutions found during
the search. The archive maintains a fixed number of solutions
and the updating process consists of a few steps. The evolving
population and the archived solutions are first combined and
all duplicate solutions are deleted. The remaining solutions
in the combined population are then inserted into the archive
in the order of increasing rank until the archive is filled.
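The archive update just described can be sketched in a few lines. The function name `update_archive` is hypothetical, solutions are assumed to be dictionaries from PE index to task list, and `makespan` stands for any caller-supplied evaluation function; the surrogate used in the example is made up purely to keep the sketch self-contained.

```python
from typing import Callable, Dict, List

Chromosome = Dict[int, List[int]]  # hypothetical: PE index -> ordered task ids


def update_archive(archive: List[Chromosome], population: List[Chromosome],
                   makespan: Callable[[Chromosome], float],
                   capacity: int) -> List[Chromosome]:
    """Combine, remove duplicates, rank by makespan, and truncate."""
    seen = set()
    merged: List[Chromosome] = []
    for solution in archive + population:
        # Hashable fingerprint of a solution, independent of key order.
        key = tuple(sorted((pe, tuple(tasks)) for pe, tasks in solution.items()))
        if key not in seen:
            seen.add(key)
            merged.append(solution)
    merged.sort(key=makespan)  # increasing rank = increasing makespan
    return merged[:capacity]


# Toy example with a made-up surrogate: the longest PE schedule length.
surrogate = lambda s: max(len(tasks) for tasks in s.values())
old_archive = [{1: [1, 2, 3]}]
population = [{1: [1, 2], 2: [3]}, {1: [1, 2, 3]}, {1: [1], 2: [2], 3: [3]}]
new_archive = update_archive(old_archive, population, surrogate, capacity=2)
```

Sorting is stable, so solutions with equal makespan keep the order in which they were first encountered.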
The binary tournament selection scheme is then performed on the archive. In the binary tournament selection, a pair
of individuals is selected randomly from the archive. Thereafter, the selected pair of individuals enters a tournament
where the chromosome with the lower rank is selected for
reproduction. This procedure is repeated until the mating
pool is filled, which preserves the original population size. The
genetic operators consist of the PE schedule crossover and
the three mutation operators presented in Sects. 3.3 and 3.4,
respectively. The PLS and DS are applied to the archive population at a fixed interval, TLS, for better local exploitation in
the evolutionary search. Different schemes for incorporating
the two local search methods will be explored in Sect. 4. The
evolution process is repeated until the stopping criterion is
satisfied.
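A minimal sketch of the binary tournament is shown below, assuming that the archive is stored as a list sorted by increasing rank, so that a smaller index means a better solution. The function name `binary_tournament` is hypothetical, and drawing the pair without replacement is an assumption.

```python
import random
from typing import List, TypeVar

T = TypeVar("T")


def binary_tournament(ranked_archive: List[T], pool_size: int,
                      rng: random.Random) -> List[T]:
    """Fill a mating pool by repeated two-way tournaments.

    `ranked_archive` must hold at least two solutions, best first.
    """
    pool: List[T] = []
    while len(pool) < pool_size:
        i, j = rng.sample(range(len(ranked_archive)), 2)
        pool.append(ranked_archive[min(i, j)])  # lower rank wins
    return pool


# Toy usage: labels stand in for chromosomes, ordered from best to worst.
mating_pool = binary_tournament(["s1", "s2", "s3", "s4"], pool_size=20,
                                rng=random.Random(0))
```

With distinct pairs, the worst-ranked solution can never win a tournament, which gives a mild selection pressure toward the best archived schedules.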
4 Simulation results and analysis
This section presents the extensive simulation results and
analysis of the proposed HEA. The simulations are implemented using Matlab on an Intel Pentium 4 2.8 GHz computer
and the results shown are based on the final makespan value
of the best archived solution. Thirty independent runs are
performed for each of the test sets in order to obtain statistical information, such as the consistency and robustness of the
algorithms. The various parameter settings for the algorithm
are listed in Table 4. The number of evaluations includes the
evaluation of solutions from the main algorithmic cycle as
well as the local search operations. Section 4.1 demonstrates
the effectiveness of the proposed local search operators and
analyzes how the various settings of the local search
heuristics affect algorithmic performance. Section 4.3
investigates the impact of different problem characteristics
on HEA performance and how it compares against conventional heuristics.
Table 4 Parameter setting for HEA
Parameter                      Settings
Populations                    Population size 20
                               Archive size 20
Chromosome                     Variable length chromosome
Selection                      Binary tournament selection
Crossover rate                 0.9
Mutation rate                  0.3
Evaluations                    600
Local search frequency, TLS    5
4.1 Effects of local search
The HEA incorporates the local search heuristics in order to
exploit local schedules in parallel with global evolutionary
optimization. In this section, the dynamics and parameter
settings of PLS and DS are examined. Note that T1 and T4
are used in the study here since it has been observed in previous works that ITC has a severe impact on schedule
optimality.
Six settings of HEA with various implementations of the
local search operators are investigated, as shown in Table 5.
No local search is applied in setup 1, while only one heuristic is applied to each solution undergoing local search in
setups 2–6. In setup 2, either PLS or DS is randomly applied.
In the third and fourth setups, only one heuristic is applied
(PLS in setup 3 and DS in setup 4).
The asterisk (∗) in setup 5 and setup 6 denotes which local
search is activated first, as they are alternately executed.
Table 5 Different case setups to examine contribution of the local search heuristics
Setup    1    2         3      4      5             6
PLS      –    Random    Yes    –      Alternate∗    Alternate
DS       –    Random    –      Yes    Alternate     Alternate∗
The evolutionary trends of the makespan averaged over
30 runs for T1 and T4 are plotted in Fig. 6a, b. From the
plots, it can be observed that the application of local search
results in significant dips in the convergence trace, particularly in instances where DS is applied. Figure 6a, b clearly
demonstrates the effectiveness of local exploitation in
the HEA, as the five setups which incorporate local search
perform better than setup 1. The performances
of setup 2, setup 4, setup 5, and setup 6 are comparable,
although setup 6, in which DS is activated first and alternated with PLS,
seems to have a slight edge for both problems. On
the other hand, setup 5, which activates PLS first, has a slower
convergence rate for both T1 and T4.
Fig. 6 Evolutionary trend of the six setups for (a) T1 and (b) T4 (makespan versus number of evaluations)
The effectiveness of duplicating tasks in reducing overall completion time is also evident, since the four settings that apply DS, namely
setup 2, setup 4, setup 5 and setup 6, are able to find solutions with makespans that are significantly lower than those
found without local search and by PLS only. Interestingly,
the application of DS seems to have more impact on T1, with
an average improvement of 10%, as compared to 5% for T4,
which has a more severe CCR restriction. Setup 6 will be
used as the default setup for all subsequent experiments.
4.1.1 Effect of local search frequency
In general, there is a need to maintain a balance between
exploration and exploitation. Therefore, experiments are also
conducted to study the impact of local search frequency on
the performance of the HEA. Apart from the original setting
of applying local search at TLS = 5, four other settings, where
local search is applied in every generation and at intervals of
TLS = {3, 7, 10} generations, respectively, are used in the
test. Thirty simulation runs of each of the five settings were
performed and the convergence traces are plotted in Fig. 7a,
b. It should be noted that the maximum number of evaluations is maintained at 400 for all simulations, i.e. increasing the frequency of local search reduces the number of
generations.
Fig. 7 Effects of various TLS settings (1, 3, 5, 7 and 10) for (a) T1 and (b) T4 (makespan versus number of evaluations)
Fig. 8 Makespan of HEA with different individual local search selection schemes (archive only, population only, both) for (a) T1 and (b) T4
From the figures, it can be observed that the convergence
speed increases with decreasing TLS. While it is expected that
increasing local search frequency will improve convergence
speeds, there is always a risk of yielding locally optimal solutions due to the lack of sufficient exploration. Nonetheless,
we note that a well-designed LSH is capable of achieving
schedules within 25% of the optimal solution. Comparing
the results achieved by the HEA and the conventional heuristics, which will be shown in Sect. 4.3, it is thus unlikely
that the HEA is trapped in a local optimum. This is probably
due to the global exploration capability of the HEA.
4.1.2 Effect of individual selection
Apart from TLS, another factor that will influence the effectiveness of the local search process is the selection of individuals. In the preceding sections, only archived individuals are
exploited by the heuristics. In this section, two other methods
of individual selection are investigated. In the second
approach, only individuals from the evolving population
undergo local search, while in the third approach random individuals are selected
via tournament selection from both the archive and the evolving population.
The simulation results are summarized in the form of boxplots in Fig. 8a, b. From the figures, it can be observed that
performing local search on archived solutions has an edge
over exploiting the evolving population only. On the
other hand, the original approach of exploiting the archive
only is comparable to the third approach, which exploits a
selected set of archived and evolving population individuals.
The KS-test was also conducted, and it showed that only
the second method is statistically different from the other
methods.
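For reference, the statistic behind a two-sample KS-test is the largest gap between the empirical distribution functions of the two samples. The sketch below computes only this statistic on made-up numbers; the significance level and the exact test procedure used in the study are not given in the text, so no decision rule is implemented, and the function name `ks_statistic` is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    Both empirical distribution functions are step functions that only
    change at sample points, so it suffices to check those points.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a_sorted, x) / len(a_sorted)
            - bisect_right(b_sorted, x) / len(b_sorted))
        for x in a_sorted + b_sorted
    )


# Made-up makespan samples, used only to exercise the function.
d_overlap = ks_statistic([1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0])
d_disjoint = ks_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

A value of D close to 1 indicates that the two sets of makespans barely overlap, whereas a value close to 0 indicates nearly identical distributions.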
Table 6 Makespan of HEA with and without the various genetic operators for T1 and T4
                                  T1          T4
HEA             First quartile    236.5115    440.6800
                Median            238.9705    444.2914
                Third quartile    241.9287    447.5509
Crossover only  First quartile    236.5379    442.7701
                Median            239.2379    445.0349
                Third quartile    243.1408    447.935
Mutation only   First quartile    238.6333    443.9139
                Median            241.7449    446.3875
                Third quartile    245.8042    449.0084
The best result is highlighted in bold
Table 7 Simulation results of LSH, DSH and HEA for the various benchmark problems
                              HEA
       LSH         DSH        First quartile   Median      Third quartile
T1     241.2088    238.4844   236.5115         238.9705    241.9287
T2     339.3681    333.3969   313.3463         315.4760    317.0450
T3     438.3704    400.8312   368.8242         371.2755    372.9756
T4     496.6009    473.6011   440.6800         444.2914    447.5509
T5     301.7023    299.9843   293.0909         295.1409    297.8713
T6     342.1177    340.6002   319.5344         321.5004    323.6641
T7     350.9543    307.5352   295.8357         298.9078    304.0759
T8     322.0563    316.2526   275.6143         281.5325    284.9465
T9     388.6536    371.4102   337.1911         342.0356    344.8998
T10    454.2243    415.2846   391.3507         395.1606    397.0869
4.2 Effects of genetic operators
This section examines the contribution of the specialized
genetic operators to the performance of HEA on problems T1 and T4. In order to assess the effects of the
PE schedule crossover and the mutation operators, simulations for two different setups of HEA are conducted. Specifically, the first setup incorporates only the PE schedule
crossover, while only the mutation operators are implemented
in the second setup. The simulation results are summarized
in Table 6. The results indicate a deterioration of algorithmic
performance when either the crossover or the mutation operator
is removed. Nonetheless, it can be observed that the crossover has a greater impact on algorithmic performance, indicating the importance of exchanging PE schedules between
individuals. The KS-test conducted also showed that the HEA is
statistically better than the HEA with the mutation operators only.
4.3 Investigation of other test problems
In order to examine the effectiveness of HEA, a comparative study with conventional LSH and the duplication scheduling
heuristic (DSH) (Kruatrachue and Lewis 1987) is carried out
based upon the ten test problems described earlier. LSH has
been described earlier in Sect. 2.1. DSH is an instantiation
of the LSH with the task duplication described in Sect. 3.5.
Specifically, in DSH, parent tasks are assigned to idle slots
whenever possible after all tasks are assigned using LSH. As
before, 30 simulation runs are conducted for all test problems
and the results are summarized in Table 7. LSH and DSH are
deterministic heuristics and only one solution is produced for
each problem.
As noted before in Sect. 4.1, the effectiveness of task duplication is evident from a comparison of the performances of
LSH and DSH. The difference between the two conventional heuristics becomes even more apparent as the CCR or
degree of heterogeneity increases. On the other hand, the
HEA outperforms both heuristics for all test problems. With
the exception of T1, it can be observed from Table 7 that the
third quartile makespan value attained by HEA is much lower
than the makespans of LSH and DSH for the benchmark problems.
This also implies that the HEA is capable of evolving good
schedules consistently.
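The size of the gap can be read off Table 7 with a short calculation. The snippet below is an illustration added for convenience rather than part of the original study: it expresses the HEA median makespan as a percentage reduction relative to the DSH makespan, using the values listed in Table 7.

```python
# (DSH makespan, HEA median makespan) as listed in Table 7.
table7 = {
    "T1": (238.4844, 238.9705),
    "T2": (333.3969, 315.4760),
    "T3": (400.8312, 371.2755),
    "T4": (473.6011, 444.2914),
    "T5": (299.9843, 295.1409),
    "T6": (340.6002, 321.5004),
    "T7": (307.5352, 298.9078),
    "T8": (316.2526, 281.5325),
    "T9": (371.4102, 342.0356),
    "T10": (415.2846, 395.1606),
}

# Positive values mean that the HEA median is lower than the DSH makespan.
reduction = {
    name: 100.0 * (dsh - hea_median) / dsh
    for name, (dsh, hea_median) in table7.items()
}

for name, value in reduction.items():
    print(f"{name}: {value:+.2f}%")
```

Under this measure T1 is the only problem with a negative value, which is consistent with T1 being singled out as the exception in the discussion above.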
In order to analyze the impact of the various problem parameters, the performance trend over the different settings is
plotted in Fig. 9a–d. In general, increasing the CCR and the degrees of
precedence and task heterogeneity results in higher makespans for all algorithms. Nonetheless, it can be observed that
the problem and algorithmic performances have different
sensitivities toward these parameters. For instance, total execution time seems to vary almost linearly with CCR and task
heterogeneity. As CCR increases beyond a certain threshold,
we can expect solutions which employ fewer PEs or, at
least, concentrate the workload on a few core PEs to become
more desirable.
On the other hand, the initial increment in the degree of
precedence relation from 25 to 50% leads to a sharp increase
in makespan. This is probably due to the subsequent increase
in waiting time before a task can be executed. However, it
can be seen from Fig. 9b that such an effect seems to saturate
as the degree of precedence is further increased to 75%.
Interestingly, we can observe from Fig. 9c that increasing PE heterogeneity actually improves the makespan. This
behavior can be attributed to PEs that are either very
efficient or very inefficient with certain tasks. As a result, HEA
is able to exploit such a problem characteristic to generate
schedules that are much better than those of DSH and LSH.
Fig. 9 Performance trend of HEA (open circle), LSH (open inverted triangle) and DSH (open square) for various degrees of (a) CCR (results for
T1, T2, T3 and T4), (b) precedence (results for T2, T5 and T6), (c) PE heterogeneity (results for T2, T7 and T8) and (d) task heterogeneity (results for
T2, T9 and T10)
5 Conclusion
Task scheduling in a multiprocessor system is an NP-hard
problem that is critical in distinguishing the performance of a
multiprocessor system over a single processor system. However, the fact that tasks are located on different PEs means that
additional overheads are incurred. Furthermore, the aim of
minimizing time is only a single objective; practical requirements demand that other measures of cost are minimized as
well.
In this article, we proposed a HEA specifically designed
to solve the heterogeneous multiprocessor scheduling problem by means of a variable-length chromosome, as well as
specialized genetic and local search operators. The starting
population is initialized using a random LSH to preserve the
precedence relationships between the tasks. The evolutionary
process is driven by two primary variation operators: a PE schedule crossover and three variants of the mutation operator, namely
partial exchange, schedule merge, and partial split. The
local search operators, on the other hand, consist of partial list scheduling and duplication scheduling.
In presenting our results based on a fairly extensive simulation study, we showed that the proposed genetic operators, when coupled with the local search operators, performed
better than in the case where any one of the operators was
omitted. By incorporating the local search operators, particularly the duplication scheduling local search, into the overall proposed algorithm, convergence time, as expected, was shown to decrease.
This observation becomes more evident on test sets where the
CCR is smaller.
References
Ahmad I, Kwok YK (1998) Optimal and near-optimal allocation of
precedence-constrained tasks to parallel processors: defying the
high complexity using effective search techniques. In: Proceedings of 1998 international conference on parallel processing,
pp 423–431
Ahmad I, Kwok YK (1998) On exploiting task duplication in parallel program scheduling. IEEE Trans Parallel Distrib Syst 9(9):
872–892
Baskiyar S, Dickinson C (2005) Scheduling directed a-cyclic task
graphs on a bounded set of heterogeneous processors using task
duplication. J Parallel Distrib Comput 65(8):911–921
Blickle T, Teich J, Thiele L (1996) System level synthesis using evolutionary algorithms, TIK-Report, Nr. 16
Braun TD, Siegel HJ, Beck N, Boloni LL, Maheswaran M, Reuther AI,
Robertson JP, Theys MD, Yao B, Hensgen D, Freund RF (2001)
A comparison of eleven static heuristics for mapping a class of
independent tasks onto heterogeneous distributed computing systems. J Parallel Distrib Comput 61(6):810–837
Burke EK, Cowling P, De Causmaecker P (2001) A memetic approach
to the nurse rostering problem. Appl Intell 15(3):199–214
Coll PE, Ribeiro CC, de Sousa CC (2002) Test instances for scheduling
unrelated processors under precedence constraints.
http://www-di.inf.pucrio.br/celso/grupo/readme.ps
Correa RC, Ferreira A, Rebreyend P (1999) Scheduling multiprocessor
tasks with genetic algorithms. IEEE Trans Parallel Distrib Syst
10(8):825–837
Davidovic T, Crainic TG (2003) New benchmarks for static task scheduling on homogenous multiprocessor systems with communication
delays, Publication CRT, 2003-04, Centre de Recherche sur les
Transports, Universite de Montreal, pp 123–136
Davis L (1991) Handbook of genetic algorithms. Van Nostrand
Reinhold, London
Dhodi MK, Hielscher EH, Storer RH, Bhasker J (1995) Datapath synthesis using a problem space genetic algorithm. IEEE Trans CAD
14(8):934–944
Eiben AE, Smith JE (2003) Introduction to evolutionary computing.
Springer, New York
El-Rewini H, Lewis TG, Ali HH (1994) Task scheduling in parallel and
distributed systems. Prentice Hall, Englewood Cliffs
Franca PM, Mendes A, Moscato P (2001) A memetic algorithm for the
total tardiness single machine scheduling problem. Eur J Oper Res
132(1):224–242
Garey MR, Johnson DS (1979) Computers and intractability, a guide
to the theory of NP-completeness. W.H. Freeman and Co.,
San Francisco
Hall NG, Posner ME (2001) Generating experimental data for computational testing with machine scheduling applications. Oper Res
49:854–865
Hou ES, Ansari N, Ren H (1994) A genetic algorithm for multiprocessor scheduling. IEEE Trans Parallel Distrib Syst 5(2):113–120
Ishibuchi H, Yoshida T, Murata T (2003) Balance between genetic
search and local search in memetic algorithms for multiobjective permutation flowshop scheduling. IEEE Trans Evol Comput
7(2):204–223
Kasahara H, Narita S (1984) Practical multiprocessor scheduling algorithms for efficient parallel processing. IEEE Trans Comput
33(11):1023–1029
Kruatrachue B, Lewis TG (1987) Duplication scheduling heuristic, a
new precedence task scheduler for parallel systems, Technical
Report 87-60-3, Oregon State University
Kwok Y, Ahmad I (1997) Efficient scheduling of arbitrary task graphs
to multiprocessors using a parallel genetic algorithm. J Parallel
Distrib Comput 47(1):58–77
Kwok Y, Ahmad I (1999) Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Comput Surv
31(4):406–471
Lewis TG, El-Rewini H (1992) Introduction to parallel computing.
Prentice Hall, New York
Lim D, Ong YS, Jin Y, Sendhoff B, Lee BS (2007) Efficient hierarchical
parallel genetic algorithm using grid computing. In: Future generation computer systems: the international journal of grid computing:
theory, methods and applications, pp 658–670
Macey BS, Zomaya AY (1998) A performance evaluation of CP list
scheduling heuristics for communication intensive task graphs.
In: Proceedings of the joint 12th international parallel processing
symposium and ninth symposium on parallel and distributed programming, pp 538–541
Merz P, Freisleben B (2000) Fitness landscape analysis and memetic
algorithms for the quadratic assignment problem. IEEE Trans Evol
Comput 4(4):337–352
Ong YS, Keane AJ (2004) Meta-Lamarckian learning in memetic algorithms. IEEE Trans Evol Comput 8(2):99–110
Ong YS, Lim MH, Zhu N, Wong KW (2006) Classification of adaptive
memetic algorithms: a comparative study. IEEE Trans Syst Man
Cybern B 36(1):141–152
Papadimitriou C, Yannakakis M (1990) Toward an architecture independent analysis of parallel algorithms. SIAM J Comput 19:
322–328
Ritchie G, Levine J (2004) A hybrid ant algorithm for scheduling independent jobs in heterogeneous computing environments. In: Proceedings of the 23rd workshop of the UK planning and scheduling
special interest group
Tang J, Lim MH, Ong YS (2007) Diversity-adaptive parallel memetic
algorithm for solving large scale combinatorial optimization problems. Soft Comput 7(9):873–888
Tsuchiya T, Osada T, Kikuno T (1998) Genetic-based multiprocessor scheduling using task duplication. Microprocessors Microsyst
22:197–207
Wu AS, Yu H, Jin S, Lin KC, Schiavone G (2004) An incremental genetic algorithm approach to multiprocessor scheduling. IEEE Trans
Parallel Distrib Syst 15(9):824–834
Zhou Z, Ong YS, Lim MH, Lee BS (2007) Memetic algorithm using
multi-surrogates for computationally expensive optimization problems. Soft Comput 11(10):957–972
Zhong YW, Yang JG, Qi HN (2004) A hybrid genetic algorithm for
task scheduling in heterogeneous computing systems. In: Proceedings of the third international conference on machine learning
and cybernetics, pp 2463–2468
Zomaya AY, Ward C, Macey B (1999) Genetic scheduling for parallel
processor systems: comparative studies and performance issues.
IEEE Trans Parallel Distrib Syst 10(8):795–812