We present a computational evaluation of existing and novel approaches
for the so-called Minimum Cost Linear Extension problem (MCLE). Given a
partially ordered set $\mathbf P$ and a cost function that associates a
non-negative real value with every (ordered) pair of incomparable
elements of $\mathbf P$, the MCLE asks for some linear extension
$\mathbf L$ of $\mathbf P$ that minimizes the total cost of adjacent
elements in $\mathbf L$. We discuss an algorithm by Liu et
al. for the general case of the MCLE, showing that it is not an
approximation algorithm, as originally suggested. We also propose a fast
and effective topological sorting-based heuristic, as well as an exact
model based on constraint programming. Computational results suggest
that the topological sorting heuristic is superior to the algorithm of
Liu et al. for sparse instances, while the exact model
shows that Liu et al.’s algorithm produces near-optimal
solutions for small-sized instances.
Introduction {#Intro}
A partially ordered set (or simply, a poset) is a pair
$\mathbf{P}=(E,\leq)$, where $E$ is a finite set and $\leq$ is a binary
relation on $E$ that is transitive, reflexive and antisymmetric, i.e.,
$\leq$ is a binary relation satisfying:
$x \leq y,\; y \leq z \;\Rightarrow\; x \leq z, \quad \forall\, x,y,z \in E$;
$x \leq x, \quad \forall\, x \in E$;
$x \leq y,\; y \leq x \;\Rightarrow\; x=y, \quad \forall\, x,y \in E$.
In a partial order $\mathbf{P}=(E,\leq)$ there might exist pairs of
elements $x,y \in E$, with $x \neq y$, such that we have neither
$x \leq y$ nor $y \leq x$ in $\mathbf P$. Two elements $x,y$ of a poset
$\mathbf P$ are said to be comparable if $x \leq y$ or $y \leq x$ in
$\mathbf{P}$; otherwise, they are said to be incomparable, which is
denoted by $x \sim y$. We say that $y$ covers $x$ in $\mathbf P$ if
$x < y$ and there is no element $z$ in $\mathbf P$ satisfying
$x < z < y$. We denote the fact that $y$ covers $x$ by
$x <\!\!\cdot\; y$. For any element $x$ in ${\mathbf P}$, we
define $x^\downarrow$ as the subset of $E$ given by
$$x^\downarrow = \left\{z \in E: z \leq x\right\}.$$
The Hasse diagram of a finite poset $\mathbf{P}$ is a directed graph
whose vertices are the elements of $\mathbf{P}$ and the arcs are the
coverage relations in $\mathbf{P}$. A chain in $\mathbf{P}$ is a
subset $C \subseteq E$ such that any two elements of $C$ are comparable
in $\mathbf P$. A linear order is an ordered set
$\mathbf{L}=(E, \leq_{\mathbf L})$ satisfying $x \leq_{\mathbf L} y$ or
$y \leq_{\mathbf L}x$ for every pair of elements $x, y \in E$. A linear
order $\mathbf{L}$ is a linear extension of a poset $\mathbf{P}$ if
$x \leq_{\mathbf L} y$ whenever $x \leq y$ in $\mathbf P$.
It is a well-known fact that every poset admits at least one linear
extension. Consider a linear extension $\mathbf{L}$ of poset
$\mathbf{P}$. We will write
$$\mathbf L = \{ x_{1} <_{\mathbf L} x_{2} <_{\mathbf L} \ldots <_{\mathbf L} x_{n}\},$$
where $x_i \in E$ $(i=1,\ldots,n)$ and $n=|E|$. Since it is possible for
a poset to admit more than one linear extension, it is natural to
consider an optimization problem that asks for the best linear extension
of a poset $\mathbf P$, according to some criterion.
The Minimum Cost Linear Extension Problem
Let $c: E \times E \rightarrow \mathbb{R}_+$ be a cost function that
associates a non-negative real value with every ordered pair of elements
$x,y \in E$, while satisfying the following conditions ([func]):
$c(x,y) \geq 0$ for every ordered pair $x,y$ with $x \sim y$, and
$c(x,y) = 0$ whenever $x$ and $y$ are comparable in $\mathbf P$.
By slightly abusing notation, we will write the total cost of a linear
extension
${\mathbf L}=\{ x_{1} <_{\mathbf L} x_{2} <_{\mathbf L} \ldots <_{\mathbf L} x_{n}\}$
as
$$c(\mathbf L) = \sum_{i=1}^{n-1} c(x_i, x_{i+1}).$$
Given a poset $\mathbf P$ and a cost function $c$ as described above,
the Minimum Cost Linear Extension problem (hereafter referred to as MCLE)
asks for a linear extension $\mathbf L$ of $\mathbf P$ which minimizes
$c(\mathbf L)$. In other words, we are interested in finding
$\mathbf{L}^*$ satisfying
$$\mathbf{L}^* = \operatorname*{arg\,min}\,\left\{c(\mathbf{L}) : \mathbf{L} \mbox{ is a linear extension of } {\mathbf P} \right\}.$$
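To make the objective function concrete, here is a minimal Python sketch that evaluates $c(\mathbf L)$ for a given linear extension, under the assumption stated above that only adjacent pairs of incomparable elements contribute to the total; the dictionary `cost`, which stores $c(x,y)$ for ordered incomparable pairs, and the tiny example poset are illustrative choices of the sketch.

```python
# Minimal sketch: evaluating c(L) for a given linear extension L.
# `cost[(x, y)]` holds c(x, y) for ordered pairs of incomparable elements;
# comparable adjacent pairs contribute nothing (missing keys default to 0).

def extension_cost(L, cost):
    """Sum the costs of adjacent pairs in the extension L."""
    return sum(cost.get((x, y), 0) for x, y in zip(L, L[1:]))

# Tiny illustrative poset on {a, b, c} with the single relation a <= b.
cost = {("a", "c"): 5, ("c", "a"): 2, ("b", "c"): 1, ("c", "b"): 4}
print(extension_cost(["a", "b", "c"], cost))  # (a,b) costs 0, (b,c) costs 1 -> 1
print(extension_cost(["c", "a", "b"], cost))  # (c,a) costs 2, (a,b) costs 0 -> 2
```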
The MCLE was originally defined in @Liu and further studied in @Wu. In
@Liu the authors showed that the MCLE generalizes the so-called bump
number and jump number problems on posets. Specifically, if we define
$c(x,y)=1$, for $x \sim y$, the problem becomes equivalent to the jump
number problem. Since the jump number problem is known to be NP-hard
(see @Bouchitte), the MCLE is also NP-hard. In this work, however, we
concentrate on the general version of the problem, in which $\mathbf P$
is an arbitrary poset and $c$ is an arbitrary non-negative real-valued
cost function satisfying conditions ([func]).
While discussing the algorithms described in this work, we shall assume
that the computational representation of a poset $\mathbf P=(E,\leq)$ is
that of a directed graph, in which vertices correspond to elements of
$E$ and arcs are associated with a superset of the coverage relations in
$\mathbf P$. Formally, this representation is a directed graph
$G=(V,A)$, with $V=E$ and $A=\{(x,y) : x, y \in E,\ x < y\}$. We do not
insist that only arcs corresponding to coverage relations are included
in this representation, since this does not have an impact on the
worst-case time complexity of the algorithms discussed here. We shall
make use of the notation $\delta^+(v)$ to denote the outward
neighborhood of vertex $v$, i.e., the set of vertices
directly reachable from $v$. Similarly, $\delta^-(v)$ will be utilized
to denote the inward neighborhood of vertex $v$: the set of all vertices
$w$ for which there exists an arc $(w,v) \in A$. We shall refer to
$|\delta^+(v)|$ as the out-degree of vertex $v$, while $|\delta^-(v)|$
shall be referred to as the in-degree of $v$.
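As a small illustration of this representation, the sketch below (in Python, for brevity) builds the neighborhoods $\delta^+(v)$ and $\delta^-(v)$ from a list of order relations; the element names and the helper name `build_digraph` are illustrative, not taken from the paper.

```python
from collections import defaultdict

# Sketch: digraph representation G = (V, A) of a poset, with the
# out-neighborhoods delta^+(v) and in-neighborhoods delta^-(v) as sets.

def build_digraph(relations):
    """relations: iterable of pairs (x, y) meaning x < y in P."""
    out_nbrs = defaultdict(set)   # delta^+(v)
    in_nbrs = defaultdict(set)    # delta^-(v)
    for x, y in relations:
        out_nbrs[x].add(y)
        in_nbrs[y].add(x)
    return out_nbrs, in_nbrs

out_nbrs, in_nbrs = build_digraph([("b", "a")])   # the single relation b < a
print(len(out_nbrs["b"]), len(in_nbrs["a"]))      # out-degree of b = 1, in-degree of a = 1
```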
Algorithm 2 of Liu et al. {#Algo2}
In what follows we describe the greedy algorithm proposed by Liu
et al. in @Liu, referred to as “Algorithm 2” in their
paper. Here we provide a somewhat different description than the one
given in the original paper, since we believe it focuses more closely
on the conceptual steps, rather than on implementation choices.
For the sake of simplifying the presentation of algorithm
Liu2, we shall define the operation of
injecting an element into a partial linear
extension of a poset $\mathbf{P}$. We define a partial linear
extension of a poset $\mathbf{P}=(E,\leq)$ as a linear order
$\mathbf{L}=(X,\leq_{\mathbf L})$ involving a subset $X$ of $E$ and
satisfying $x \leq_{\mathbf L} y$, whenever $x \leq y$ in $\mathbf P$.
Let $\mathbf{L}=(X,\leq_{\mathbf L})$ be a partial linear extension of $\mathbf{P}$ and
$x \in E \setminus X$. The operation of injecting $x$ into
$\mathbf{L}$ corresponds to augmenting the linear order $\mathbf{L}$ by
adding $x$ to it, while satisfying the following conditions:
The resulting linear order $\mathbf{L'}$ is consistent with
${\mathbf P}$ (i.e., $x \leq_{\mathbf{L'}} y$
whenever $x \leq y$ in $\mathbf{P}$);
$c(\mathbf{L'})$ is minimum.
In other words, we want to add $x$ into $\mathbf{L}$ while preserving
the relative order of the elements in $\mathbf{L}$ and minimizing the
resulting total cost of $\mathbf{L'}$. Let
$(y_1, \ldots, y_{|{\mathbf L}|})$ be the elements of $\mathbf{L}$,
where $y_i <_{\mathbf{L}} y_{i+1}$, $i=1,\ldots,|\mathbf{L}|-1$. Then,
the choices for positioning $x$ when injecting it into $\mathbf{L}$ are
the following: before $y_1$, between $y_i$ and $y_{i+1}$ for some
$i \in \{1,\ldots,|\mathbf{L}|-1\}$, or after $y_{|\mathbf{L}|}$.
Obviously, some of these positions might be invalid, as they might not
be consistent with the order relations in $\mathbf{P}$. We present in
Figure [Liu2Pseudocode] the pseudocode for algorithm Liu2.
Input: Cover graph $G$ of a poset ${\mathbf P}=(E,\leq)$ of height $T$.
Output: Linear extension $\mathbf L$ of $\mathbf P$.
1. Let $C$ be a maximum-size chain of $\mathbf{P}$
2. Let ${\mathbf L} := C$, $R := E \setminus C$
3. while $R \neq \emptyset$ do
4. pick an element $x \in R$, inject $x$ into $\mathbf L$, and set $R := R \setminus \{x\}$
5. end while
[Liu2Pseudocode]
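To make the injection operation concrete, the sketch below (a rough Python rendering, not the authors' implementation) tries every position of the partial linear extension, discards the placements that violate the order relations of $\mathbf P$, and keeps the cheapest consistent one. The helper `leq(u, v)`, answering whether $u \leq v$ in $\mathbf P$, and the cost dictionary are assumptions of the sketch; the order in which elements of $R$ are picked in step 4 is left unspecified here.

```python
# Sketch of the "inject" operation: place x into the partial linear
# extension L (a Python list) so that L stays consistent with P and the
# resulting total cost is minimized.

def total_cost(L, cost):
    return sum(cost.get((u, v), 0) for u, v in zip(L, L[1:]))

def inject(L, x, leq, cost):
    best, best_cost = None, None
    for i in range(len(L) + 1):              # positions: before y_1, ..., after y_|L|
        before, after = L[:i], L[i:]
        consistent = all(not leq(x, y) for y in before) and \
                     all(not leq(y, x) for y in after)
        if not consistent:
            continue
        candidate = before + [x] + after
        c = total_cost(candidate, cost)
        if best is None or c < best_cost:
            best, best_cost = candidate, c
    return best
```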
It is interesting to note that the Liu2 algorithm was
presented in @Liu as a “greedy approximation algorithm.” We believe that
the authors meant by it that the algorithm is a greedy procedure that
produces solutions whose objective function values are typically (but
not provably) close to the optimal value, which is a departure from the
standard concept of an approximation algorithm. Indeed, as we argue
below, it is possible to devise examples for which the solution produced
by the algorithm is arbitrarily large (in terms of objective function
value), as compared to the actual minimum.
The Liu2 algorithm does not have a bounded approximation ratio.
Consider the poset $\mathbf{P}$ consisting of a 4-element ground set
$E=\{a,b,c,d\}$ and the single relation $b<a$. Let the costs between the
incomparable elements in $\mathbf P$ be as follows:
where $M$ is a positive constant strictly greater than $1$.
(a) Solution produced by algorithm Liu2; cost = $1+M$.
(b) Solution produced by algorithm Liu2; cost = $1+M$.
(c) Optimal solution; cost = $3$.
[figLiu2]
The application of algorithm Liu2 to $\mathbf{P}$ yields a
linear extension with total cost $1+M$. Figures [figLiu2](a) and
[figLiu2](b) show two such linear extensions. Yet another alternative
solution with the same cost would be
$a \rightarrow b \rightarrow d \rightarrow c$. Which solution is to be
reported by the algorithm depends only on implementation choices
concerning how to break ties. The optimal solution, however, has an
objective function value of 3, as shown in Figure [figLiu2](c).
Therefore, by appropriately changing the value of $M$, one can make the
approximation ratio between the objective function value of the solution
produced by algorithm Liu2 and the optimal value
arbitrarily large. $\square$
Greedy Topological Sorting
A topological sort of a directed acyclic graph $G=(V,A)$ is a linear
ordering of the elements of $V$ such that, for every arc $(u,v) \in A$,
$u$ appears before $v$ in the ordering (i.e., the ordering is
consistent with the arcs of $G$). Let us define $n=|V|$ and $m=|A|$. A
topological sort can also be seen as a mapping
$\sigma: V \rightarrow \{1,\dots,n\}$ such that
$(u,v) \in A \Rightarrow \sigma(u)<\sigma(v)$. Since the cover graph of
a poset $\mathbf{P}$ is a directed acyclic graph, there is a one-to-one
correspondence between the set of topological sorts of the cover graph
of $\mathbf{P}$ and the set of linear extensions of $\mathbf{P}$.
It is, therefore, natural to consider an algorithm for the MCLE which is
based on a topological sorting algorithm. One of the standard algorithms
for topologically sorting the vertices of a directed acyclic graph is
essentially a graph traversal procedure and can be implemented to run in
$\Theta(n+m)$ time. The algorithm can be described as the process of
augmenting a partial linear extension (i.e., a topological sort
involving a proper subset of the elements of $\mathbf{P}$) by including
into it a vertex with out-degree equal to zero (ties can be broken
arbitrarily). The algorithm starts with the empty partial linear
extension (PLE). Every time a vertex $v$ is added to the PLE the
vertices in $\delta^-(v)$ have their out-degree values decreased by 1.
Note that, by construction, the vertices in $\delta^-(v)$ have not yet
been added to the PLE. Moreover, the operation of updating their
out-degrees amounts to removing the set of arcs
$\{(u,v) : u \in \delta^-(v)\}$ from $A$.
Since the MCLE accounts for possibly different costs associated with
pairs of incomparable adjacent elements in the ordering, the standard
algorithm for topological sorting must be modified in order to reflect
that cost structure. A straightforward variant of the algorithm can be
obtained by applying a greedy criterion to break ties whenever there are
two or more elements with zero out-degree. At the first iteration, the
algorithm arbitrarily selects a vertex with zero out-degree. From the
second iteration onward, let $L$ be the current PLE and $w$ be the
vertex that was most recently added to $L$. The algorithm selects a
vertex $v^*$ minimizing the cost $c(v^*,w)$ to be added to $L$:
$$v^* = \operatorname*{arg\,min}_{\substack{v \,\notin\, L \\ |\delta^+(v)|=0}} c(v,w).$$
We are now ready to present the pseudocode for the greedy topological
sorting algorithm (see Figure [GTSAlgo]).
Input: Cover graph $G$ of a poset ${\mathbf P}$, cost function $c:V \times V \rightarrow \mathbb{R}$.
Output: Linear extension $L$ of $\mathbf P$.
1. $L := \emptyset$, $S := \{v \in V : |\delta^+(v)|=0\}$, iter $:= 1$
2. while $(|S|>0)$ do
3. if $(\mbox{iter} = 1)$ then
4. Arbitrarily select $v^* \in S$ $\quad\triangleright$ First vertex in the ordering
5. else
6. $v^* := \operatorname*{arg\,min}_{v \in S} c(v,w)$ $\quad\triangleright$ Vertex of minimum cost w.r.t. last element
7. $L := \left[L, v^*\right]$
8. Update out-degrees of vertices in $\delta^-(v^*)$, update $S$
9. $w := v^*$, iter := iter + 1
10. end
[GTSAlgo]
The topological sorting heuristic GreedyTopSort runs in
${\cal O}(n^2)$ time.
Note that the set $S$ must be updated at the end of each iteration,
since the updating of the out-degrees of vertices in $\delta^-(v^*)$
might decrease the out-degree of other vertices to zero. Furthermore,
the main loop is guaranteed to run exactly $n$ times. Step 8 requires a
total of ${\cal O}(m)$ time over all iterations, since it inspects each
arc exactly once. Step 6 requires ${\cal O}(n)$ time per iteration,
assuming that the value of $c(u,v)$ can be computed, or looked-up, in
constant time. All other internal steps in the loop demand constant
time. Therefore, the algorithm demands a total time of
${\cal O}(n^2+m)={\cal O}(n^2)$. $\square$
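The sketch below is one possible Python reading of GreedyTopSort: vertices of out-degree zero (maximal elements of the remaining poset) are chosen greedily according to the argmin rule above and appended to a working list, which is reversed at the end so that it reads as a linear extension of $\mathbf P$. The final reversal and the concrete data structures are assumptions of this sketch rather than details fixed by the pseudocode.

```python
# Sketch of GreedyTopSort. Arcs (u, v) mean u < v in P; a vertex with
# out-degree zero is a maximal element of the not-yet-placed part of P.

def greedy_top_sort(vertices, arcs, cost):
    out_deg = {v: 0 for v in vertices}
    in_nbrs = {v: [] for v in vertices}       # delta^-(v)
    for u, v in arcs:
        out_deg[u] += 1
        in_nbrs[v].append(u)

    S = {v for v in vertices if out_deg[v] == 0}
    order, w = [], None
    while S:
        if w is None:                          # iter = 1: arbitrary choice
            v_star = next(iter(S))
        else:                                  # vertex of minimum cost w.r.t. w
            v_star = min(S, key=lambda v: cost.get((v, w), 0))
        S.remove(v_star)
        order.append(v_star)
        for u in in_nbrs[v_star]:              # update out-degrees and S
            out_deg[u] -= 1
            if out_deg[u] == 0:
                S.add(u)
        w = v_star
    return list(reversed(order))               # read the result as a linear extension
```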
More sophisticated greedy schemes can definitely be devised. In
particular, criteria that incorporate look-ahead features might prove to
be a better compromise between running time and solution quality. We
chose to test this simpler version due to its relatively modest time
complexity and as a way of initially assessing the quality of a
topological sorting-based heuristic, as compared to the greedy algorithm
of @Liu.
An Exact Model Based on Constraint Programming
The basic components of a constraint programming model are a set of
decision variables, their specific domains, and a set of constraints on
those variables. A search engine is responsible for systematically
assigning values to decision variables and performing a process of
propagation, by means of which the data from the model’s
constraints is used to further restrict the domains of unassigned
variables. These steps are repeated up to the point when the domain of
every variable has been restricted to a single value. At that moment, a
feasible solution has been produced. Below we describe a constraint
programming model for the MCLE problem, along with the strategy applied
by the constraint solver we utilized, in order to obtain an optimal
solution – rather than enumerating all possible feasible assignments.
We modeled the MCLE problem by defining decision variables associated to
the vertices of the directed graph $G=(V,A)$ described in Section
[Intro]. Each such decision variable corresponds to the position of its
associated vertex in the linear extension. We shall refer to each such
variable as pos[$i$], $i=1,\ldots,n$. The domain of
pos[$i$] is, therefore, $\{1,\ldots,n\}$. Thus, a natural
constraint on those variables is
$$\mbox{\tt distinct( pos[1], pos[2], ..., pos[n] )},$$ which ensures
that each vertex occupies a distinct position in the resulting linear
extension.
The arcs in graph $G$ induce constraints in the model, as well. For each
arc $(x,y) \in A$, a constraint of the “relational” type is added to
ensure that the position of vertex $x$ will be less than that of vertex
$y$: pos[$x$] < pos[$y$].
In order to model the objective function we need to account for the cost
associated with each pair of incomparable elements which happen to be
adjacent in the linear extension, i.e., the cost
$c(x,y), ; x \sim y \mbox{ in } \mathbf P$, with
pos[$x$] = pos[$y$]-1. This is
accomplished with the use of four auxiliary decision variables for each
pair of incomparable elements $x \sim y$: D${x,y}$,
e${x,y}$, D${y,x}$ and
e${y,x}$. The following constraints make sure that
e${x,y}$ is equal to $1$, if element $x$ is immediately
followed by element $y$ in the linear extension, and
e${x,y}$ is equal to $0$, otherwise:
D$_{x,y}$ = pos[$y$] - pos[$x$],\
, e$_{x,y} \in {0,1}$,
where count($A$, value) simply counts the number of times
when the decision variables in set $A$ are equal to value.
Therefore, the expression count(${$D${x,y}}$, 1) is
equal to 1 or 0, depending on whether D${x,y}$ is equal 1
or not, respectively. Similarly, we must add the corresponding
constraints for D${y,x}$ and e${y,x}$.
Now, defining the objective function is a matter of adding up the costs
for those incomparable pairs $x,y \in E$ for which
e$_{x,y}=1$ or e$_{y,x}=1$:
$${\tt ObjFcn} := \sum_{\substack{x,y \,\in\, E \\ x \sim y}} \left[c(x,y)\, {\tt e}_{x,y} + c(y,x)\, {\tt e}_{y,x} \right].$$
We implemented the model in C++ with the use of Gecode @Gecode, a
state-of-the-art constraint solver, which, in addition to the standard
constraint satisfaction search engine, is equipped with a
branch-and-bound module, allowing for an automatic search for optimal
solutions satisfying the underlying model.
An optimal solution to the problem is obtained with the definition of
the objective function ObjFcn and a special method, called
constrain. Whenever an incumbent solution
$\widehat{x}=(\widehat{\tt pos},\widehat{\tt D},\widehat{\tt e})$ is
found, the constrain method is invoked and a new constraint
is added to the model requiring that ObjFcn $< v(\widehat{x})$, where $v(\widehat x)$ is the objective function
value of solution $\widehat{x}$. Thus, as the search resumes, the
branch-and-bound search engine will restrict the decision variables
domains accordingly, ensuring that any subsequently found solution will
have an objective function value strictly less than that of the current
incumbent solution. At termination, the last incumbent solution
$x^*=({\tt pos}^*,{\tt D}^*,{\tt e}^*)$ will naturally be an optimal
solution to the problem, with the vector pos$^*$ specifying
the absolute position of each element in the resulting linear extension.
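For readers who prefer a runnable reference point, the fragment below sketches an analogous model with OR-Tools CP-SAT rather than the authors' Gecode/C++ implementation: the adjacency indicators e$_{x,y}$ are expressed with reified constraints instead of the count construct, and the branch-and-bound search described above is delegated to the solver's built-in minimization. All function and variable names are illustrative.

```python
# Sketch of an MCLE model in OR-Tools CP-SAT (not the Gecode model used
# in the paper). `arcs` lists pairs (u, v) with u < v in P; `cost` maps
# ordered pairs of incomparable elements to their costs.
from ortools.sat.python import cp_model

def solve_mcle(vertices, arcs, cost):
    n = len(vertices)
    idx = {v: i for i, v in enumerate(vertices)}
    model = cp_model.CpModel()
    pos = [model.NewIntVar(0, n - 1, f"pos_{v}") for v in vertices]
    model.AddAllDifferent(pos)                           # distinct positions
    for u, v in arcs:                                    # u < v  =>  pos[u] < pos[v]
        model.Add(pos[idx[u]] < pos[idx[v]])

    terms = []
    for (x, y), c_xy in cost.items():
        e = model.NewBoolVar(f"e_{x}_{y}")               # e = 1 iff y immediately follows x
        model.Add(pos[idx[y]] == pos[idx[x]] + 1).OnlyEnforceIf(e)
        model.Add(pos[idx[y]] != pos[idx[x]] + 1).OnlyEnforceIf(e.Not())
        terms.append(int(c_xy) * e)
    model.Minimize(sum(terms))

    solver = cp_model.CpSolver()
    if solver.Solve(model) == cp_model.OPTIMAL:
        return sorted(vertices, key=lambda v: solver.Value(pos[idx[v]]))
    return None
```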
Computational Results
We evaluated the two heuristic algorithms GreedyTopSort and
Liu2 on a number of randomly generated instances of the
MCLE problem. For a small subset of the instances, we were also able to
run the exact constraint-based model (instances with more than 20
vertices demanded an excessive running time and were, therefore,
excluded from the experiments). The results of the exact model are
particularly important, as they provide information on how close to
being optimal the heuristic solutions are (with respect to objective
function value).
Instances were created with a special-purpose generator implemented in
C++. The generator makes use of the standard procedure for generating
random graphs: given a positive integer $n$ (specifying the number of
vertices of the graph) and a real value $p \in (0,1]$, we create the arc
from vertex $x$ to vertex $y$ if a randomly generated value in $(0,1]$
is greater than $p$ (e.g., see @Bianco). Since we are
interested in generating acyclic graphs, we only consider the creation
of arc $(x,y)$ if $x<y$. Each graph created in this manner is considered
as the representation of a poset, possibly including some arcs that
correspond to transitive (and, therefore, redundant) relations. Finally,
for each pair $x,y \in \mathbf P$ of incomparable elements, we randomly
select two integers in the range $\left[0,n^2\right]$ as the costs
$c(x,y)$ and $c(y,x)$.
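A minimal Python sketch of such a generator is shown below; it follows the description above (an arc $(x,y)$ with $x<y$ is created when the random draw exceeds $p$, so larger values of $p$ yield sparser posets), while the transitive-closure step used to detect incomparable pairs is an implementation choice of the sketch rather than something specified in the text.

```python
import random

# Sketch of the instance generator: random DAG on n vertices plus
# integer costs in [0, n^2] for every ordered pair of incomparable elements.

def generate_instance(n, p, seed=0):
    rng = random.Random(seed)
    arcs = [(x, y) for x in range(n) for y in range(x + 1, n)
            if rng.random() > p]

    # Comparability = reachability in the DAG (a simple transitive closure;
    # arcs always go from a smaller to a larger index).
    reach = {x: {x} for x in range(n)}
    for x in reversed(range(n)):
        for u, v in arcs:
            if u == x:
                reach[x] |= reach[v]

    cost = {(x, y): rng.randint(0, n * n)
            for x in range(n) for y in range(n)
            if x != y and y not in reach[x] and x not in reach[y]}
    return arcs, cost
```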
Table [TableSmall] presents the solution values and running times of the
two heuristics on a set of “small” instances, consisting of posets with
a number $n$ of elements ranging from 10 to 80. The parameter $p$ takes
on four different values: 0.2, 0.4, 0.6 and 0.8. For each combination of
$n$ and $p$, we generated three different posets, by varying the random
seed used to initialize the random generator. Table [TableSmall] also
includes results for the exact model on instances with up to 20
vertices.
The algorithms are referred to in the table as GTS (short
for GreedyTopSort), Liu2 and CP
(short for “constraint programming model”). Columns 3-5 correspond to
the average running times of each algorithm on the three instances with
the corresponding values of $n$ and $p$. Columns 6-7 report the average
value of the solutions produced by GTS and
Liu2, as percentages of the optimum value found by
CP (as mentioned before, that information is only available
for $n \leq 20$, as instances with larger values of $n$ demanded a
prohibitive amount of time to solve to optimality). Column 8 provides a
direct comparison between the objective function values of the solutions
produced by GTS as percentages of the values of the
solutions produced by Liu2. As before, each value in that
column is the average over three instances with the corresponding values
of $n$ and $p$.
Table [TableLarge] presents similar data concerning solution values and
running times of the two heuristics on “large” instances – those with a
number of vertices varying from 100 to 1,000. Columns 3-4 are average
running times over three instances, while column 5 has the same meaning
as column 8 of Table [TableSmall].
It can be seen from Table [TableSmall] that GTS outperforms
Liu2 substantially when it comes to running time. On the
other hand, Liu2 performs better, with respect to objective
function value, than GTS in most cases. In fact, this is
true for instances with $p \leq 0.6$. For instances with $p=0.80$, the
tendency is reversed: heuristic GTS produces solutions that
are, on average, slightly less costly than those produced by
Liu2.
In Table [TableLarge], the running time of GTS remains very
low. Indeed, it is sometimes several orders of magnitude lower than that
of Liu2. Here it is important to highlight the fact that
the algorithms’ implementations are homogenized, in the sense that the
two programs share as much code and data structures as possible.
Regarding the objective function values of the heuristic solutions, we can see
a trend similar to the one present in Table [TableSmall]: algorithm
Liu2 obtains better objective function values for instances
with $p \leq 0.6$, at the expense of higher running times. For $p=0.8$,
GTS outperforms Liu2, producing solutions
whose objective function values are, on average, 89.5% of those of the
solutions produced by Liu2. This behavior seems to be
related to the scenario described in Section [Algo2], where algorithm
Liu2 can produce solutions that are far from being optimal.
Indeed, the lower the density of the graphs (resulting from high values
of $p$), the more likely it is that the scenario described in Section
[Algo2] will happen.
Concluding Remarks
In this paper, we presented two novel solution approaches for the
so-called Minimum Cost Linear Extension problem (MCLE): a topological
sorting-based heuristic and a constraint programming model. We revisited
an existing greedy algorithm for the MCLE, due to Liu et
al. @Liu, and showed that their algorithm is not an
approximation one, as originally suggested in their paper.
We also carried out a number of computational experiments on randomly
generated posets and reported on the relative performance of the three
procedures, which shows a clear trend as we vary the density of the
posets. The constraint programming approach was unable to compute
provably optimal solutions for posets with more than 20 elements
($n>20$), but it has proven instrumental in determining empirically that
the algorithm of Liu et al. obtains near-optimal
solutions for small-sized instances. In addition to that, the Liu
et al. algorithm clearly outperforms the topological
sorting heuristic for posets with high density (values of the parameter
$p \leq 0.4$). For $p=0.6$, the scenario is no longer entirely clear,
but their algorithm was still slightly superior. For $p=0.8$ (lower
densities), the topological sorting heuristic provided better solutions
than Liu et al.’s algorithm. A remarkable feature of the
topological sorting heuristic is its consistently low running time,
which was, at times, several orders of magnitude smaller than that of
the algorithm of Liu et al.
L. Bianco, P. Dell’Olmo, S. Giordani, An optimal algorithm to
find the jump number of partially ordered sets, Computational
Optimization and Applications, 8(2):197-210, 1997.
V. Bouchitté, M. Habib, NP-completeness properties about linear
extensions, Order, 4:143-154, 1987.
Gecode Team, Gecode: Generic Constraint Development
Environment, Available from http://www.gecode.org, 2006.
L. Liu, B. Wu, E. Yao, Minimizing the sum cost in linear
extensions of a poset, J. Comb. Optim. 21:247-253, 2009.
B. Wu, E. Yao, L. Liu, A polynomially solvable case of optimal
linear extension problem of a poset, J. Comb. Optim.
20:422-428, 2010.