Skip to content

Chapter 7: Graphs: Only Connect!

George Heineman edited this page Jul 27, 2021 · 1 revision

The following are the code listings for Chapter 7. [book.py]


Listing 7-1. A program that builds the graph in Figure 7-2

import networkx as nx
G = nx.Graph()                                             ❶
G.add_node('A2')                                           ❷
G.add_nodes_from(['A3', 'A4', 'A5'])                       ❸

G.add_edge('A2', 'A3')                                     ❹
G.add_edges_from([('A3', 'A4'), ('A4', 'A5')])             ❺

for i in range(2, 6):
  G.add_edge('B{}'.format(i), 'C{}'.format(i))             ❻
  if 2 < i < 5:
    G.add_edge('B{}'.format(i), 'B{}'.format(i+1))
  if i < 5:
    G.add_edge('C{}'.format(i), 'C{}'.format(i+1))

>>> print(G.number_of_nodes(), 'nodes.')                   ❼
>>> print(G.number_of_edges(), 'edges.')
>>> print('adjacent nodes to C3:', list(G['C3']))          ❽
>>> print('edges adjacent to C3:', list(G.edges('C3')))    ❾
12 nodes.
12 edges.
adjacent nodes to C3: ['C2', 'B3', 'C4']
edges adjacent to C3: [('C3', 'C2'), ('C3', 'B3'), ('C3', 'C4')]

nx.Graph() constructs a new undirected graph.
❷ A node can be any hashable Python object except None. Strings are a good choice.
❸ Add multiple nodes from a list using add_nodes_from().
❹ Add an edge between two nodes, u and v, with add_edge(u, v).
❺ Add multiple edges from a list using add_edges_from().
❻ If an edge is added to a graph before its nodes are, the corresponding nodes are automatically added to the graph.
❼ A graph can report its number of nodes and edges.
❽ Find the nodes adjacent to v by using the G[v] lookup capability.
❾ Find the edges adjacent to v by using the G.edges(v) function.

Listing 7-2. Linked list implementation of Stack data type

class Stack:
  def __init__(self):
    self.top = Nonedef is_empty(self):
    return self.top is Nonedef push(self, val):
    self.top = Node(val, self.top)         ❸

  def pop(self):
    if self.is_empty():                    ❹
      raise RuntimeError('Stack is empty')

    val = self.top.valueself.top = self.top.nextreturn val

❶ Initially, top is None, reflecting an empty Stack.
❷ A Stack is empty if top is None.
❸ Ensures new Node is the first one in linked list, with existing linked list becoming the rest.
❹ An empty Stack causes a RuntimeError.
❺ Extract the newest value from top of stack to be returned.
❻ Reset Stack so the next Node is now on top (if None, then Stack becomes empty).

Listing 7-3. Depth First Search of graph from designated source node, src

def dfs_search(G, src):        ❶
  marked = {}                  ❷
  node_from = {}               ❸

  stack = Stack()
  marked[src] = Truestack.push(src)

  while not stack.is_empty():  ❺
    v = stack.pop()
    for w in G[v]:
      if not w in marked:
        node_from[w] = vmarked[w] = Truestack.push(w)

  return node_from

❶ Conduct a Depth First Search over graph, G, starting from source node, src.
❷ The marked dictionary records nodes that have already been visited.
❸ Record how search got to each node: node_from[w] is the prior node working backward to src.
❹ Mark and place src node into Stack to start the search. The top node in the Stack represents the next node to explore.
❺ If the Depth First Search has not yet completed, v is the next node to explore.
❻ For each unmarked node, w, adjacent to v, remember that to get to w, the search came from v.
❼ Push w onto the top of the stack and mark it so it won't be visited again.
❽ Return the structure of the search that records for each node, v, the prior node from a search initiated at src.

Listing 7-4. Recovering actual path from node_from[]

def path_to(node_from, src, target):    ❶
  if not target in node_from:
    raise ValueError('Unreachable')     ❼

  path = []
  v = targetwhile v != src:
    path.append(v)                      ❸
    v = node_from[v]                    ❹

  path.append(src)                      ❺
  path.reverse()                        ❻
  return path

node_from structure is needed to recover path from src to any target.
❷ To recover the path, set v to target node.
❸ As long as v is not src, append v to path, a backward list of nodes found on path from src to target.
❹ Continue backward by setting v to the prior node recorded by node_from[v].
❺ Once src is encountered, the while loop terminates, so src must be appended to complete the backward path.
❻ Return the reverse of path to produce the proper ordering from src to target.
❼ If node_from[] doesn't contain target, then it is not reachable from src.

Listing 7-5. Breadth First Search of graph from designated source node

def bfs_search(G, src):        ❶
  marked = {}                  ❷
  node_from = {}               ❸

  q = Queue()
  marked[src] = Trueq.enqueue(src)

  while not q.is_empty():      ❺
    v = q.dequeue()
    for w in G[v]:
      if not w in marked:
        node_from[w] = vmarked[w] = Trueq.enqueue(w)

  return node_from

❶ Conduct a Breadth First Search over graph, G, starting from source node, src.
❷ The marked dictionary records nodes that have already been visited.
❸ Record how search got to each node: node_from[w] is the prior node working backward to src.
❹ Mark and place src node into Queue to start the search. The first node in the Queue represents the next node to explore.
❺ If the Breadth First Search has not yet completed, v is the next node to explore.
❻ For each unmarked node, w, adjacent to v, remember that to get to w, the search came from v.
❼ Place w as last node to explore at the end of the queue, and mark it so it isn't visited multiple times.
❽ Return the structure of the search that records for each node, v, the prior node from a search initiated at src.

Listing 7-6. A Guided Search using Manhattan distance to control search

def guided_search(G, src, target):                ❶
  from ch04.heap import PQ
  marked = {}                                     ❷
  node_from = {}                                  ❸

  pq = PQ(G.number_of_nodes())                    ❹
  marked[src] = True
  pq.enqueue(src, -distance_to(src, target))      ❺

  while not pq.is_empty():                        ❻
    v = pq.dequeue()

    for w in G.neighbors(v):
      if not w in marked:
        node_from[w] = vmarked[w] = True
        pq.enqueue(w, -distance_to(w, target))    ❽

  return node_fromdef distance_to(from_cell, to_cell):
  return abs(from_cell[0] - to_cell[0]) + abs(from_cell[1] - to_cell[1])

❶ Conduct a Guided Search over graph, G, starting from source node, src, knowing the target node to locate.
❷ The marked dictionary records nodes that have already been visited.
❸ Record how search got to each node: node_from[w] is the prior node working backward to src.
❹ Using the heap-based priority queue, you must pre-allocate sufficient space to include, potentially, all nodes in the graph.
❺ Mark and place src node into max priority queue to start the search using as its priority the negative of its distance to target.
❻ If the Guided Search has not yet completed, the node closest to target is the next node to explore.
❼ For each unmarked node, w, adjacent to v, remember that to get to w, the search came from v.
❽ Place w into its appropriate location in the priority queue, using the negative of the Manhattan distance as priority, and mark it so it isn't visited multiple times.
❾ Return the structure of the search that records for each node, v, the prior node from a search initiated at src.

Listing 7-7. Code fragment showing performance based on the number of edges

while not stack.is_empty():
  v = stack.pop()
  for w in G[v]:
    if not w in marked:
      marked[w] = True
      stack.push(w)
       ...

Listing 7-8. Recursive implementation of Depth First Search on a directed graph

def dfs_search(G, src):      ❶
  marked = {}                ❷
  node_from = {}             ❸

  def dfs(v):                ❹
    marked[v] = Truefor w in G[v]:
      if not w in marked:
        node_from[w] = vdfs(w)               ❼

  dfs(src)                   ❽
  return node_from

❶ Conduct a Depth First Search over graph G starting from source node, src.
❷ The marked dictionary records nodes that have already been visited.
❸ Record how dfs() found each node: node_from[w] is the prior node working backward to src.
❹ Recursive method to continue search from an unmarked node, v.
❺ Be sure to mark that v has been visited.
❻ For each unmarked node, w, adjacent to v, remember that to get to w, the search came from v.
❼ In the recursive case, continue search in direction of unmarked node, w. When recursive call ends, continue with for loop over w.
❽ Invoke the initial recursive call on source node, src.
❾ Return the structure of the search that records for each node, v, the prior node from a search initiated at src.

Listing 7-9. Detecting cycles in a directed graph using Depth First Search

def has_cycle(DG):
  marked = {}
  in_stack = {}

  def dfs(v):                                ❶
    in_stack[v] = Truemarked[v] = Truefor w in DG[v]:
      if not w in marked:
        if dfs(w):                           ❹
          return True
      else:
        if w in in_stack and in_stack[w]:    ❺
          return True

    in_stack[v] = Falsereturn False

  for v in DG.nodes():                       ❼
    if not v in marked:
      if dfs(v):                             ❽
        return True
  return False

❶ Conduct a Depth First Search over graph, DG, starting from v.
in_stack records nodes that are in the recursive call stack. Mark that v is now part of a recursive call.
❸ The marked dictionary records nodes that have already been visited.
❹ For each unmarked node, w, adjacent to v, initiate recursive dfs() on w, and if True is returned, a cycle is detected, so it returns True as well.
❺ If a node, w, is marked as visited, it could still be in our call stack--if it is, then a cycle has been detected.
❻ Equally important, when the dfs() recursive call ends, set in_stack[v] to False since v is no longer on the call stack.
❼ Investigate each unmarked node in the directed graph.
❽ If invoking dfs(v) on a node, v, detects a cycle, return True immediately.

Listing 7-10. Topological sort over the directed graph

def topological_sort(DG):
  marked = {}
  postorder = []                  ❶

  def dfs(v):                     ❷
    marked[v] = Truefor w in DG[v]:
      if not w in marked:
        dfs(w)                    ❹
    postorder.append(v)           ❺

  for v in DG.nodes():
    if not v in marked:           ❻
      dfs(v)

  return reversed(postorder)      ❼

❶ Use a list to store (in reverse order) a linear ordering of nodes to be processed.
❷ Conduct a Depth First Search over DG starting from v.
❸ The marked dictionary records nodes that have already been visited.
❹ For each unmarked node, w, adjacent to v, recursively explore dfs(w).
❺ When dfs(v) gets to this key step, it has fully explored all nodes that (recursively) depend on v, so append v to postorder.
❻ Ensure that all unmarked nodes are visited. Note that each time dfs(v) is invoked, a different subset of graph DG is explored.
❼ Because the list holds the linear ordering in reverse order, return its reverse.

Listing 7-11. Structure of an indexed min priority queue

class IndexedMinPQ:
  def less(self, i, j):                                            ❶
    return self.priorities[i] > self.priorities[j]

  def swap(self, i, j):
    self.values[i],self.values[j] = self.values[j],self.values[i]  ❷
    self.priorities[i],self.priorities[j] = self.priorities[j],self.priorities[i]

    self.location[self.values[i]] = iself.location[self.values[j]] = j

  def __init__(self, size):
    self.N          = 0
    self.size       = size
    self.values     = [None] * (size+1)                            ❹
    self.priorities = [None] * (size+1)

    self.location   = {}                                           ❺

  def __contains__(self, v):                                       ❻
    return v in self.location

  def enqueue(self, v, p):
    self.N += 1

    self.values[self.N], self.priorities[self.N] = v, pself.location[v] = self.Nself.swim(self.N)

❶ Because this is a min priority queue, item i is of lower priority than item j if its priority is a larger numeric value.
swap() switches the values and priorities for items i and j.
swap() updates the respective locations for items i and j.
values stores the value of the nth item; priorities stores the priorities of the nth item.
location is a dictionary that returns the index position into values and priorities for each value that is enqueued.
❻ Unlike a traditional priority queue, an indexed min priority queue can inspect location to determine in amortized O(1) time whether a value is being stored in the priority queue.
❼ To enqueue a (v, p) entry, place v in values[N] and p in priorities[N], which is next available bucket.
enqueue() must also associate this new index location with v before invoking swim() to guarantee the heap-ordered property.

Listing 7-12. Decreasing priority for a value in IndexedMinPQ

def decrease_priority(self, v, lower_priority):
  idx = self.location[v]                         ❶
  if lower_priority >= self.priorities[idx]:     ❷
     raise RuntimeError('...')

  self.priorities[idx] = lower_priorityself.swim(idx)                                 ❹

❶ Find the location, idx, in the heap where v is found.
❷ If lower_priority is actually not lower than the existing priority in priorities[idx], raise a RuntimeError.
❸ Change the priority for value v to be the lower priority.
❹ Reestablish heap-ordered property if necessary by swimming this value up.

Listing 7-13. Removing highest-priority value in IndexedMinPQ

def dequeue(self):
  min_value = self.values[1]                             ❶

  self.values[1] = self.values[self.N]                   ❷
  self.priorities[1] = self.priorities[self.N]
  self.location[self.values[1]] = 1

  self.values[self.N] = self.priorities[self.N] = Noneself.location.pop(min_value)                           ❹

  self.N -= 1self.sink(1)
  return min_value

❶ Remember min_value, the value with highest priority.
❷ Move the item at location N to the top-level location 1 and ensure that location records the new index position for this value.
❸ Remove all trace of the former min_value being removed.
❹ Remove min_value entry from location dictionary.
❺ Reduce number of entries before invoking sink(1) to reestablish heap-ordered property.
❻ Return the value associated with entry of highest priority (which is the one with smallest magnitude).

Listing 7-14. Dijkstra’s algorithm to solve single-source shortest path problem

def dijkstra_sp(G, src):
  N = G.number_of_nodes()

  inf = float('inf')                         ❶
  dist_to = {v:inf for v in G.nodes()}
  dist_to[src] = 0

  impq = IndexedMinPQ(N)                     ❷
  impq.enqueue(src, dist_to[src])
  for v in G.nodes():
    if v != src:
      impq.enqueue(v, inf)

  def relax(e):
    n, v, weight = e[0], e[1], e[2][WEIGHT]  ❺
    if dist_to[n] + weight < dist_to[v]:     ❻
      dist_to[v] = dist_to[n] + weightedge_to[v] = eimpq.decrease_priority(v, dist_to[v])  ❾

  edge_to = {}                               ❸
  while not impq.is_empty():
    n = impq.dequeue()                       ❹
    for e in G.edges(n, data=True):
      relax(e)

  return (dist_to, edge_to)

❶ Initialize dist_to dictionary to infinity for all nodes except src, which is 0.
❷ Enqueue all N nodes into impq to prepare for while loop.
edge_to[v] records the edge terminating at v found during the search.
❹ Find node, n, that has shortest computed path from src. Explore its edges (n, v, weight) to see if a new shortest path to v has been found. networkx requires data = True to retrieve edge weights.
❺ Extract n, v, and weight from the edge (n, v).
❻ If distance to n added to edge weight to v is smaller than best path so far to v, then a shorter path has been found.
❼ Update the shortest known distance to v.
❽ Record the edge (n, v) that brought Dijkstra's algorithm to v along the new shortest path.
❾ Most importantly, reduce priority in impq to new shortest distance so while loop will be able to retrieve node with shortest computed path.

Listing 7-15. Recovering actual path from edge_to[]

def edges_path_to(edge_to, src, target): ❶
  if not target in edge_to:
    raise ValueError('Unreachable')      ❼

  path = []
  v = targetwhile v != src:
    path.append(v)                       ❸
    v = edge_to[v][0]                    ❹

  path.append(src)                       ❺
  path.reverse()                         ❻
  return path

edge_to[] structure is needed to recover path from src to any target.
❷ To recover the full path, start at target.
❸ As long as v is not src, append v to path, a backward list of nodes found on path from src to target.
❹ Set v to become u, the prior node in the edge (u, v) from edge_to[v].
❺ Once src is encountered, the while loop terminates, so src must be appended to complete the backward path.
❻ Reverse the list so all nodes appear in proper order from src to target.
❼ If target is not in edge_to[], it is not reachable from src.

Listing 7-16. Bellman–Ford algorithm implementation

def bellman_ford(G, src):
  inf = float('inf')
  dist_to = {v:inf for v in G.nodes()}               ❶
  dist_to[src] = 0
  edge_to = {}                                       ❷

  def relax(e):
    u, v, weight = e[0], e[1], e[2][WEIGHT]
    if dist_to[u] + weight < dist_to[v]:             ❺
      dist_to[v] = dist_to[u] + weightedge_to[v] = ereturn Truereturn False

  for i in range(G.number_of_nodes()):               ❸
    for e in G.edges(data=True):                     ❹
      if relax(e):
        if i == G.number_of_nodes()–1:               ❾
          raise RuntimeError('Negative Cycle exists in graph.')

  return (dist_to, edge_to)

❶ Initialize dist_to dictionary to infinity for all nodes except src, which is 0.
edge_to[v] records the edge terminating at v found during the search.
❸ Make N passes over the graph.
❹ For every edge e = (u,v) in the graph, use the same relax() concept as Dijkstra's algorithm; see if e improves on an existing shortest path from src to v by going through u.
❺ If distance to u added to edge weight to v is smaller than best path so far to v, then a shorter path has been found.
❻ Update the shortest known distance to v.
❼ Record the edge (u, v) that brought algorithm to v along the new shortest path.
❽ If relax() returns True, then a new shortest path was found to v.
❾ Bellman–Ford makes N passes over all E edges. If in the final pass, an edge, e, is found that still reduces the shortest distance from src to some v, there must be a negative cycle in the graph.

Listing 7-17. Floyd–Warshall Algorithm

def floyd_warshall(G):
  inf = float('inf')
  dist_to   = {}                                  ❶
  node_from = {}
  for u in G.nodes():
    dist_to[u]   = {v:inf for v in G.nodes()}     ❷
    node_from[u] = {v:None for v in G.nodes()}    ❸

    dist_to[u][u] = 0for e in G.edges(u, data=True):               ❺
      v = e[1]
      dist_to[u][v] = e[2][WEIGHT]
      node_from[u][v] = ufor k in G.nodes():
    for u in G.nodes():
      for v in G.nodes():
        new_len = dist_to[u][k] + dist_to[k][v]   ❼
        if new_len < dist_to[u][v]:
          dist_to[u][v] = new_lennode_from[u][v] = node_from[k][v]

  return (dist_to, node_from)                     ❾

dist_to and node_from will be two-dimensional structures. Each is a dictionary containing sub-dictionaries dist_to[u] and node_from[u].
❷ For each u row, initialize dist_to[u][v] to be infinity to reflect that each node, v, is initially unreachable.
❸ For each u row, initialize node_from[u][v] to be None to reflect that there may not even be a path from u to v.
❹ Make sure to set dist[u][u] = 0 to reflect that the distance from u to itself is 0.
❺ For each edge e = (u,v) from u, set dist_to[u][v] = weight of e to reflect that the shortest distance from u to v is the edge weight for e.
❻ Record that u is the last node on the shortest path from u to v. In fact, it is the only node on the path, which contains just the edge (u, v).
❼ Select three nodes—k, u, and v—and compute new_len, the total path length from u to k added to the path length from k to v.
❽ If new_len is smaller than the computed length of the shortest path from u to v, set dist_to[u][v] to new_len to record the shortest distance, and record that the last node on the shortest path from u to v is the last node on the shortest path from k to v.
❾ Return computed dist_to[][] and node_from[][] structures so actual shortest paths can be computed for any two nodes.

Listing 7-18. Code to recover the shortest path as computed by Floyd–Warshall

def all_pairs_path_to(node_from, src, target):    ❶
  if node_from[src][target] is None:
    raise ValueError('Unreachable')               ❼

  path = []
  v = targetwhile v != src:
    path.append(v)                                ❸
    v = node_from[src][v]                         ❹

  path.append(src)                                ❺
  path.reverse()                                  ❻
  return path

node_from[][] structure is needed to recover path from src to any target.
❷ To recover the full path, start at target.
❸ As long as v is not src, append v to path, a backward list of nodes found on path from src to target.
❹ Set v to become the prior node in the search as recorded by node_from[src][v].
❺ Once src is encountered, the while loop terminates, so src must be appended to complete the backward path.
❻ Reverse the list so all nodes appear in proper order from src to target.
❼ If node_from[target] is None, target is not reachable from src.

Clone this wiki locally