References

  • https://leetcode.com/problems/connecting-cities-with-minimum-cost/
  • https://www.youtube.com/watch?v=GFsPyTdCS1I

There are n cities numbered from 1 to n.

You are given connections, where each connections[i] = [city1, city2, cost] represents the cost to connect city1 and city2 together.  (A connection is bidirectional: connecting city1 and city2 is the same as connecting city2 and city1.)

Return the minimum cost so that for every pair of cities, there exists a path of connections (possibly of length 1) that connects those two cities together.  The cost is the sum of the connection costs used. If the task is impossible, return -1.

Example 1:

Input: n = 3, connections = [[1,2,5],[1,3,6],[2,3,1]]
Output: 6
Explanation: 
Choosing any 2 edges will connect all cities so we choose the minimum 2.

Example 2:

Input: n = 4, connections = [[1,2,3],[3,4,4]]
Output: -1
Explanation: 
There is no way to connect all cities even if all edges are used.

Note:

  1. 1 <= n <= 10000
  2. 1 <= connections.length <= 10000
  3. 1 <= connections[i][0], connections[i][1] <= n
  4. 0 <= connections[i][2] <= 105
  5. connections[i][0] != connections[i][1]
Solution

Approach 1: Minimum Spanning Tree (Using Kruskal’s algorithm)

Intuition

If we model the cities and connections as a graph, each connection is an edge (undirected) and each city is a node of the graph. We need to find a subset of edges which connects all the nodes of the graph with the minimum possible total weight. This is by definition the Minimum Spanning Tree or MST of a graph.

Algorithm

There are a variety of algorithms that we can use to obtain the MST of a graph. We will use Kruskal’s algorithm here, which is implemented using the Disjoint set Union-Find data structure.

In order to obtain the MST using Kruskal’s algorithm, we first sort all the connections (edges) present in the graph based on their weights (in increasing order) and will iterate over them one by one. The objective here is to greedily pick edges that will help us to connect more nodes in the graph. Each time we find a new edge which does not result in a cycle with the edges selected so far, we add that edge in the final MST. We keep doing this till we have obtained the MST which connects all the nodes in the graph (i.e. connects all the cities using the connections).

Disjoint-set union find can be implemented in a couple of ways. A plain union find is shown below which keeps the track of the parent of each node (initially parent of i is set to itself, i.e. i) and performs the union and find using a helper method getRoot.

''' Vanilla Disjoint-Set Union Find '''
class UnionFind:   
    def __init__(self, N):
        self.parents = [i for i in range (N + 1)]
        
    def union(self, x, y):
        root1, root2 = self.find(x), self.find(y)     
        if root1 == root2:
            return       
        self.parents[root2] = root1
        
    def isInSameGroup(self, x, y):
      	return self.find(x) == self.find(y)
      
    def find(city):
      	# Traverse all the way to the top (root) going through the parent nodes
      	while city != parents[city]:
        	city = parents[city]
        return parents[city]        
        
    

The above implementation can be made faster by incorporating Path compression. Here, while obtaining the root, we compress the path by assigning the grandparent of the node as the parent (thereby skipping connections and moving nodes closer to the root). We modify the Find method to implement path compression.

def find(city):
  	while city != parents[city]:
      	# Path compression
        # a's grandparent is now a's parent
    	parents[city] = parents[parents[city]]
    return parents[city]

This can be made even faster using a technique known as Weighted Union. In this technique, in addition to the parent nodes, we also keep the weights of each of the nodes. Every time we take union, the root node with more weight (i.e. having more elements in the corresponding set) is used as the parent node of the other node. We initialize the weight corresponding to each node as 1 initially, as each element belongs to it’s own set in the beginning. Below is the implementation of this idea (we modify Union method).

class UnionFind:   
    def __init__(self, N):
        # Set the initial parent node to itself
        self.parents = [i for i in range (N + 1)]
        # Initialize weight for each node to be 1
        # Used to store weights of each nodes
        self.weights = [i for i in range (N + 1)] 
       
    # Modify Union method to incorporate weighted compression  
    def union(self, a, b):
        rootA, rootB = self.find(a), self.find(b)    
        # If both a and b have same root, 
        # i.e. they both belong to the same set, hence we don't need to take union
        if root1 == root2:
            return       
        # Weighted union
        self.parents[root2] = root1
        if self.weights[rootA] > self.weights[rootB]:
            # In case rootA is having more weight
            # 1. Make rootA as the parent of rootB
            # 2. Increment the weight of rootA by rootB's weight
            self.parents[rootB] = rootA
            self.weights[rootA] += self.weights[rootB]
        else:
            # Otherwise
            # 1. Make rootB as the parent of rootA
            # 2. Increment the weight of rootB by rootA's weight
            self.parents[rootA] = rootB
            self.weights[rootB] += self.weights[rootA]
        
    def isInSameGroup(self, x, y):
      	return self.find(x) == self.find(y)
      
    def find(city):
      	# Traverse all the way to the top (root) going through the parent nodes
      	while city != parents[city]:
        	city = parents[city]
        return parents[city]      
      
class Solution:
    def minimumCost(self, n: int, connections: List[List[int]]) -> int:

        unionfind = UnionFind(n)
        # Sort connections based on their weights (in increasing order)
        connections.sort(key=lambda x: x[2])
        # Keep track of the total cost of adding all those edges
        total = 0
        for city1, city2, cost in connections: 
            if not isInSameGroup(city1, city2):
                union(city1, city2)
                total += cost
        # Check that all cities are connected.
        root = find(n)
        return total if all(root == find(city) for city in range(1, n+1)) else -1

If we combine both Path compression and Weighted Union, it takes \log^{\ast} Nlog∗N for the union and find operations in case of Disjoint-set union link. Here \log^{\ast} Nlog∗N is an extremely slow-growing inverse Ackermann function a.k.a Iterated logarithm and practically does not exceed 5. Hence it can be treated as a constant for implementation purposes.

We can combine all the concepts we have seen above in order to implement Kruskal’s algorithm for obtaining MST of a graph. Below is the implementation of this.

class Solution:
    def minimumCost(self, N: int, connections: List[List[int]]) -> int:
        
        if (L := len(connections)) < N-1: return -1
        parent = list(range(0, N+1))
        rank = [1] * (N+1)
        
        # by Kruskal
        def find(x):
            # nonlocal parent
            if parent[x] != x:
                parent[x] = find(parent[x])
            return parent[x]
            
        connections.sort(key = lambda x: x[2])
        
        # we have to find N-1 edges
        i = j = cost = 0
        while i < N-1:
            u, v, w = connections[j]
            if find(u) != find(v): 
                cost += w
                u, v = parent[u], parent[v]
                if rank[u] < rank[v]:
                    parent[u] = parent[v]
                    rank[v] += rank[u]
                else:
                    parent[v] = parent[u]
                    rank[u] += rank[v]
                i += 1
            j += 1
        return cost

Complexity Analysis

  • Time complexity: Assuming $N$ to be the total number of nodes (cities) and $M$ to be the total number of edges (connections). Sorting all the $M$ connections will take $O(M \cdot \log M)$. Performing union find each time will take $\log^{\ast} N$ (Iterated logarithm). Hence for $M$ edges, it’s $O(M \cdot \log^{\ast} N)$ which is practically $O(M)$ as the value of iterated logarithm, $\log^{\ast} N$ never exceeds 5.
  • Space complexity: $O(N)$, space required by parents and weights.

Leave a reply