Data Structures & Algorithms in Java – Graphs – Alien Dictionary.

Problem:

You are given a list of English words arranged lexically (as in a dictionary).

But not in the order of English alphabets but in the order of an alien language which uses English script.

With the given words you need to arrange the letters in the right order of the alien dictionary.

For example,

Given the input:

words = ["wrt","wrf","er","ett","rftt"]

The output is :

wertf

This is because if you observe the input, since the words are arranged in order let’s just pick the first letters of the words which are in order by the arrangement:

“w”,”w”,”e”,”e”,”r”

We have two “w”s two “e”s and one r.

Lets remove the duplicates

[w,e,r]

We have got one order here.

Now there are more letters in those words

We have to find their order as well.

Since the first two words both start with “w” we need to look at the second letter of these words then:

“wrt” , “wrf”

They both are same as well (r)

So you need to look at the next letter then.

It is “t” in the first word and “f” in the second word.

So t comes before f

We have a new order:

[t,f]

Similarly if you compare third and fourth words you get the info that r comes before t

(words : er and ett)

That is

[r,t]

So the groups of letters arranged in order are:

[w,e,r]

[t,f]

[r,t]

By looking at them we can say that the final order is:

[w,e,r,t,f]

We need to find out an algorithm to implement this.

Try out the solution here:

https://leetcode.com/problems/alien-dictionary/

Solution:

Hint:

Use BFS or DFS after representing the order in a graph.

Approach 1 : BFS

Let’s represent the order of the alphabets through a graph.

If a letter comes after another letter then we represent their association through an edge in the graph.

The letters are the nodes and the edges are the order between the nodes.

So in the above example, the graph will look like:

You can represent this graph through an adjacency list which consists of each node and its successors.

Once the adjacency list is prepared,

To do BFS,

You first pick up all those nodes which don’t have incoming edges.

This means there are no alphabets before them.

So they come first in the order.

In the above example there is only one such letter “w”.

You put those letters in a queue.

Then you pop them one by one and remove the edge from this node to their successors.

While doing so if any of the successors end up with no incoming edges , then you add them to the queue as well.

Continue doing this until there are no nodes with no incoming edges (in other words the queue is empty)

Each time you pop a letter append the letter to the output.

Data structures to use:

As already mentioned to represent the graph use an adjacency list.

To represent the incoming edges use a map with the letter as key and number of incoming edges to it as the value.

Every time you pop a letter from the queue , you will decrement this incoming edge value of its successors.

Here is the code to prepare the data structures:

  
        Map<Character,List<Character>> adjacencyList = new HashMap<>();
        
        Map<Character,Integer> indegrees = new HashMap<>();
        
        
        for(String word:words){
            
            for(Character c: word.toCharArray()){
                
                
                indegrees.put(c,0);
                
                adjacencyList.put(c, new ArrayList<Character>());
            }
        }
        
        
        
        for(int i=0;i<words.length-1;i++){
            
            
            String word1 = words[i];
            String word2 = words[i+1];
            
            
            if(word1.length() > word2.length() && word1.startsWith(word2)){
                
                return "";
            }
            
            int minWordLen = Math.min(word1.length(),word2.length());
            for(int j=0;j<minWordLen;j++){
                
                
                if(word1.charAt(j) != word2.charAt(j)){
                    
                    
                    adjacencyList.get(word1.charAt(j)).add(word2.charAt(j));
                    
                    indegrees.put(word2.charAt(j),indegrees.get(word2.charAt(j))+1);
                    
                   break;
                }
            }
            
            
        }

We return empty string for one of the boundary conditions while preparing the adjacency list.

If a word is followed by another word which is just the previous word’s prefix, example (“catwalk” followed by “cat”) then the dictionary is invalid as “cat” should be followed by “catwalk” in a proper dictionary.

To do BFS you use one more data structure , the Queue.

Here is the initial queue :

  Queue<Character> queue = new LinkedList<Character>();
        
        for(Character c:indegrees.keySet()){
            
            
            
            if(indegrees.get(c) == 0){
                
                queue.add(c);
                
            }
        }

For doing BFS:

  • Pop the top element
  • Add it to output
  • For each successor , reduce the incoming edges size by 1
  • If the successor’s incoming edges size becomes 0 add it to the queue

Here is the BFS logic:

StringBuilder output = new StringBuilder();
        while(!queue.isEmpty()){
            
           Character c = queue.poll();
            
            
            output.append(c);
            
            
            for(Character a:adjacencyList.get(c)){
                
                
                indegrees.put(a,indegrees.get(a)-1);
                
                
                if(indegrees.get(a) == 0){
                    
                    queue.add(a);
                }
            }
            
        }
        

After doing BFS if the number of unique letters don’t match the number of letters in the output, then it means the graph has a cycle so the dictionary is invalid and you return an empty string.

This is because the incoming edges size for a cyclic node will not become 0 and it will not be picked up in the BFS search.

if(output.length() < indegrees.size()) return "";

The entire code:

class Solution {
    public String alienOrder(String[] words) {
        
        
        
        
        Map<Character,List<Character>> adjacencyList = new HashMap<>();
        
        Map<Character,Integer> indegrees = new HashMap<>();
        
        
        for(String word:words){
            
            for(Character c: word.toCharArray()){
                
                
                indegrees.put(c,0);
                
                adjacencyList.put(c, new ArrayList<Character>());
            }
        }
        
        
        
        for(int i=0;i<words.length-1;i++){
            
            
            String word1 = words[i];
            String word2 = words[i+1];
            
            
            if(word1.length() > word2.length() && word1.startsWith(word2)){
                
                return "";
            }
            
            int minWordLen = Math.min(word1.length(),word2.length());
            for(int j=0;j<minWordLen;j++){
                
                
                if(word1.charAt(j) != word2.charAt(j)){
                    
                    
                    adjacencyList.get(word1.charAt(j)).add(word2.charAt(j));
                    
                    indegrees.put(word2.charAt(j),indegrees.get(word2.charAt(j))+1);
                    
                   break;
                }
            }
            
            
        }
   
        
        Queue<Character> queue = new LinkedList<Character>();
        
        for(Character c:indegrees.keySet()){
            
            
            
            if(indegrees.get(c) == 0){
                
                queue.add(c);
                
            }
        }
        
        StringBuilder output = new StringBuilder();
        while(!queue.isEmpty()){
            
           Character c = queue.poll();
            
            
            output.append(c);
            
            
            for(Character a:adjacencyList.get(c)){
                
                
                indegrees.put(a,indegrees.get(a)-1);
                
                
                if(indegrees.get(a) == 0){
                    
                    queue.add(a);
                }
            }
            
        }
        
        if(output.length() < indegrees.size()) return "";
        return output.toString();
        
    }
}

Algorithm:

  1. Represent the order between the letters in the words as a graph through an adjacency list
  2. Create a map of incoming edges size for each letter
  3. Put all the nodes who don’t have an incoming edge to a queue
  4. Do BFS on the queue as earlier mentioned

Time complexity:

To initialize the initial incoming degree map and the adjacency list , we iterate through all the characters in all the words.

So if the total number of characters (including all the words) is denoted by C,

then the time complexity is O(C).

Then we iterate through each word and prepare the adjacency list and the incoming edges size map.

If the number of words is N , then the time complexity is O(N).

To do BFS , we iterate through each node (unique letters) and each edge (entry in adjacency list).

So if number of unique letters is U and number of edges is E.

Time complexity is O( U + E).

Notice that the number of edges is equal to the number of words , when we populated the adjacency list.

So time complexity can be mentioned as O( U + N)

Total time complexity is O( C + N + U + N ).

Since C is much bigger among these , we can consider the time complexity to be O(C).

Space complexity:

The adjacency list holds all the unique letters and its edges , so we can say time complexity is O( U + E)

Since number of edges is equal to number of words given , we can denote it as O( U + N) as well.

If the number of unique letters are the 26 letters only from the English alphabet , then we can assume space complexity is a constant O(1)

Approach 2 – DFS:

In DFS instead of preparing a map of incoming edges , we will use a bitmap to keep track of visited nodes.

While doing DFS we will visit the bottommost node and add it to the output first .

Since the bottom most output represents the last letters we will finally reverse the output to get the desired result.

Algorithm for DFS:

  1. Check if a node is already visited AND its successors have not been processed, if so return false (this represents a cycle in the graph)
  2. If a node is visited AND its successors have been processed return true
  3. Mark the node as visited
  4. For each successor do DFS
  5. Mark the node and its successors as visited
  6. Add the node to the output

Here is the code for DFS:

    
    
    public boolean dfs(Character c){
  
        
        //check if node and its successors are already visited
        //if just node is visited this will return false (also represents a cycle)
       //if node and successors are visited this will return true
        if(seen.containsKey(c)) return seen.get(c);
        
        
        
//mark node as visited
        seen.put(c,false);
            
            
            for(Character a:rList.get(c)){
                
                
                boolean result = dfs(a);
                
                if(!result) return false;
            }

//mark node and its descendants as visited
          seen.put(c,true);
        //append node to output
        output.append(c);
        
        
        return true;
        
        
    
}

The above code might be difficult to get at first glance , will explain this shortly.

Here is the entire code:

class Solution {
    
    
    StringBuilder output = new StringBuilder();
    Map<Character,List<Character>> rList = new HashMap<>();
    
    Map<Character,Boolean> seen = new HashMap<>();
    public String alienOrder(String[] words) {
       
// initialize adjacency list
        for(String word:words){
            
            for(char c: word.toCharArray()){
            
                rList.putIfAbsent(c, new ArrayList<>());
                
                
            }
        }
        
        
       //populate adjacency list 
        for(int i=0;i<words.length-1;i++){
            
            
            String word1 = words[i];
            String word2 = words[i+1];
            
            
            if(word1.length() > word2.length() && word1.startsWith(word2)){
                
                return "";
            }
            
            int minWordLen = Math.min(word1.length(),word2.length());
            for(int j=0;j<minWordLen;j++){
                
                
                if(word1.charAt(j) != word2.charAt(j)){
                    
                    
                    rList.get(word1.charAt(j)).add(word2.charAt(j));
                    
                    
                    
                   break;
                }
            }
            
        }
            
// do DFS for each node
            for(Character c:rList.keySet()){
                
                
              boolean success=   dfs(c);
                
                if(!success) return "";
            }
  
        //reverse the output since DFS returns the last node first
        return output.reverse().toString();
        
    }
    
    
    public boolean dfs(Character c){
  
        
        //check if node and its successors are already visited
        //if just node is visited this will return false (also represents a cycle)
       //if node and successors are visited this will return true
        if(seen.containsKey(c)) return seen.get(c);
        
        
        
//mark node as visited
        seen.put(c,false);
            
            
            for(Character a:rList.get(c)){
                
                
                boolean result = dfs(a);
                
                if(!result) return false;
            }

//mark node and its descendants as visited
          seen.put(c,true);
        //append node to output
        output.append(c);
        
        
        return true;
        
        
    }
}

Let’s see how the DFS we do here is different from the usual DFS done to mark visited nodes.

Lets go step by step.

In usual DFS , you first mark a node as visited and then do DFS for other descendants like this:

    public void dfs(Character c){
  
        
        if(seen.get(c)) return;
    
            seen.put(c,true);
            
            
            for(Character a:rList.get(c)){
                
                
                dfs(a);
                
                
            }
            
           
    }

But in our case we want the last node first.

So we move the “setting visited flag to true” to the last after dfs call is made for a node’s descendants:

    public void dfs(Character c){
  
        
        if(seen.get(c)) return;
    

            
            
            for(Character a:rList.get(c)){
                
                
                dfs(a);
                
                
            }
             seen.put(c,true);
           
    }

Also we want to consider three cases :

  1. Is a node visited but its descendants not visited – this means the DFS got into a loop and could not reach the descendants
  2. Is a node visited and its descendants visited – this means there is no cycle and also since the node is already visited we exit here
  3. A node is not visited

We will use our bitmap seen to consider all the cases

    Map<Character,Boolean> seen = new HashMap<>();

If a node is not visited at all , we wont set that key and value in the seen map(initially the map doesnt have any key and values , so by default this is the case)

If a node is visited but its descendants are not , we will mark it as false:

seen.put(c,false);

Once we visit the descendants as well , we will mark this as true:

seen.put(c,true);

Since DFS is a recursive call we need base conditions to return.

We will include these base conditions:

  • If the node for which we are doing DFS is already visited but its descendants or not return false (this means a cycle , that is why DFS couldnt reach the leaf node)
  • If the node for which we are doing DFS and its descendants are visited return true

So instead of just returning the recursive call we are returning for two different conditions.

So we will change our DFS call to return a boolean value:

 public boolean dfs(Character c){
  
        
        
        if(seen.containsKey(c)){

                //node is visited but its descendants or not
                 if(seen.get(c) == false) return false;
                //node and its descendants are visited
                 if(seen.get(c) == true) return true;
           //above two can be combined into one line - return seen.get(c);
        }
        
//mark the node as visited
        seen.put(c,false);
            
            //start DFS for the node
            for(Character a:rList.get(c)){
                
                
                boolean result = dfs(a);
                //check the result for each recursive call
                if(!result) return false;
            }
//mark the node and its descendants as visited (since this is made after the //recursive call above, this will be set after all descendants of a node are //visited
            seen.put(c,true);
        
//finally append the letter to the output , (we have reached the leaf node)
        output.append(c);
        
        
        return true;
        
        
    }

Time complexity and Space complexity is the same as that of BFS :

O(C) and O(U + N ) / O(1) respectively,

where C is the total number of characters

N is the total number of words/ edges

U is the number of unique letters

That’s it!


Posted

in

, , ,

by

Comments

Leave a Reply

Discover more from The Full Stack Developer

Subscribe now to keep reading and get access to the full archive.

Continue reading