Data Structures & Algorithms in Java – Strings – Minimum Window Substring

Problem:

Given two strings s and t , find the minimum window substring from s which contains all the characters of the string t.

For example,

For the input strings

s = “ADOBECODEBANC”

t = “ABC”

The minimum window substring in s is “BANC”

The possible substrings are “ADOBEC” , “BECODEBA” , “CODEBA” and “BANC”

All the substrings have all the characters of the string “ABC”

“BANC” is the smallest.

Try out the solution here:

https://leetcode.com/problems/minimum-window-substring

Advertisements

Solution:

Hint:

Use sliding window pattern

Have two pointers one pointing to the start of the window and another to the right. Find the windows using these pointers and keep track of the minimum of these windows (using the pointers)

First lets find the solution without considering repeated characters in the window and the target string as well. This will keep the solution simple . Then we will add the logic to consider them.

ALGORITHM:

STEP 1: Initialize variables to represent the start and end of the sliding window , the minimum of the left and right pointers , an array to store the frequency of characters in the second string , a counter to keep track the total number of matching characters in the sliding window and a variable to keep track of the minimum window length (initialize this to a higher value like a length more than the source string or to Integer.MAX_VALUE in java)

STEP 2: Fill the frequency array with the frequency of characters in the target string

STEP 3: Point the right counter to the first character of the source string and iterate till the end of the string. During each iteration follow the below steps

STEP 3 a: If the current character matches any of the characters in the frequency array increment the counter variable

STEP 3 b: If the counter length matches the length of the target string then it means we have found a sliding window.

STEP 3c: Find if the length of the current window is less than the length of the previous windows , If so store the left and right pointers.

STEP 3d: Find the next window by reducing the count of the starting character of the window (since it will be no longer part of the next window) and then incrementing the left pointer

STEP 3e: Now the counter length gets less than the window so come out of the loop and increment the right pointer

STEP 3f: Continue from STEP 3 a for the new right character.

STEP 4: Using the minimum left and right pointers return the minimum window substring , if there are no window substrings , return an empty string.

Code:

class Solution {
    public String minWindow(String s, String t) {
        
        
        int left = 0;
        
        int right = 0;
        
        int minLeft = 0;
        
        int minRight = 0;
        
        int[] alphabets = new int[58];
        

        
        for(char c:t.toCharArray()){
            
            alphabets[c -'A'] ++;
            
       
        }
        
        
        
        
        int count =0;
        
        int min = s.length()+1;
   
        while(right < s.length()){
      
            if(alphabets[s.charAt(right) - 'A'] !=0 ){

                count++;

            }
    
            while(count == t.length()){

                if(right - left < min){
                
                minLeft = left;
                
                minRight = right;
                    
                    min = right - left;
                    
                }
                
                if(alphabets[s.charAt(left) - 'A'] != 0){
       
                count--;
              
                }
                
         
                left++;
                
                
            }
            
            right++;
        }
        
        
        
        return min < s.length()+1?s.substring(minLeft,minRight+1):"";
      
    }
}

In the above code , we use an array of length 58 to store the alphabets because Java stores characters of the alphabets starting from unicode 65 for upper case letters until 90 and then from 97 to 122 for lower case letters. So totally the range is from 65 to 122 which has the length 58. We will start from the index 0 by subtracting the unicode length of each character with the first character unicode (65) . When you do (character – ‘A’) Java automatically converts them to unicodes , subtracts the unicodes and returns the corresponding character.

To check if a character is in the target array we just check if the frequency of the current character is not equal to zero which means the character is present in the target string.

The above code will not work if the target string has duplicate characters like “ABBC” or even might not work if the source window has duplicate characters in a single window.

To mitigate this we will use another frequency array which is going to be dynamic.

Every time you add a character from the target string in a window , we will reduce its frequency from this array .

Every time we remove a character from the target string in a window , we will increase its frequency back again.

This way we will prevent adding more duplicates than that present in the target string.

We will also prevent reducing the count of characters in the window when we face duplicates .

Advertisements

Here is the updated code:

class Solution {
    public String minWindow(String s, String t) {
        
        
        int left = 0;
        
        int right = 0;
        
        int minLeft = 0;
        
        int minRight = 0;
        
        int[] alphabets = new int[58];
        
        int[] currentFreq = new int[58];
        
        for(char c:t.toCharArray()){
            
            alphabets[c -'A'] ++;
            
            currentFreq[c - 'A']++;
        }
        
        
        
        
        int count =0;
        
        int min = s.length()+1;
      
        
        while(right < s.length()){
            
            
            
            if(alphabets[s.charAt(right) - 'A'] !=0 ){
                
                
                currentFreq[s.charAt(right) - 'A']--;
                
                if(currentFreq[s.charAt(right) - 'A'] >= 0){
                count++;
                }
            }
            
            
            
            while(count == t.length()){
                
                
             
                if(right - left < min){
                
                minLeft = left;
                
                minRight = right;
                    
                    min = right - left;
                    
                }
                
                if(alphabets[s.charAt(left) - 'A'] != 0){
                    
                    
                    currentFreq[s.charAt(left) - 'A'] ++;
                    
                    
                    if(currentFreq[ s.charAt(left) - 'A'] > 0){
                    
                count--;
                        
                    }
                    
                }
                
         
                left++;
                
                
            }
            
            right++;
        }
        
        
        
        return min < s.length()+1?s.substring(minLeft,minRight+1):"";
      
    }
}

When you increment the counter to consider the current character in the target string , we use the below logic:


            if(alphabets[s.charAt(right) - 'A'] !=0 ){
                
                
                currentFreq[s.charAt(right) - 'A']--;
                
                if(currentFreq[s.charAt(right) - 'A'] >= 0){
                count++;
                }
            }

As you see , we check if the character is part of the target string using the first if check.

And then we decrement the current frequency of that particular character.

This is because when we face the same character again more than the number of times present in the target string , we don’t want to increment the count as it is a duplicate.

Similarly when we remove a character from the current window to find the next window we do this logic:


              
                if(alphabets[s.charAt(left) - 'A'] != 0){
                    
                    
                    currentFreq[s.charAt(left) - 'A'] ++;
                    
                    
                    if(currentFreq[ s.charAt(left) - 'A'] > 0){
                    
                          count--;
                        
                    }
                    
                }
                

Here again we first check if the current character is part of the target string.

Then we increment the frequency of the current character as we had decremented it while adding. If it is more than zero after incrementing then it means it can be considered as part of the next window so we decrement the count as well .

Thus the currentFreq counter is used dynamically to keep track of the duplicates.

Advertisements

Let’s check it using an example

Consider the below strings:

s = “cabwefgewcwaefgcf”
t = “cae”

We need to track windows containing the string “cae”.

cabwefgewcwaefgcf

First window (hghlighted in bold italic):

cabwefgewcwaefgcf

Second window:

cabwefgewcwaefgcf

In the second window there are two “e” . We don’t consider the second e in the count because the target string has only one “e” .

So currentFreq[“e”] = 1

If we dont use the extra logic we have been discussing , the current window would have stopped at the second “e” because the “counter” value would be 3 while the right pointer reaches it (equal to the target string) and our window would be :

cabwefgewcwaefgcf

It doesn’t have the character “c” !

So we execute the below logic.

When we first face the character e , we decrement its frequency:

currentFreq[“e”]–;

which results in

currentFreq[“e”] = 0

It is 0 , since we have the check currentFreq[“e”] >= 0 , we consider this e and increment the counter.

But when we face the next e , we decrement its frequency:

currentFreq[“e”]–

which results in

currentFreq[“e] = -1

It is less than 0 , so we don’t increment the count and thus it is not considered as part of the window.

Similary,

When we move to the next window from current window , we decrement the count of the e only once even if it is present more than once.

Consider this window:

cabwefgewcwaefgcf

At this stage currentFreq[“e”] = -1

So when we move from the above window , the first e is left out and

we increment the above frequency counter:

currentFreq[“e”] = 0 now

but we don’t decrement the window length counter :

cabwefgewcwaefgcf

This is because even though the character “e” which is part of the target string was left out , there is still one more “e” present in the window. So the window substring is still valid and so we don’t decrement the counter variable.

When we move out of the next e though the frequency becomes

currentFreq[“e”] = 1

which is greater than 0 and so we decrement the window counter. Window now is:

cabwefgewcwaefgcf

Once the window is processed , the current frequency value automatically resets to the original value and we can use it again for the next window!

This solution is one among the fastest in leetcode:

That’s it!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s