Data Structures & Algorithms in Java – Heap – Top K Frequent Elements

Problem:

Given an array of numbers , find the k most frequent numbers .

For example ,

Given the array:

[1,1,1,2,2,3,4,5]

and the value k = 2

The output is

[1,2]

Because these are the most frequent numbers.

1 occurs thrice and 2 occurs twice.

Rest other numbers occur only once.

So the most frequent 2 numbers are 1 and 2.

Try out the solution here:

https://leetcode.com/problems/top-k-frequent-elements/

Advertisements

Solution:

There are many approaches to this problem.

One approach is to store the frequency of each element in a hash map.

Sort the map according to the highest frequency.

And return the top k elements.

This takes a time complexity of O(nlogn)

(Sorting takes place in O(nlogn) , storing the frequencies in map takes O(n))

In total O(nlogn).

We can improvise on this.

Let’s look at 3 different approaches which have better time complexity.

Approach 1: Using min heap

Instead of sorting the list as explained before , you can use a min heap instead of size k.

So when you parse the map containing the collection of frequencies you can store the elements on the heap based on their frequencies.

So the least frequent element stays on top always.

Also always maintain the size of k .

So if you need to add an element after the size , pop the topmost element (which is the least frequent) and then add the new element.

So finally the min heap contains the k most frequent elements.

You can pop them one by one and store in an array in the descending order and return.

Here is the algorithm:

STEP 1: If the number of elements in the array is equal to k , then return the array itself

STEP 2: Store the elements and their frequencies in a hash map

STEP 3: Create a min heap (priority queue) based on the frequency values

STEP 4: Add each key in the map to the queue until the size k is reached

STEP 5: After the size k pop the topmost element and add the next one

STEP 6: Pop each element from the min heap and store it an array in descending order and return it

Here is the code:

Code:

class Solution {
    public int[] topKFrequent(int[] nums, int k) {
        
        
        if(nums.length == k) return nums;
        
        var map = new HashMap<Integer,Integer>();
        
        
        for(int i:nums){
            
            
            map.put(i,map.getOrDefault(i,0)+1);
        }
        
        
        Queue<Integer> heap = new PriorityQueue<>(
        
           (a,b) -> {
               
               return map.get(a) -  map.get(b);
           }
        );
        
        
        for(int key:map.keySet()){
            
            
            heap.add(key);
            
            if(heap.size() > k){
                
                heap.poll();
            }
        }
        
        
        var output = new int[k];
        for(int i=k-1;i>=0;i--){
            
            output[i] = heap.poll();
            
        }
        
        return output;
    }
}

Time complexity :

For storing in hashmap , it is O(n) where n is the number of elements

For adding it to heap ,

Each heap operation (poll/arrange in sorted order) takes logk time where k is the size of the heap.

So total time complexity for creating a heap of size k is O(nlogk) since we process each of the n elements at this step.

Finally adding the elements in the heap takes O(klogk) : O(logk) for each of the k elements (for the pop operation) in the heap.

Space complexity:

We use an extra heap of size k and a hash map of size n .

So space complexity is O(n+k) which can be considered as O(n)

Advertisements

Approach 2: Using QuickSort / Partition Sorting

The above approach takes O(nlogk) time complexity.

You can improve this complexity to O(n) by using quick sort / partition sort instead of min heap.

In this approach you still store the frequencies in a hash map.

Then you pick a random element from the hash map.

You check each element in the hash map and push the less frequent elements compared to the pivot element’s frequency to its left and the more frequent elements to the right.

Then you check if the pivot is at the position (n-k) where n is the total number of elements.

This means the elements are sorted in frequency and the last n-k elements have the highest frequency – top k elements

Return the last k elements.

If the pivot is less than n-k elements , then the pivot needs to be pushed forward so do the partition search again in the elements to the right of the pivot.

If the pivot is greater than n-k elements , then it needs to be pushed to the left so do the partition search again in the elements to the left of the pivot.

Do the above two steps until the pivot comes to the position we want.

Here is the algorithm:

STEP 1: If the number of elements is equal to k , then return the array

STEP 2: Store the frequency of the elements in a hash map

STEP 3: Store all unique elements in the input array in another array.

STEP 4: Do quick sort using the new pivot on the array of unique elements as below:

STEP 4a: If there is only one element in the array then return

STEP 4b: Choose a random index as the pivot index

STEP 4c: Do partitioning on the unique array using this pivot index as below:

STEP 4c1: Swap the pivot with the rightmost element

STEP 4c2: Compare each element with the rightmost element and if they are less frequent then swap them with the current element (The current element is the element pointed to by a pointer which points to the latest less frequent element)

STEP 4c3: Swap the right most element which is the pivot with the current element. The pivot is in its correct position now.

STEP 4c4: Return the pivot index.

STEP 5: If the pivot index is equal to n-k then we have found the top k most frequent elements which are the last k elements in the unique array.

STEP 6: If the pivot index is greater than n-k then it means we want a pivot which is lesser so do the quick sort again for the elements to the left of the pilot with a random pivot chosen from those elements.

STEP 7: If the pivot index is smaller than n-k then it means we want a pivot which is greater and to the right of the current pivot so do the quick sort again for the elements to the right of the pilot with a random pivot chosen from those elements.

STEP 8: Return the last k elements in the unique array.

Here is the code:

Code:

class Solution {
   
    
    Map<Integer,Integer> freqMap = new HashMap<Integer,Integer>();
    
    int[] unique;
    
    public int[] topKFrequent(int[] nums, int k) {
        
 
        
        for(int i:nums){
            
            
            freqMap.put(i,freqMap.getOrDefault(i,0) + 1);
        }
        
        
        int totalElements = freqMap.size();
        unique = new int[totalElements];
        
        
        int index = 0;
        for(int key:freqMap.keySet()){
            
            unique[index]=key;
            
            index++;
        }
        
       
        quicksort(0,totalElements-1,totalElements-k);
        
        return Arrays.copyOfRange(unique,totalElements-k,totalElements);
        
        
    }
    
    
    public void quicksort(int left,int right,int kthsmallest){
           if(left == right) return;
        
        Random r = new Random();
        
        int pivot = left + r.nextInt(right-left);
        
     
        
        
        pivot = partition(left,right,pivot);
        
        
        if(pivot == kthsmallest){
            
            
            return;
        }else if(pivot > kthsmallest){
            
            quicksort(left,pivot-1,kthsmallest);
        }else{
            
            
            quicksort(pivot+1,right,kthsmallest);
        }
        
        
    }
    
    
    
    public int partition(int left,int right,int pivot){
        
        
        int pivotFreq = freqMap.get(unique[pivot]);
        
        swap(pivot,right);
        
        
        int start = left;
        
        
        
        for(int i=left;i<=right;i++){
            
            
            if(freqMap.get(unique[i]) < pivotFreq){
                
                
                swap(start,i);
                
                start++;
            }
        }
        
        
        swap(start,right);
        
        return start;
    }
    
    
    public void swap(int a,int b){
        
        
        int t = unique[a];
        
        unique[a] = unique[b];
        
        unique[b] = t;
    }
    
}

Time complexity:

Storing the elements in a hash map takes O(n) time.

Storing the unique elements in an array takes O(n) time.

Quick sort takes O(n) time

So time complexity is O(n)

Space complexity:

The extra hash map takes O(n) space

The extra array of unique elements take O(n) space

So space complexity is O(n)

Advertisements

Approach 3: Bucket sort

Instead of quick sort you can use a simpler bucket sort with the same O(n) time complexity.

You just need an extra data structure to store elements according to their frequency.

In this approach,

you create a bucket for each frequency.

For n elements the frequency cannot be more than n so you create n buckets each representing each frequency. Since indexing starts at 0 in java we create n+1 buckets and will be using buckets 1 to n.

Here too we will be calculating the frequency of each element and store that it in a hash map.

Then for each key in the hash map we check its frequency and put it in the corresponding bucket.

The buckets are in increasing order of frequency.

So starting from the last bucket you collect the elements in an array.

The top k elements in this array represent the top k most frequent elements

ALGORITHM:

STEP 1: If number of elements is equal to k return the array as such.

STEP 2: Store the elements and their frequencies in a hash map

STEP 3: Create a list of buckets as much as the number of elements in the input. Each bucket contains a list to store the elements whose frequency is represented by the index of the bucket

STEP 4: For each key in the map , check the frequency and store it in the corresponding bucket.

STEP 5: Collect elements from the last bucket and store it in an array

STEP 6: Return the first k elements from the array created in the previous step.

Code:

class Solution {
   
    
    Map<Integer,Integer> freqMap = new HashMap<Integer,Integer>();
    
    
    
    public int[] topKFrequent(int[] nums, int k) {
        
 
        if(nums.length == k) return nums;
        
        for(int i:nums){
            
            
            freqMap.put(i,freqMap.getOrDefault(i,0) + 1);
        }
        
        
        int totalElements = freqMap.size();
        
        
        
        
        List<Integer>[] buckets = new List[nums.length+1];
        
       
        
        for(int i=0;i<buckets.length;i++){
            
            buckets[i] = new ArrayList<>();
        }
        
        
        for(int key:freqMap.keySet()){
            
           int frequency = freqMap.get(key);
      
            buckets[frequency].add(key);
        }
        
        
        
        
        List<Integer> bucketElements = new ArrayList<Integer>();
        for(int i=buckets.length-1;i>=0;i--){
            
            if(buckets[i]!=null)
            bucketElements.addAll(buckets[i]);
        }
        
        
        
        
        int[] output = new int[k];
        
    for(int i=0;i<k;i++){
         output[i] = bucketElements.get(i);
    }
        return output;
        
    }
    
}

Time complexity:

Storing in hash map – O(n)

Creating n buckets with an empty list- O(n)

Storing elements in the buckets – O(n)

Storing elements in the output array – O(k)

Total – O(n)

Space complexity:

O(n) for all the additional storage used in proportion to the size of the input (hash map , bucket list , output array)

That’s it!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s