Boyer-Moore Majority Vote Algorithm -Counting Most Repeated Characters in O(n)

Efficient Search for the Most Frequent Character

Mar 10, 2024

In the realm of computer science and algorithms, finding the majority element in an array or string is a common problem. The Boyer-Moore Majority Vote Algorithm, also known as the Boyer-Moore Voting Algorithm, provides an elegant and efficient solution to this problem. In this article, we will delve into the algorithm's workings, explore its adaptation for finding the most repeated characters in a string, analyze its time complexity, and provide a practical implementation in Python.

Algorithm Overview

The Boyer-Moore Majority Vote Algorithm is based on the concept of maintaining a candidate element and a counter while iterating through the array or string. The algorithm follows these steps:

1. Initialize the candidate element to the first element and set the counter to 1.

2. Iterate through the remaining elements:

- If the counter becomes 0, update the candidate to the current element and set the counter to 1.

- If the current element matches the candidate, increment the counter by 1.

- If the current element differs from the candidate, decrement the counter by 1.

3. After the iteration, the candidate will be the majority element if it exists.

The intuition behind this algorithm is that the majority element will "cancel out" the other elements during the iteration process, leaving the majority element as the final candidate.

Adapting the Algorithm for Finding the Most Repeated Characters

To find the most repeated character in a string using the Boyer-Moore Majority Vote Algorithm, regardless of whether it appears more than ⌊n / 2⌋ times, we can follow these steps:

Apply the Boyer-Moore Majority Vote Algorithm to the string to find the candidate character.
Initialize a variable max_count to keep track of the maximum count and a variable most_repeated to store the most repeated character.
Iterate through each unique character in the string:
- Count the occurrences of the current character in the string.
- If the count is greater than, update max_count with the current count and most_repeated with the current character.
After the iteration, most_repeated the string will hold the most repeated character, regardless of its frequency.

Implementation in Python

https://github.com/raghavan/Boyer-Moore-Vote-Algorithm-with-Visualization

def most_repeated_character(string):
    candidate = boyer_moore_majority_vote(string)
    max_count = 0
    most_repeated = None

    for char in set(string):
        count = string.count(char)
        if count > max_count:
            max_count = count
            most_repeated = char

    return most_repeated

Example

Let's apply the algorithm to the string "abbbacaaabacab"

'a' becomes the candidate with a count of 1.
The count becomes 0 after 'b', so 'b' becomes the new candidate with a count of 1.
The count becomes 3 after encountering 'b' twice more.
The count becomes 2 after 'a'.
The count becomes 1 after 'c'.
The count becomes 0 after 'a', so 'a' becomes the new candidate with a count of 1.
The count becomes 4 after encountering 'a' three more times.
The count becomes 3 after 'b'.
The count becomes 2 after 'a'.
The count becomes 1 after 'c'.
The count becomes 2 after 'a'.
The count becomes 1 after 'b'.

The candidate character is 'a'. We count its occurrences, and it appears six times. Therefore, 'a' is the most repeated character in the string.

Time Complexity

One notable aspect of the Boyer-Moore Majority Vote Algorithm is its O(n) time complexity for finding the candidate character. However, when adapting the algorithm to find the most repeated character irrespective of its frequency, an additional pass is required to count the occurrences of each unique character. This increases the time complexity to O(n * k), where k is the number of unique characters in the string. Despite the increased complexity, the algorithm remains efficient for most practical scenarios.

Conclusion

The Boyer-Moore Majority Vote Algorithm is a powerful technique for finding the majority element in an array or string with an O(n) time complexity. By adapting the algorithm, we can also efficiently find the most repeated characters in a string. This algorithm finds applications in various domains, such as data analysis, string manipulation, and coding interviews.

For more practice and challenges, we recommend solving the following LeetCode problems,

Majority Element - https://leetcode.com/problems/majority-element/
Majority Element II - https://leetcode.com/problems/majority-element-ii/

Dev Den

Discussion about this post