# Boyer Moore Algorithm for Pattern Searching (C++, java, python)

Every year thousands of students are graduating but the placement strategies of recruiters have varied completely. This makes the students work harder to get jobs in tech companies. We understand this hustle, and that's why we bring to you the most informative tech blogs to ace your placement interviews.

Today's blog is about pattern matching using a new algorithm called Boyer-Moore Algorithm. Here, we'll gain in-depth knowledge about the algorithm and the strategies used, along with their implementation in C++, Python, and Java. So, let's get started!

## What is Boyer-Moore Algorithm?

In the year 1977, Robert Boyer and J Strother Moore established an algorithm that proved to be very effective for the various pattern-searching problems in the computing world. They named it the Boyer-Moore algorithm which has served as the benchmark for pattern-searching algorithms ever since.

Unlike the traditional way of pattern searching where we try to match the two strings in a forward manner, the Boyer-Moore advances the concept by beginning to match the last character of the string to be searched. In this way, the time complexity is reduced significantly. As mentioned earlier, this algorithm takes the backward approach by aligning the pattern string P with text string T and thereafter, comparing the characters from right to left, starting with the rightmost character. It works on the principle that if a mismatched character is found, then there is no use in matching the other characters in the two strings.

To work effectively, there are two strategies/approaches we use in this algorithm:

2. Good Suffix Heuristics

In the following, we have explained in detail both the strategies along with their implementation in C++, Java, and Python.

This intuition behind solving the pattern-matching problem is quite simple. Just as its name suggests, here, the first character that we encounter which does not match the pattern we are searching for, is called the Bad Character. On encountering the bad character, we:

Case I: Shift the pattern till the mismatch becomes a match

As mentioned earlier, we start checking the pattern occurrence from the rightmost character, so in this case, we'll find the bad character. Now, we'll shift the pattern in such a way that it gets aligned with the mismatched character in the text. For more clarity, you can see the illustration: Explanation:  As we have already stated that in this algorithm, the matching begins from the rightmost character. So, we start matching the text with the pattern. On matching the first character of the pattern we observe that 'C' does not match with 'A'. So, we try to match the 'A' of the given text with 'A'(if present) in the pattern. So, we shift the pattern by 2 so that the rightmost character of both strings matches. Thereafter, we match the other 2 characters.

Case II: Shift the pattern till it moves past the bad character

In this case, we'll begin matching the pattern from the right end, and see if it matches one by one with all the characters of the given string. If at any time, we observe that the rightmost character is not matching with the rightmost character of the given string, we shift the whole pattern ahead of the last character of the string with which we were trying to match the pattern. Have a look at the below image: Explanation: Now, in this case, observe that we have a mismatch at character 'C'. Now, after going through the whole pattern, we observe that it does not contain any 'C'. So, we take the whole pattern and move it right across the letter 'C' and later, check for it to match.

### Implementation of Bad Character Heuristics (Boyer-Moore) in C++

```#include <bits/stdc++.h>
using namespace std;
# define NO_OF_CHARS 256

{
int i;
// Initializing all occurrences as -1
for (i = 0; i < NO_OF_CHARS; i++)

// Fill the actual value of last occurrence
// of a character
for (i = 0; i < size; i++)
}

void search( string txt, string pat)
{
int m = pat.size();
int n = txt.size();

/* Fill the bad character array by calling
for given pattern */

int s = 0;
while(s <= (n - m))
{
int j = m - 1;        /* Keep reducing index j of pattern while
characters of pattern and text are
matching at this shift s */
while(j >= 0 && pat[j] == txt[s + j])
j--;

/* If the pattern is present at current
shift, then index j will become -1 after
the above loop */
if (j < 0)
{
cout << "pattern occurs at shift = " << s << endl;
s += (s + m < n)? m-badchar[txt[s + m]] : 1;
}
else
s += max(1, j - badchar[txt[s + j]]);
}
}
/* Driver code */
int main()
{
string txt= "ABAAABCD";
string pat = "ABC";
search(txt, pat);
return 0;
}
```

### Implementation of Bad Character Heuristics (Boyer-Moore) in Python

```NO_OF_CHARS = 256

# Initialize all occurrence as -1

# Fill the actual value of last occurrence
for i in range(size):

# return initialized list

def search(txt, pat):
m = len(pat)
n = len(txt)

# create the bad character list by calling
# for given pattern

# s is shift of the pattern with respect to text
s = 0
while(s <= n-m):
j = m-1

# Keep reducing index j of pattern while
# characters of pattern and text are matching
# at this shift s
while j>=0 and pat[j] == txt[s+j]:
j -= 1

# If the pattern is present at current shift,
# then index j will become -1 after the above loop
if j<0:
print("Pattern occur at shift = {}".format(s))
s += (m-badChar[ord(txt[s+m])] if s+melse 1)
else:
def main():
txt = "ABAAABCD"
pat = "ABC"
search(txt, pat)

if __name__ == '__main__':
main()
```

### Implementation of Bad Character Heuristics(Boyer-Moore) in Java

```class Solution{

static int NO_OF_CHARS = 256;
static int max (int a, int b) { return (a > b)? a: b; }      //returns max of two numbers
{

// Initialize all occurrences as -1
for (int i = 0; i < NO_OF_CHARS; i++)

// Fill the actual value of last occurrence of a character
for (int i = 0; i < size; i++)
}

static void search( char txt[], char pat[])
{
int m = pat.length;
int n = txt.length;

int s = 0;
while(s <= (n - m))
{
int j = m-1;

/* Keep reducing index j of pattern while
characters of pattern and text are
matching at this shift s */
while(j >= 0 && pat[j] == txt[s+j])
j--;

/* If the pattern is present at current
shift, then index j will become -1 after
the above loop */
if (j < 0)
{
System.out.println("Patterns occur at shift = " + s);
s += (s+m < n)? m-badchar[txt[s+m]] : 1;
}
else
s += max(1, j - badchar[txt[s+j]]);
}
}

public static void main(String []args) {

char txt[] = "ABAAABCD".toCharArray();
char pat[] = "ABC".toCharArray();
search(txt, pat);
}
}
```

Output:

```Pattern occurs at 4
```

## 02) Good Suffix Heuristics

Just like in the Bad Character approach preprocessing is done, in this strategy also, preprocessing is done based on certain scenarios. Here, we have 3 different cases that will tell which step we need to follow next to match the given pattern efficiently:

Case I: Shift the pattern until the next occurrence is found

Here, when we see that pattern is not matching with the given string, we shift the pattern to the right side to check whether that pattern is present in the further characters of the string. See the given image: Case II: Shift pattern till prefix of pattern matches with a suffix of text

In this case, if we're not able to find the pattern in one go, we try to match the prefix of the pattern with a possible suffix of the given text, by shifting the pattern that many times as the length of the prefix. Have a look at the given image: Case III: Shift pattern till the pattern moves past the provided text

This happens when there is no possibility of the pattern matching with the string. In this case, we just shift the pattern till the end. Look at the given illustration: ### Implementation of Good Suffix Heuristics(Boyer-Moore) with C++

```/* C program for Boyer Moore Algorithm with
Good Suffix heuristic to find pattern in
given text string */

#include
#include

// preprocessing for strong good suffix rule
void preprocess_strong_suffix(int *shift, int *bpos,
char *pat, int m)
{
// m is the length of pattern
int i=m, j=m+1;
bpos[i]=j;

while(i>0)
{
/*if character at position i-1 is not equivalent to
character at j-1, then continue searching to right
of the pattern for border */
while(j<=m && pat[i-1] != pat[j-1])
{
/* the character preceding the occurrence of t in
pattern P is different than the mismatching character in P,
we stop skipping the occurrences and shift the pattern
from i to j */
if (shift[j]==0)
shift[j] = j-i;

//Update the position of next border
j = bpos[j];
}
/* p[i-1] matched with p[j-1], border is found.
store the beginning position of border */
i--;j--;
bpos[i] = j;
}
}

//Preprocessing for case 2
void preprocess_case2(int *shift, int *bpos,
char *pat, int m)
{
int i, j;
j = bpos;
for(i=0; i<=m; i++)
{
/* set the border position of the first character of the pattern
to all indices in array shift having shift[i] = 0 */
if(shift[i]==0)
shift[i] = j;

/* suffix becomes shorter than bpos, use the position of
next widest border as value of j */
if (i==j)
j = bpos[j];
}
}

/*Search for a pattern in given text using
Boyer Moore algorithm with Good suffix rule */
void search(char *text, char *pat)
{
// s is shift of the pattern with respect to text
int s=0, j;
int m = strlen(pat);
int n = strlen(text);

int bpos[m+1], shift[m+1];

//initialize all occurrence of shift to 0
for(int i=0;i<m+1;i++) shift[i]=0;

//do preprocessing
preprocess_strong_suffix(shift, bpos, pat, m);
preprocess_case2(shift, bpos, pat, m);

while(s <= n-m)
{

j = m-1;

/* Keep reducing index j of pattern while characters of
pattern and text are matching at this shift s*/
while(j >= 0 && pat[j] == text[s+j])
j--;

/* If the pattern is present at the current shift, then index j
will become -1 after the above loop */
if (j<0)
{
printf("pattern occurs at shift = %d\n", s);
s += shift;
}
else
/*pat[i] != pat[s+j] so shift the pattern
shift[j+1] times */
s += shift[j+1];
}

}

//Driver
int main()
{
char text[] = "ABAAAABAACD";
char pat[] = "ABA";
search(text, pat);
return 0;
}
```

### Implementation of Good Suffix Heuristics(Boyer-Moore) with Python

```# Python3 program for Boyer Moore Algorithm with
# Good Suffix heuristic to find pattern in
# given text string

# preprocessing for strong good suffix rule
def preprocess_strong_suffix(shift, bpos, pat, m):

# m is the length of pattern
i = m
j = m + 1
bpos[i] = j

while i > 0:

'''if character at position i-1 is
not equivalent to character at j-1,
then continue searching to right
of the pattern for border '''
while j <= m and pat[i - 1] != pat[j - 1]:

''' the character preceding the occurrence
of t in pattern P is different than the
mismatching character in P, we stop skipping
the occurrences and shift the pattern
from i to j '''
if shift[j] == 0:
shift[j] = j - i

# Update the position of next border
j = bpos[j]

''' p[i-1] matched with p[j-1], border is found.
store the beginning position of border '''
i -= 1
j -= 1
bpos[i] = j

# Preprocessing for case 2
def preprocess_case2(shift, bpos, pat, m):
j = bpos
for i in range(m + 1):

''' set the border position of the first character
of the pattern to all indices in array shift
having shift[i] = 0 '''
if shift[i] == 0:
shift[i] = j

''' suffix becomes shorter than bpos,
use the position of next widest border
as value of j '''
if i == j:
j = bpos[j]

'''Search for a pattern in given text using
Boyer Moore algorithm with Good suffix rule '''
def search(text, pat):

# s is shift of the pattern with respect to text
s = 0
m = len(pat)
n = len(text)

bpos =  * (m + 1)

# initialize all occurrence of shift to 0
shift =  * (m + 1)

# do preprocessing
preprocess_strong_suffix(shift, bpos, pat, m)
preprocess_case2(shift, bpos, pat, m)

while s <= n - m:
j = m - 1

''' Keep reducing index j of pattern while characters of
pattern and text are matching at this shift s'''
while j >= 0 and pat[j] == text[s + j]:
j -= 1

''' If the pattern is present at the current shift,
then index j will become -1 after the above loop '''
if j < 0:
print("pattern occurs at shift = %d" % s)
s += shift
else:

'''pat[i] != pat[s+j] so shift the pattern
shift[j+1] times '''
s += shift[j + 1]

# Driver Code
if __name__ == "__main__":
text = "ABAAAABAACD"
pat = "ABA"
search(text, pat)

```

### Implementation of Good Suffix Heuristics(Boyer-Moore) with Java

```/* Java program for Boyer Moore Algorithm with
Good Suffix heuristic to find pattern in
given text string */
class GFG
{

// preprocessing for strong good suffix rule
static void preprocess_strong_suffix(int []shift, int []bpos,
char []pat, int m)
{
// m is the length of pattern
int i = m, j = m + 1;
bpos[i] = j;

while(i > 0)
{
/*if character at position i-1 is not
equivalent to character at j-1, then
continue searching to right of the
pattern for border */
while(j <= m && pat[i - 1] != pat[j - 1])
{
/* the character preceding the occurrence of t
in pattern P is different than the mismatching
character in P, we stop skipping the occurrences
and shift the pattern from i to j */
if (shift[j] == 0)
shift[j] = j - i;

//Update the position of next border
j = bpos[j];
}
/* p[i-1] matched with p[j-1], border is found.
store the beginning position of border */
i--; j--;
bpos[i] = j;
}
}

//Preprocessing for case 2
static void preprocess_case2(int []shift, int []bpos,
char []pat, int m)
{
int i, j;
j = bpos;
for(i = 0; i <= m; i++)
{
/* set the border position of the first character
of the pattern to all indices in array shift
having shift[i] = 0 */
if(shift[i] == 0)
shift[i] = j;

/* suffix becomes shorter than bpos,
use the position of next widest border
as value of j */
if (i == j)
j = bpos[j];
}
}

/*Search for a pattern in given text using
Boyer Moore algorithm with Good suffix rule */
static void search(char []text, char []pat)
{
// s is shift of the pattern
// with respect to text
int s = 0, j;
int m = pat.length;
int n = text.length;

int []bpos = new int[m + 1];
int []shift = new int[m + 1];

//initialize all occurrence of shift to 0
for(int i = 0; i < m + 1; i++)
shift[i] = 0;

//do preprocessing
preprocess_strong_suffix(shift, bpos, pat, m);
preprocess_case2(shift, bpos, pat, m);

while(s <= n - m)
{
j = m - 1;

/* Keep reducing index j of pattern while
characters of pattern and text are matching
at this shift s*/
while(j >= 0 && pat[j] == text[s+j])
j--;

/* If the pattern is present at the current shift,
then index j will become -1 after the above loop */
if (j < 0)
{
System.out.printf("pattern occurs at shift = %d\n", s);
s += shift;
}
else

/*pat[i] != pat[s+j] so shift the pattern
shift[j+1] times */
s += shift[j + 1];
}

}

// Driver Code
public static void main(String[] args)
{
char []text = "ABAAAABAACD".toCharArray();
char []pat = "ABA".toCharArray();
search(text, pat);
}
}
```

Output:

```pattern occurs at shift = 0
pattern occurs at shift = 5
```

### Time Complexity of Boyer-Moore Algorithm

The time taken by the Boyer-Moore Algorithm during pattern matching in the worst case is O(m*n). Here, m is the length of the pattern and n is the length of the string provided.

### Comparison of the Boyer-Moore Algorithm with other alternatives

 Algorithm Preprocessing Time Matching Time Naive Approach O (O (n - m + 1)m) Rabin-Karp O(m) (O (n - m + 1)m) Finite Automata O(m|∑|) O (n) Knuth-Morris-Pratt O(m) O (n) Boyer-Moore O(|∑|) (O ((n - m + 1) + |∑|))

### Applications of the Boyer-Moore Algorithm

1. This algorithm is quite efficient when a large number of characters are present both in the string as well as in the pattern provided.
2. Used in Network Intrusion Detection Systems to find malicious patterns for cyber safety.

You should also learn about Aho-Corasick algorithm for pattern searching as well.

## Conclusion

So, in today's blog, we discussed the Boyer-Moore Algorithm and where it can be implemented. We also understood its working via the two strategies along with its comparison with other popular alternatives. At FavTutor, we bring to you the most complex topics in the easiest form.

### FavTutor - 24x7 Live Coding Help from Expert Tutors! 