Given two strings a and b, find whether any anagram of string a is a sub-string of string b

I know this is a fairly (old and) simple question. But what people do not realize is that it has a catch! I saw this question here and browsed through most of the solutions. Almost all of them (except the crazy ones where folks wanted to do brutal force search) would fail on a very simple case. And this question, is all about just that case

#include<iostream>
#include<string>
#include<vector>
using namespace std;
int main() {
  string src, tgt;
  std::getline(cin, src);
  std::getline(cin, tgt);
  vector<int> buf(256, 0);
  const int tot = src.size();
  for (int i = 0; i < src.size(); ++i) {
    buf[src[i]]++;
  }
  vector<int> tmp(256, 0);
  int start = -1, tmp_tot = 0, end = 0;
  for (int i = 0; i < tgt.size(); ++i) {
    const unsigned char ch = tgt.at(i);
    if (buf[ch] > 0) {
      if (tmp[ch] == buf[ch]) {
        if (tmp_tot > 256) {
          tmp.clear();
          tmp.resize(256, 0);
        } else {
          while (tmp_tot > 0) tmp[tgt[i - tmp_tot--]]--;
        }
        start = i;
        tmp[ch]++;
        tmp_tot = 1;
      } else {
        if (start == -1) start = i;
        tmp[ch]++;
        ++tmp_tot;
        if (tmp_tot == tot) end = i;
      }
    }
  }
  if (end > 0) {
    cout << start << "    " << end << std::endl;
  } else {
    cout << "Not found\n";
  }
  return 0;
}

Knuth Morrison Pratt (KMP) String Matching Algorithm

I have always wanted to understand the KMP Algorithm and write a program for it and finally I got around doing it. I had this big misconception that its a very tedious algorithm and does some black magic but, as with all other Knuth’s algorithms, this is a beauty 🙂

Following is a working 😀 program, which does not handle corner cases, empty strings and UTF-8 etc. But works for sane inputs.

#include <iostream>
#include <string>
#include <cstring>
#include <vector>
using namespace std;

vector<int> BuildLookupTable(const string& needle) {
  vector<int> ret_val;
  ret_val.resize(needle.length());
  size_t pos = 2, prefix_pos = 0;
  ret_val[0] = -1;
  ret_val[1] = 0;
  while (pos < needle.length()) {
    if (needle.at(pos - 1) == needle.at(prefix_pos)) {
      ret_val[pos++] = ++prefix_pos;
    } else if (prefix_pos > 0) {
      prefix_pos = ret_val[prefix_pos];
    } else {
      ret_val[pos++] = 0;
    }
  }
  return ret_val;
}

string FindNeedleInHaystackUsingKMP(const string& needle,
                                    const string& haystack) {
  vector<int> lookup_table = BuildLookupTable(needle);
  size_t haystack_index = 0;
  size_t needle_index = 0;
  while (haystack_index + needle_index < haystack.length()) {
    if (haystack.at(haystack_index + needle_index)
        == needle.at(needle_index)) {
      ++needle_index;
      if (needle_index == needle.length()) return "Found.";
    } else {
      haystack_index = haystack_index + needle_index
          - lookup_table[needle_index];
      if (lookup_table[needle_index] == -1) {
        needle_index = 0;
      } else {
        needle_index = lookup_table[needle_index];
      }
    }
  }
  return "Not Found.";
}

int main() {
  string needle, haystack;
  cout << "Enter the needle: ";
  getline(std::cin, needle);
  cout << "Enter the haystack: ";
  getline(std::cin, haystack);
  cout << FindNeedleInHaystackUsingKMP(needle, haystack) << "\n";
  char* ret_val = strstr(haystack.c_str(), needle.c_str());
  if (ret_val == NULL) {
    cout << "(Verifier) : Not Found\n";
  } else {
    cout << "(Verifier) : Found\n";
  }
}

Rabin Karp String Search Algorithm

I came across the Rabin–Karp string search algorithm when I was looking for a method to implement strstr as a C program. The obvious brute-force method was a waste, so I found the Rabin-Karp, Knuth-Morris-Pratt Algorithm and Boyer-Moore Algorithm . The KMP is hectic to understand and code as well but Boyer-Moore is very practical and can be coded quickly. I had less time so coded only the Rabin-Karp Algorithm.
The code which I had made.

#include<iostream>
#include<string>
#include<vector>
#include<cmath>

using namespace std;

int main()
{
	string T, P;
	cin >> T;
	cin >> P;
	long long int PRIME = atoll("10002949999");
	int l_T = T.length(), l_P = P.length(), t=0, h = 1, d = 26, H = 0;
	for (int i=1; i < l_P; i++)
		h = (h * d) % PRIME;
	for (int i=0;i<l_P;i++)
	{
		H = ( (d*H)  + P[i] - 'a')%PRIME ;
		t = ( (d*t)  + T[i] - 'a')%PRIME ;
	}
	for (int i = 0, j = l_T - l_P - 1 ; i < j ; i++)
	{
		if (t == H)
		{
			int state = 1;
			for (int k = 0 ;k < l_P; k++)
				if (T[i+k] != P[k])
				{
					state = -1;
					break;
				}
			if (state == 1 )
			{
				cout << " Match found at index : " << i << "\n";
			}
		}
		else
		{
			t =  ( (d * (t - h*(T[i]-'a')) + T[i+l_P] - 'a')%PRIME );
		}
	}
	return 0;
}

The algorithm can be found on wiki, I had used the pseudo-code from CLRS.

%d bloggers like this: