Monday, June 14, 2010

DPBO – Programming Final Exam Solutions: Longest Common Substring (Problem A1)

This is the first post in the series of posts in which I am going to provide some solutions to the final exam of DPBO course this year. If I can, I will first provide the problem question and then proceed with the solution. I will try to provide as accurate English translation of the problem as possible, though I will sometimes modify the wordings slightly to make it clearer to the reader.

For the first problem, we have Problem A1 as follows:
"Build a program that accepts two words (strings) as command-line parameters (no validation is necessary) and then finds the longest common substring of both string inputs. Your program should print out the length of a longest common substring and additionally, the corresponding longest common substring if its length is not zero. If there are more than one of such substrings, you need only to print out one of them. For example, strings "andalan" and "teladan" have two longest common substrings, namely "an" and "da", but your program only needs to print out one of them."
Examples of command-line inputs:
[sourcecode gutter="false" toolbar="false"]
> java LongestCommonSubstringFinder jembatan pendekatan
4 (atan)
> java LongestCommonSubstringFinder andalan teladan
2 (an)
> java LongestCommonSubstringFinder abcd efgh
0
> java LongestCommonSubstringFinder semboyan pembobolan
4 (embo)
[/sourcecode]
The following program solves the above problem. The explanation is given as program comments. Notice the use of prefix and suffix notions to get to the solution. In general, if the length of inputs are m and n respectively, then there can be O(mn) pairs of prefixes to be compared and the comparison itself takes at most O(max(m,n)) iterations.
[sourcecode language="java" light="true" wraplines="false"]
/**
* This class is a solution for problem A1 in programming exam of DPBO.
* The problem asks us to find the longest common substring of two strings given as command-line arguments.
*/
public class LongestCommonSubstringFinder {
public static void main(String args[]) {
jawab(args[0],args[1]);
}
/**
* This method computes and prints the longest common substring of the given two strings.
*
* In order to find it, note that if s is the longest common substring of a and b,
* then it must hold that a = wsx and b = ysz, where w, x, y, and z are (possibly null) strings.
* For example, if a = "semboyan" and b = "pembobolan", then w = "s", x = "yan", y = "p", z = "bolan",
* and a longest common substring is s = "embo". Another example, if a = "jembatan" and b "pendekatan",
* then w = "jemb", x = "", y = "pendek", z = "" and a longest common substring is s = "atan".
*
* Our solution depends on the notion of prefix and suffix of a string.
* A prefix of a string is its substrings whose first character (if any) is the first character of the original string.
* For example, "je" and "jemb" are two of several prefixes of "jembatan".
* A suffix of a string is its substrings whose last character (if any) is the last character of the original string.
* For example, "an" and "atan" are two of several suffixes of "jembatan".
*
* To find a longest common substring of a and b, we are looking at every possible pair of prefixes,
* each of which is respectively formed from a and b.
* A longest common substring of a and b can then be found by finding the longest common suffixes for each of
* those pair of prefixes.
*
*/
public static void jawab(String a, String b) {
// Our solution makes use of a 2-dimensional array of strings as the data structure
// The (i,j)-th element of the 2-dimensional array is the longest common suffix of
// the prefix of string a whose length is i and prefix of string b whose length is j.
// We use two additional arrays to ease the association between the indices and
// prefixes of string a and b respectively.
// Note that if the string a is of length n, then there are n+1 prefixes of string a,
// starting with the empty string as the prefix of length 0, until the string a itself
// as the prefix of length n.
// In the following two arrays of prefix, we store prefixes of string a and b, starting from
// the prefix of length 0 to the longest prefix.
String[] prefixesOfA = new String[1+a.length()];
String[] prefixesOfB = new String[1+b.length()];
for (int i=0; i < 1+a.length(); i++) prefixesOfA[i] = a.substring(0,i);
for (int i=0; i < 1+b.length(); i++) prefixesOfB[i] = b.substring(0,i);
//The following variable is the 2-dimensional array mentioned above.
String[][] suffixes = new String[1+a.length()][];
// We store the longest common substring in the following variable.
String longestCommonSubstring = "";
// We loop on the 2-dimensional array, allocating the memory on the fly.
// The (i,j)-th element is filled with the longest common suffix between
// prefix of a with length i and prefix of b with length j.
// If the newly computed suffix is longer than the current longest common substring,
// then the longest common substring is appropriately updated with the new suffix.
// The longest common suffix can be found simply by looping from the end of both
// strings, taking each character as long as the character is in both strings.
for (int i=0; i < prefixesOfA.length; i++) {
suffixes[i] = new String[1+b.length()];
for (int j=0; j<prefixesOfB.length; j++) {
suffixes[i][j] = "";
for (int k1 = prefixesOfA[i].length()-1, k2 = prefixesOfB[j].length()-1; k1>=0 && k2 >=0; k1--, k2--) {
if ( prefixesOfA[i].charAt(k1) != prefixesOfB[j].charAt(k2) ) break;
else suffixes[i][j] = String.valueOf(prefixesOfA[i].charAt(k1)) + suffixes[i][j];
}
if (suffixes[i][j].length() > longestCommonSubstring.length())
longestCommonSubstring = suffixes[i][j];
}
}
// prints out the length of longest common substring and
// prints out the longest common substring if its length is more than 0
System.out.print(longestCommonSubstring.length());
if (longestCommonSubstring.length() > 0)
System.out.print(" (" + longestCommonSubstring + ")");
System.out.println();
}
}
[/sourcecode]

No comments:

Post a Comment