To become a good programmer, you have to solve a lot of programming problems first to master this skill. One such common problem is LCS, which is asked in many coding interviews to test the candidate. In this article, we will discuss approaches to finding the longest common subsequence using recursive and dynamic programming approaches.
What is the Longest Common Subsequence Problem?
In the context of strings, a subsequence is a sequence of characters that appears in the same order as they appear in the original string, but not necessarily consecutively. The longest common subsequence as its name itself tells that, we have to tell the longest length of the common subsequence in two strings.
In this Longest Common Subsequence problem, we are given two strings s1 and s2 and we have to tell the longest length of the common subsequence between them.
Let us understand this problem with an example:
s1 = “favtutor”
s2 = “actor”
The longest common subsequence is “ator”, with a length of 4.
Brute Force Approach: Using Recursion
One way to solve the longest common subsequence problem is through a recursive approach. We can generate all possible subsequences and find the longest one that is present in both strings.
The recursive approach follows these steps:
- Firstly we will declare a base case If either of the input sequences is empty, the length of the LCS is 0.
- Now we will check the last characters of the sequences match, the LCS length is one plus the LCS length of the remaining sequences (excluding the last characters).
- If the last characters do not match, we have two options
- Exclude the last character of the first sequence and find the LCS for the remaining sequences.
- Exclude the last character of the second sequence and find the LCS for the remaining sequences.
- From the above 2 steps, we will take maximum LCS from both steps.
The time complexity of the Longest Common Subsequence (LCS) problem using recursion is O(2m*n) where m and n are lengths of the string s1 and s2 respectively.
Optimized Approach: Using Dynamic Programming
While the recursive approach provides a solution, it can be inefficient due to the overlapping subproblems. To optimize the solution, we can use the concept memoization of Dynamic Programming. Memoization involves storing the results of previously solved subproblems in a table to avoid redundant calculations and helps to optimize the solution.
Steps used to memoize a recursive code:
- Create a dp (dynamic programming) array of size [n][m], where n and m are the lengths of S1 and S2, respectively. This array will store the result of every subproblem until it does not occur again.
- We will initialize the dp array with -1 because none of the subproblems is solved in the beginning.
- we will find the answer for particular indices (e.g., LCS (ind1, ind2)), and if s1[i] == s2[j] we will do i = i-1, j = j-1.
- s1[i]==s2[j] till “tor” then we will check for the rest of the string.
- We first check whether the answer has already been calculated using the dp array (i.e., dp[ind1][ind2] != -1). If it has, we simply return the value from the dp[ind1][ind2].
- Here we got an overlapping subproblem in this example:
- If the answer has not been calculated yet, we will use the recursive relation as usual. However, before returning from the function, we set dp[ind1][ind2] to the solution we obtained.
C++ Code
Below is the C++ program to find the longest common subsequence of two strings:
#include<bits/stdc++.h> using namespace std; int func(int i, int j, string &s1, string &s2, vector<vector<int>> &dp) { if (i == 0 || j == 0) return 0; if (dp[i][j] != -1) return dp[i][j]; if (s1[i-1] == s2[j-1]) { return dp[i][j] = 1 + func(i-1, j-1, s1, s2, dp); } return dp[i][j] = max(func(i -1, j, s1, s2, dp), func(i, j - 1, s1, s2, dp)); } int longestCommonSubsequence(string s1, string s2) { int i = s1.size(); int j = s2.size(); vector<vector<int>> dp(i+1, vector<int>(j+1, -1)); return func(i, j, s1, s2, dp); } int main() { string s1 = "favtutor"; string s2 = "actor"; int result = longestCommonSubsequence(s1, s2); cout << "Length of Longest Common Subsequence: " << result << endl; return 0; }
Python Code
Below is the Python program to find the LCS:
def lcs(i, j, dp, s1, s2): # Base case: if either of the strings is empty if i == 0 or j == 0: return 0 # Check if the result is already in the dp table if dp[i][j] != -1: return dp[i][j] # If the last characters match if s1[i - 1] == s2[j - 1]: dp[i][j] = 1 + lcs(i - 1, j - 1, dp, s1, s2) return dp[i][j] else: # If the last characters do not match dp[i][j] = max(lcs(i - 1, j, dp, s1, s2), lcs(i, j - 1, dp, s1, s2)) return dp[i][j] def lcs_dp(s1, s2): # Helper function for dynamic programming LCS # Initialize dp table with -1 m, n = len(s1), len(s2) dp = [[-1] * (n + 1) for _ in range(m + 1)] # Call the helper function return lcs(m, n, dp, s1, s2) # Example usage: s1 = "favtutor" s2 = "actor" result = lcs_dp(s1, s2) print("Length of LCS:", result)
Java Code
Here is the Java Program for this problem as well:
import java.util.Arrays; public class LCSDynamicProgramming { public static int lcs(int i, int j, int[][] dp, String s1, String s2) { // Base case: if either of the strings is empty if (i == 0 || j == 0) { return 0; } // Check if the result is already in the dp table if (dp[i][j] != -1) { return dp[i][j]; } // If the last characters match if (s1.charAt(i - 1) == s2.charAt(j - 1)) { dp[i][j] = 1 + lcs(i - 1, j - 1, dp, s1, s2); return dp[i][j]; } else { // If the last characters do not match dp[i][j] = Math.max(lcs(i - 1, j, dp, s1, s2), lcs(i, j - 1, dp, s1, s2)); return dp[i][j]; } } public static int lcs_dp(String s1, String s2) { // Helper function for dynamic programming LCS // Initialize dp table with -1 int m = s1.length(); int n = s2.length(); int[][] dp = new int[m + 1][n + 1]; for (int[] row : dp) { Arrays.fill(row, -1); } // Call the helper function return lcs(m, n, dp, s1, s2); } public static void main(String[] args) { String s1 = "favtutor"; String s2 = "actor"; int result = lcs_dp(s1, s2); System.out.println("Length of LCS: " + result); } }
Output:
Length of LCS: 4
The time complexity is determined by the number of subproblems that need to be solved. In this case, there are O(m*n) subproblems, where m is the length of the first string (s1) and n is the length of the second string (s2).
The space complexity is determined by the memory required for the memoization table (dp). The table is a 2D array of size (m+1)*(n+1), where m is the length of s1 and n is the length of s2. So overall space complexity is O(m*n).
Analysis of Both Approaches
Approach | Time Complexity | Space Complexity | Details |
Recursive Approach | O(2m*n) | O(1) | The time complexity is exponential because, in the worst case, the function explores all possible combinations. |
Dynamic Programming Approach | O(m*n) | O(m*n) | The time complexity is reduced using memoization and extra space is used to store the results of overlapping subproblems |
The recursive approach is characterized by its exponential time complexity, O(2m*n), where m and n are the lengths of the input strings. To optimize the solution, we have used a dynamic programming approach with memoization. The dynamic programming approach significantly reduces the time complexity to O(m*n).
Conclusion
This article explores the Longest Common Subsequence (LCS) problem from leetcode, which involves finding the length of the longest common subsequence between two given strings.