String Compression Problem (C++, Java, Python)

The String Compression problem is one of the famous leetcode problems in Data structure and algorithms that involves finding and returning the length of a new string formed after compressing the original one. In this article, we will understand the String Compression problem and explore the approach to solve it by implementing code in C++, Java, and Python.

What is a String Compression Problem?

In the string compression problem, we are given with vector of characters and we have to compress those according to some rules and return the length of a new string thus formed.

Those rules are: Begin with an empty string s. For each group of consecutive repeating characters in chars: If the group’s length is 1, append the character to s. Otherwise, append the character followed by the group’s length.

Let us discuss this problem with an example:

chars = [“a”,“b”,“b”,“c”,“c”,“c”]
Output = 5

Naive Approach

The simplest way to solve the String Compression problem is by using two pointers. One pointer for iterating through the original character array and one for keeping track of the current position in the compressed array. We will also use a variable to keep track of the count of consecutive characters.

Below are the steps to be followed:

First, we will set the current letter to the first character in the array and initialize the count to 0.
Then we will iterate through the array until you find a different character or reach the end of the array.
For each iteration, increment the count and the index i.
When we find a different character or reach the end of the array, write the current letter to the compressed array and, if the count is greater than 1, write the count as a string to the compressed array.
Then reset the count to 0 and set the current letter to the new letter.
Finally, we return the length of the compressed array, which is equal to the position of the last character in the compressed array plus one.

The time complexity for the naive approach is O(n) because we are iterating through the whole chars vector only once. The space complexity for this approach is O(n).

Optimized Approach

In this approach, we will use the same approach used above but instead of storing the compressed string separately we will store its length and update the input vector given to us. Thus to store the length of the compressed string we will use another ans variable.

Different character and count = 1 so we will do ans += 1 and get ans =1;

Different character and count = 2 so we will do ans += 2 and get ans = 3;

j goes out of bound and the count is equal to 3 so ans += 2 and we will get ans = 5

C++ Code

Here is the C++ program to solve String Compression problem:

#include <iostream>
#include <vector>
#include <string>

using namespace std; // Adding the std namespace

// Function to compress a vector of characters in-place
int compress(vector<char>& chars) {
    int ans = 0; // Variable to keep track of the compressed vector length

    // Iterate through input vector using i pointer
    for (int i = 0; i < chars.size();) {
        const char letter = chars[i]; // Current character being compressed
        int count = 0; // Count of consecutive occurrences of the letter

        // Count consecutive occurrences of the letter in the input vector
        while (i < chars.size() && chars[i] == letter) {
            ++count;
            ++i;
        }

        // Write the letter to the compressed vector
        chars[ans++] = letter;

        // If count is greater than 1, write count as a string to the compressed vector
        if (count > 1) {
            // Convert count to a string and iterate over each character in the string
            for (const char c : to_string(count)) {
                chars[ans++] = c;
            }
        }
    }

    // Return the length of the compressed vector
    return ans;
}

int main() {
    // Example vector
    vector<char> inputVector = {'a', 'b', 'b', 'c', 'c', 'c'};

    // Call the compress function
    int compressedLength = compress(inputVector);

    // Display the compressed vector and its length
    cout << "Compressed Vector: ";
    for (int i = 0; i < compressedLength; ++i) {
        cout << inputVector[i] << " ";
    }
    cout << "\nCompressed Vector Length: " << compressedLength << endl;

    return 0;
}

Java Code

Below is the Java code to solve this problem:

import java.util.ArrayList;
import java.util.List;

public class Solution {
    public static int compress(List<Character> chars) {
        int ans = 0;

        // Iterate through input list using i pointer
        for (int i = 0; i < chars.size();) {
            char letter = chars.get(i); // Current character being compressed
            int count = 0; // Count of consecutive occurrences of letter

            // Count consecutive occurrences of letter in input list
            while (i < chars.size() && chars.get(i) == letter) {
                ++count;
                ++i;
            }

            // Write letter to compressed list
            chars.set(ans++, letter);

            // If count is greater than 1, write count as string to compressed list
            if (count > 1) {
                // Convert count to string and iterate over each character in string
                for (char c : Integer.toString(count).toCharArray()) {
                    chars.set(ans++, c);
                }
            }
        }

        // Return length of compressed list
        return ans;
    }

    public static void main(String[] args) {
        // Example list
        List<Character> inputList = new ArrayList<>();
        inputList.add('a');
        inputList.add('b');
        inputList.add('b');
        inputList.add('c');
        inputList.add('c');
        inputList.add('c');

        // Call the compress function
        int compressedLength = compress(inputList);

        // Display the compressed list and its length
        System.out.print("Compressed List: ");
        for (int i = 0; i < compressedLength; ++i) {
            System.out.print(inputList.get(i) + " ");
        }
        System.out.println("\nCompressed List Length: " + compressedLength);
    }
}

Python Code

You can implement it in Python as well:

def compress(chars):
    ans = 0

    # Iterate through input list using i pointer
    i = 0
    while i < len(chars):
        letter = chars[i]  # Current character being compressed
        count = 0  # Count of consecutive occurrences of letter

        # Count consecutive occurrences of letter in input list
        while i < len(chars) and chars[i] == letter:
            count += 1
            i += 1

        # Write letter to compressed list
        chars[ans] = letter
        ans += 1

        # If count is greater than 1, write count as string to compressed list
        if count > 1:
            # Convert count to string and iterate over each character in string
            for c in str(count):
                chars[ans] = c
                ans += 1

    # Return length of compressed list
    return ans

# Example list
input_list = ['a', 'b', 'b', 'c', 'c', 'c']

# Call the compress function
compressed_length = compress(input_list)

# Display the compressed list and its length
print("Compressed List:", end=" ")
for i in range(compressed_length):
    print(input_list[i], end=" ")
print("\nCompressed List Length:", compressed_length)

Output:

Compressed Vector: a b 2 c 3 
Compressed Vector Length: 5

The Time complexity for the approach is O(n) because we are iterating through the vector only once. The Space complexity for this approach is O(1) because no extra space is used to store compressed strings.

The use of carefully managed pointers minimizes time complexity to O(n) while maintaining constant space complexity. This approach is recommended for larger datasets where performance is crucial.

Finally, check the comparison of both of them:

Approaches	Time Complexity	Space Complexity	Details
Naive Approach:Iterating over array	O(n)	O(n)	We traverse over the whole string and consider groups of repeated characters and based on their length we form a compressed string and at last return it’s length.
Optimized Approach	O(n)	O(1)	Here instead of storing the whole compressed string we store only its length and update the input vector given to us.

Conclusion

In conclusion, the optimized approach given above to solve the String Compression leetcode problem stands out as the most efficient solution for the string compression problem, providing a balance between simplicity and performance. There are so many such interesting problems that you need to master, which you can do by joining our Data Structures Bootcamp to learn it thoroughly.