
Commit 7c0698a

feat(algorithms, dp, memoization): word break dp with memoization

1 parent 09786c0 commit 7c0698a

File tree

3 files changed: +297 −12 lines changed

algorithms/dynamic_programming/word_break/README.md

Lines changed: 169 additions & 0 deletions
@@ -233,4 +233,173 @@ the `dp` array.
### Dynamic Programming - Memoization

We can improve the efficiency of the backtracking method by using memoization, which stores the results of subproblems
to avoid recalculating them.

We use a depth-first search (DFS) function that recursively breaks the string into words. However, before performing a
recursive call, we check whether the results for the current substring have already been computed and stored in a
memoization map (typically a dictionary or hash table).

If the results for the current substring are found in the memoization map, we can return them directly without further
computation. If not, we proceed with the recursive call, computing the results and storing them in the memoization map
before returning them.

By memoizing the results, we ensure that each substring is processed only once, which reduces the number of
computations in the average case.
#### Algorithm

1. Convert the `wordDict` array into an unordered set `wordSet` for efficient lookups.
2. Initialize an empty unordered map `memoization` to store the results of subproblems.
3. Call the `dfs` function with the input string `s`, `wordSet`, and `memoization`.
   - Check if the results for the current `remainingStr` (the remaining part of the string to be processed) are
     already in `memoization`. If so, return them.
   - Base case: if `remainingStr` is empty, all characters have been processed. An empty string represents a valid
     sentence, so return an array containing the empty string.
   - Initialize an empty array `results`.
   - Iterate `i` from 1 to the length of `remainingStr`:
     - Extract the substring `currentWord` from 0 to `i` to check whether it is a valid word.
     - If `currentWord` is found in `wordSet`:
       - Recursively call `dfs` with `remainingStr.substr(i)`, `wordSet`, and `memoization`.
       - Append `currentWord` and the recursive results to `results` (with a space if needed) to form valid sentences.
   - Store the `results` for `remainingStr` in `memoization`.
   - Return `results`.
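The steps above can be sketched in Python; the helper name `word_break_memo` is illustrative, while the committed `word_break_dp_memoization` in this change follows the same structure:

```python
from typing import Dict, List, Set


def word_break_memo(s: str, word_dict: List[str]) -> List[str]:
    word_set: Set[str] = set(word_dict)
    memo: Dict[str, List[str]] = {}

    def dfs(remaining: str) -> List[str]:
        # Step 3a: reuse the memoized result for this substring if present.
        if remaining in memo:
            return memo[remaining]
        # Step 3b (base case): an empty remainder represents one complete sentence.
        if not remaining:
            return [""]
        results: List[str] = []
        # Step 3d: try every prefix of the remaining string as the next word.
        for i in range(1, len(remaining) + 1):
            current_word = remaining[:i]
            if current_word in word_set:
                for rest in dfs(remaining[i:]):
                    results.append(current_word + (" " + rest if rest else ""))
        # Step 3e: memoize before returning.
        memo[remaining] = results
        return results

    return dfs(s)
```

For example, `word_break_memo("catsanddog", ["cat", "cats", "and", "sand", "dog"])` produces both valid sentences, while an unbreakable input yields an empty list.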
#### Complexity

Let n be the length of the input string.

##### Time complexity: O(n⋅2^n)

While memoization avoids redundant computations, it does not change the overall number of subproblems that need to be
solved. In the worst case, there are still 2^n possible partitions of the string to explore, leading to an exponential
time complexity. For each partition, O(n) work is performed, so the overall complexity is O(n⋅2^n).

##### Space complexity: O(n⋅2^n)

The recursion stack can grow up to a depth of n, where each recursive call consumes additional space for storing the
current state.

The memoization map stores the results for each suffix of the string, and in the worst case those results contain up
to 2^n sentences of size n, resulting in an exponential space complexity.
### Trie Optimization

While the previous approaches focus on optimizing the search and computation process, we can also consider leveraging
efficient data structures to enhance the word lookup process. This leads us to the trie-based approach, which uses a
trie data structure to store the word dictionary, allowing efficient word lookup and prefix matching.

The trie, also known as a prefix tree, is a tree-based data structure where each node represents a character in a word,
and the path from the root to a leaf node represents a complete word. This structure is particularly useful for problems
involving word segmentation because it allows for efficient prefix matching.

Here, we first build a trie from the dictionary words. Each word is represented as a path in the trie, where each node
corresponds to a character in the word.

By using the trie, we can quickly determine whether a substring can form a valid word without having to perform linear
searches or set lookups. This reduces the search space and improves the efficiency of the algorithm.

In this approach, instead of recursively exploring the remaining substring and using memoization, we iterate from the
end of the input string to the beginning (in reverse order). For each starting index (`startIdx`), we attempt to find
valid sentences that can be formed from that index by iterating through the string and checking if the current
substring forms a valid word using the trie data structure.

When a valid word is encountered in the trie, we append it to the list of valid sentences for the current starting
index. If the current valid word is not the last word in the sentence, we combine it with the valid sentences formed
from the next index (`endIdx + 1`), which are retrieved from the `dp` dictionary.

The valid sentences for each starting index are stored in the `dp` dictionary, ensuring that previously computed
results are reused. By using tabulation and storing the valid sentences for each starting index, we avoid redundant
computations and achieve significant time and space efficiency improvements compared to the standard backtracking
method with memoization.

The trie-based approach offers advantages in terms of efficient word lookup and prefix matching, making it particularly
suitable for problems involving word segmentation or string manipulation. However, it comes with the additional
overhead of constructing and maintaining the trie data structure, which can be more memory-intensive for large
dictionaries.
#### Algorithm

##### Initialize TrieNode Structure

- Each `TrieNode` has two properties:
  - `isEnd`: A boolean value indicating if the node marks the end of a word.
  - `children`: An array of size 26 (for lowercase English letters) to store pointers to child nodes.
- The constructor initializes `isEnd` to false and all elements in `children` to null.

##### Trie Class

- The `Trie` class has a root pointer of type `TrieNode`.
- The constructor initializes the root with a new `TrieNode` object.
- The `insert` function:
  - Takes a string `word` as input.
  - Starts from the root node.
  - For each character `c` in the word:
    - Calculate the index corresponding to the character.
    - If the child node at the calculated index doesn't exist, create a new `TrieNode` and assign it to that index.
    - Move to the child node.
  - After processing all characters, mark the current node's `isEnd` as true.
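A minimal Python sketch of this structure (using snake_case `is_end` for the `isEnd` flag described above):

```python
from typing import List, Optional


class TrieNode:
    def __init__(self) -> None:
        self.is_end: bool = False
        # One slot per lowercase English letter, initially empty.
        self.children: List[Optional["TrieNode"]] = [None] * 26


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for c in word:
            # Map 'a'..'z' to index 0..25.
            idx = ord(c) - ord("a")
            if node.children[idx] is None:
                node.children[idx] = TrieNode()
            node = node.children[idx]
        # Mark the final node as the end of a complete word.
        node.is_end = True
```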
##### `wordBreak` Function

- Create a `Trie` object.
- Insert all words from `wordDict` into the trie using the `insert` function.
- Initialize a map `dp` to store the results of subproblems.
- Iterate from the end of the string `s` to the beginning (in reverse order).
- For each starting index `startIdx`:
  - Initialize a vector `validSentences` to store valid sentences starting from `startIdx`.
  - Initialize a `current_node` pointer to the root of the trie.
  - Iterate from `startIdx` to the end of the string.
  - For each character `c` in the string:
    - Calculate the index corresponding to `c`.
    - Check if the child node at the calculated index exists in the trie.
    - If the child node doesn't exist, break out of the inner loop. This means that the current substring cannot form
      a valid word, so there is no need to continue checking the remaining characters.
    - Move to the child node.
    - Check if the current node's `isEnd` is true, indicating a valid word.
    - If a valid word is found:
      - Extract the current word from the string using `substr`.
      - If it's the last word in the sentence (`endIdx` is the last index):
        - Add the current word to `validSentences`.
      - If it's not the last word:
        - Retrieve the valid sentences formed by the remaining substring from `dp[endIdx + 1]`.
        - Combine the current word with each sentence and add it to `validSentences`.
  - Store the `validSentences` for the current `startIdx` in `dp`.
- Return the valid sentences stored in `dp[0]`, which represents the valid sentences formed from the entire string.
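Combining the trie build with the reverse tabulation described above, a self-contained Python sketch might look like the following (an illustrative reconstruction, not the committed source):

```python
from typing import Dict, List, Optional


class TrieNode:
    def __init__(self) -> None:
        self.is_end = False
        self.children: List[Optional["TrieNode"]] = [None] * 26


def word_break_trie(s: str, word_dict: List[str]) -> List[str]:
    # Build the trie: every dictionary word becomes a path of nodes.
    root = TrieNode()
    for word in word_dict:
        node = root
        for c in word:
            idx = ord(c) - ord("a")
            if node.children[idx] is None:
                node.children[idx] = TrieNode()
            node = node.children[idx]
        node.is_end = True

    # dp[start_idx] holds every valid sentence for the suffix s[start_idx:].
    dp: Dict[int, List[str]] = {}
    for start_idx in range(len(s) - 1, -1, -1):
        valid_sentences: List[str] = []
        node = root
        for end_idx in range(start_idx, len(s)):
            idx = ord(s[end_idx]) - ord("a")
            if node.children[idx] is None:
                break  # no dictionary word continues with this prefix
            node = node.children[idx]
            if node.is_end:
                word = s[start_idx : end_idx + 1]
                if end_idx == len(s) - 1:
                    # Last word in the sentence.
                    valid_sentences.append(word)
                else:
                    # Combine with the sentences already tabulated for the rest.
                    for rest in dp[end_idx + 1]:
                        valid_sentences.append(word + " " + rest)
        dp[start_idx] = valid_sentences

    return dp.get(0, [])
```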
#### Complexity Analysis

Let n be the length of the input string.

##### Time complexity: O(n⋅2^n)

Even though the trie-based approach uses an efficient data structure for word lookup, it still needs to explore all
possible ways to break the string into words. In the worst case, there are 2^n unique possible partitions, leading to
an exponential time complexity. O(n) work is performed for each partition, so the overall complexity is O(n⋅2^n).

##### Space complexity: O(n⋅2^n)

The trie data structure itself can have a maximum of 2^n nodes in the worst case, where each character in the string
represents a separate word. Additionally, the tabulation map used in this approach can also store up to 2^n strings of
size n, resulting in an overall exponential space complexity.
----

### Further Thoughts on Complexity Analysis

The complexity of this problem cannot be reduced below O(n⋅2^n); the worst-case scenario will still be O(n⋅2^n).
However, using dynamic programming (DP) makes it somewhat more efficient than backtracking overall, as the test case
below shows.

Consider the input "aaaaaa", with wordDict = ["a", "aa", "aaa", "aaaa", "aaaaa", "aaaaaa"].

Every possible partition is a valid sentence, and there are 2^(n−1) such partitions. The algorithms cannot perform
better than this since they must generate all valid sentences. The cost of iterating over cached results will be
exponential, as every possible partition will be cached, resulting in the same runtime as regular backtracking.
Likewise, the space complexity will also be O(n⋅2^n) for the same reason: every partition is stored in memory.
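To make the 2^(n−1) count concrete, here is a small hypothetical helper (not part of the commit) that memoizes only the number of valid sentences per suffix rather than the sentences themselves:

```python
from typing import Dict, List


def count_sentences(s: str, words: List[str]) -> int:
    # Count valid sentences, memoizing the count for each remaining suffix.
    word_set = set(words)
    memo: Dict[str, int] = {}

    def dfs(remaining: str) -> int:
        if remaining in memo:
            return memo[remaining]
        if not remaining:
            return 1  # the empty remainder completes one sentence
        total = sum(
            dfs(remaining[i:])
            for i in range(1, len(remaining) + 1)
            if remaining[:i] in word_set
        )
        memo[remaining] = total
        return total

    return dfs(s)


# Every composition of the 6 characters is valid: 2^(6-1) = 32 partitions.
print(count_sentences("aaaaaa", ["a", "aa", "aaa", "aaaa", "aaaaa", "aaaaaa"]))  # → 32
```

Note that counting alone is cheap (each of the n suffixes is solved once); it is the requirement to *enumerate* every sentence that forces the exponential output size.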
Another way to explain why the worst-case complexity is O(n⋅2^n) for all the algorithms is that, given a string of
length n, there are n+1 intervals at which it can be partitioned into two parts. Each interval has two choices: to
split or not to split. In the worst case, we have to check all possibilities, which results in a time complexity of
O(n⋅2^(n+1)), which simplifies to O(n⋅2^n). This analysis is very similar to palindrome partitioning.

Overall, this question is interesting because of the nature of this complexity. In an interview setting, if an
interviewer asks this question, the most expected solutions would be backtracking and the trie, as they are natural
choices for the required conditions and outputs.

algorithms/dynamic_programming/word_break/__init__.py

Lines changed: 66 additions & 6 deletions
@@ -66,7 +66,7 @@ def word_break_trie(s: str, word_dict: List[str]) -> List[str]:
     return results.get(0, [])
 
 
-def word_break_dp(s: str, word_dict: List[str]) -> List[str]:
+def word_break_dp_tabulation(s: str, word_dict: List[str]) -> List[str]:
     """
     This adds spaces to s to break it up into a sequence of valid words from word_dict.
 
@@ -112,7 +112,7 @@ def word_break_dp(s: str, word_dict: List[str]) -> List[str]:
     return dp[len(s)]
 
 
-def word_break_dp_2(s: str, word_dict: List[str]) -> List[str]:
+def word_break_dp_tabulation_2(s: str, word_dict: List[str]) -> List[str]:
     """
     This adds spaces to s to break it up into a sequence of valid words from word_dict.
 
@@ -160,6 +160,62 @@ def word_break_dp_2(s: str, word_dict: List[str]) -> List[str]:
     return dp.get(0, [])
 
 
+def word_break_dp_memoization(s: str, word_dict: List[str]) -> List[str]:
+    """
+    This adds spaces to s to break it up into a sequence of valid words from word_dict.
+
+    This uses dynamic programming with memoization to store the words in the dictionary and a map to store the results
+    of subproblems.
+
+    Complexity:
+        Time: O(n*2^n): where n is the length of the string
+        Space: O(n*2^n): where n is the length of the string
+
+    Args:
+        s: The input string
+        word_dict: The dictionary of words
+    Returns:
+        List of valid sentences
+    """
+    word_set: Set[str] = set(word_dict)
+    memoization: Dict[str, List[str]] = dict()
+
+    def dfs(remaining_str: str, words_set: Set[str], memo: Dict) -> List[str]:
+        """
+        Depth-first search to find all possible word combinations
+        Args:
+            remaining_str(str): the remaining string to search through
+            words_set(set): set of dictionary words to use to construct sentences
+            memo(dict): dictionary to improve computation of already processed words
+        Returns:
+            list: possible word combinations
+        """
+        # check if the result for this substring is already memoized
+        if remaining_str in memo:
+            return memo[remaining_str]
+
+        # base case: when the string is empty, return a list containing an empty string
+        if not remaining_str:
+            return [""]
+
+        results = []
+        for i in range(1, len(remaining_str) + 1):
+            current_word = remaining_str[:i]
+            # if the current substring is a valid word in the word set
+            if current_word in words_set:
+                for next_word in dfs(remaining_str[i:], words_set, memo):
+                    # append current word and next word
+                    results.append(
+                        f"{current_word}{' ' + next_word if next_word else ''}"
+                    )
+
+        # memoize the results for the current substring
+        memo[remaining_str] = results
+        return results
+
+    return dfs(s, word_set, memoization)
+
+
 def word_break_backtrack(s: str, word_dict: List[str]) -> List[str]:
     """
     This adds spaces to s to break it up into a sequence of valid words from word_dict.
@@ -176,7 +232,13 @@ def word_break_backtrack(s: str, word_dict: List[str]) -> List[str]:
     word_set = set(word_dict)
     results = []
 
-    def backtrack(sentence: str, words_set: Set[str], current_sentence: List[str], result: List[str], start_index: int):
+    def backtrack(
+        sentence: str,
+        words_set: Set[str],
+        current_sentence: List[str],
+        result: List[str],
+        start_index: int,
+    ):
         # If we've reached the end of the string, add the current sentence to results
         if start_index == len(sentence):
             result.append(" ".join(current_sentence))
@@ -189,9 +251,7 @@ def word_break_backtrack(s: str, word_dict: List[str]) -> List[str]:
             if word in words_set:
                 current_sentence.append(word)
                 # Recursively call backtrack with the new end index
-                backtrack(
-                    sentence, words_set, current_sentence, result, end_index
-                )
+                backtrack(sentence, words_set, current_sentence, result, end_index)
                 # Remove the last word to backtrack
                 current_sentence.pop()

algorithms/dynamic_programming/word_break/test_word_break.py

Lines changed: 62 additions & 6 deletions
@@ -1,7 +1,13 @@
 import unittest
 from typing import List
 from parameterized import parameterized
-from algorithms.dynamic_programming.word_break import word_break_trie, word_break_dp, word_break_dp_2, word_break_backtrack
+from algorithms.dynamic_programming.word_break import (
+    word_break_trie,
+    word_break_dp_tabulation,
+    word_break_dp_tabulation_2,
+    word_break_backtrack,
+    word_break_dp_memoization,
+)
 
 
 class WordBreakTestCases(unittest.TestCase):
@@ -83,8 +89,10 @@ def test_word_break_trie(self, s: str, word_dict: List[str], expected: List[str]
             ("catsandog", ["cats", "dog", "sand", "and", "cat"], []),
         ]
     )
-    def test_word_break_dp(self, s: str, word_dict: List[str], expected: List[str]):
-        actual = word_break_dp(s, word_dict)
+    def test_word_break_dp_tabulation(
+        self, s: str, word_dict: List[str], expected: List[str]
+    ):
+        actual = word_break_dp_tabulation(s, word_dict)
         actual.sort()
         expected.sort()
         self.assertListEqual(expected, actual)
@@ -125,8 +133,10 @@ def test_word_break_dp(self, s: str, word_dict: List[str], expected: List[str]):
             ("catsandog", ["cats", "dog", "sand", "and", "cat"], []),
         ]
     )
-    def test_word_break_dp_2(self, s: str, word_dict: List[str], expected: List[str]):
-        actual = word_break_dp_2(s, word_dict)
+    def test_word_break_dp_tabulation_2(
+        self, s: str, word_dict: List[str], expected: List[str]
+    ):
+        actual = word_break_dp_tabulation_2(s, word_dict)
         actual.sort()
         expected.sort()
         self.assertListEqual(expected, actual)
@@ -167,12 +177,58 @@ def test_word_break_dp_2(self, s: str, word_dict: List[str], expected: List[str]
             ("catsandog", ["cats", "dog", "sand", "and", "cat"], []),
         ]
     )
-    def test_word_break_backtrack(self, s: str, word_dict: List[str], expected: List[str]):
+    def test_word_break_backtrack(
+        self, s: str, word_dict: List[str], expected: List[str]
+    ):
         actual = word_break_backtrack(s, word_dict)
         actual.sort()
         expected.sort()
         self.assertListEqual(expected, actual)
 
+    @parameterized.expand(
+        [
+            (
+                "magiclly",
+                ["ag", "al", "icl", "mag", "magic", "ly", "lly"],
+                ["mag icl ly", "magic lly"],
+            ),
+            (
+                "raincoats",
+                ["rain", "oats", "coat", "s", "rains", "oat", "coats", "c"],
+                ["rain c oats", "rain c oat s", "rain coats", "rain coat s"],
+            ),
+            (
+                "highway",
+                ["crash", "cream", "high", "highway", "low", "way"],
+                ["highway", "high way"],
+            ),
+            ("robocat", ["rob", "cat", "robo", "bo", "b"], ["robo cat"]),
+            (
+                "cocomomo",
+                ["co", "mo", "coco", "momo"],
+                ["co co momo", "co co mo mo", "coco momo", "coco mo mo"],
+            ),
+            (
+                "catsanddog",
+                ["cat", "cats", "and", "sand", "dog"],
+                ["cats and dog", "cat sand dog"],
+            ),
+            (
+                "pineapplepenapple",
+                ["apple", "pen", "applepen", "pine", "pineapple"],
+                ["pine apple pen apple", "pineapple pen apple", "pine applepen apple"],
+            ),
+            ("catsandog", ["cats", "dog", "sand", "and", "cat"], []),
+        ]
+    )
+    def test_word_break_dp_memoization(
+        self, s: str, word_dict: List[str], expected: List[str]
+    ):
+        actual = word_break_dp_memoization(s, word_dict)
+        actual.sort()
+        expected.sort()
+        self.assertListEqual(expected, actual)
+
 
 if __name__ == "__main__":
     unittest.main()
