### Dynamic Programming - Memoization

We can improve the efficiency of the backtracking method by using memoization, which stores the results of
subproblems to avoid recalculating them.

We use a depth-first search (DFS) function that recursively breaks the string into words. However, before performing a
recursive call, we check whether the results for the current substring have already been computed and stored in a
memoization map (typically a dictionary or hash table).

If the results for the current substring are found in the memoization map, we return them directly without further
computation. If not, we proceed with the recursive call, compute the results, and store them in the memoization map
before returning them.

By memoizing the results, we ensure that each distinct substring is processed only once, which avoids redundant
computation.

#### Algorithm

1. Convert the `wordDict` array into an unordered set `wordSet` for efficient lookups.
2. Initialize an empty unordered map `memoization` to store the results of subproblems.
3. Call the `dfs` function with the input string `s`, `wordSet`, and `memoization`.
   - Check whether the results for the current `remainingStr` (the remaining part of the string to be processed) are
     already in `memoization`. If so, return them.
   - Base case: If `remainingStr` is empty, all characters have been processed. An empty string represents a valid
     sentence, so return an array containing the empty string.
   - Initialize an empty array `results`.
   - Iterate `i` from 1 to the length of `remainingStr`:
     - Extract the substring `currentWord` from index 0 to `i` to check whether it is a valid word.
     - If `currentWord` is found in `wordSet`:
       - Recursively call `dfs` with `remainingStr.substr(i)`, `wordSet`, and `memoization`.
       - Append `currentWord` followed by each recursive result (with a space if needed) to `results` to form valid
         sentences.
   - Store the `results` for `remainingStr` in `memoization`.
   - Return `results`.

#### Complexity

Let n be the length of the input string.

##### Time complexity: O(n⋅2^n)

While memoization avoids redundant computations, it does not change the overall number of subproblems that need to be
solved. In the worst case, there are still up to 2^n unique partitions to explore, leading to an exponential time
complexity. For each subproblem, O(n) work is performed, so the overall complexity is O(n⋅2^n).

##### Space complexity: O(n⋅2^n)

The recursion stack can grow up to a depth of n, where each recursive call consumes additional space for storing the
current state.

The memoization map needs to store the results for all possible suffixes, which can amount to 2^n sentences of size n
in the worst case, resulting in an exponential space complexity.

### Trie Optimization

While the previous approaches focus on optimizing the search and computation process, we can also leverage efficient
data structures to enhance the word lookup process. This leads to the trie-based approach, which uses a trie data
structure to store the word dictionary, allowing efficient word lookup and prefix matching.

The trie, also known as a prefix tree, is a tree-based data structure where each node represents a character in a
word, and the path from the root to a node marked as the end of a word represents a complete word. This structure is
particularly useful for problems involving word segmentation because it allows for efficient prefix matching.

Here, we first build a trie from the dictionary words. Each word is represented as a path in the trie, where each node
corresponds to a character in the word.

By using the trie, we can quickly determine whether a substring can form a valid word without performing linear
searches or set lookups. This reduces the search space and improves the efficiency of the algorithm.

In this approach, instead of recursively exploring the remaining substring and using memoization, we iterate from the
end of the input string to the beginning (in reverse order). For each starting index (`startIdx`), we attempt to find
valid sentences that can be formed from that index by iterating through the string and checking whether the current
substring forms a valid word using the trie.

When a valid word is encountered in the trie, we append it to the list of valid sentences for the current starting
index. If the current valid word is not the last word in the sentence, we combine it with the valid sentences formed
from the next index (`endIdx + 1`), which are retrieved from the `dp` dictionary.

The valid sentences for each starting index are stored in the `dp` dictionary, ensuring that previously computed
results are reused. By using tabulation and storing the valid sentences for each starting index, we avoid redundant
computations and achieve significant time and space efficiency improvements compared to the standard backtracking
method with memoization.

The trie-based approach offers efficient word lookup and prefix matching, making it particularly suitable for problems
involving word segmentation or string manipulation. However, it comes with the additional overhead of constructing and
maintaining the trie data structure, which can be more memory-intensive for large dictionaries.

#### Algorithm

##### Initialize TrieNode Structure

- Each `TrieNode` has two properties:
  - `isEnd`: A boolean value indicating if the node marks the end of a word.
  - `children`: An array of size 26 (for lowercase English letters) to store pointers to child nodes.
- The constructor initializes `isEnd` to false and all elements in `children` to null.

##### Trie Class

- The `Trie` class has a `root` pointer of type `TrieNode`.
- The constructor initializes the root with a new `TrieNode` object.
- The `insert` function:
  - Takes a string `word` as input.
  - Starts from the root node.
  - For each character `c` in the word:
    - Calculate the index corresponding to the character.
    - If the child node at the calculated index doesn't exist, create a new `TrieNode` and assign it to that index.
    - Move to the child node.
  - After processing all characters, mark the current node's `isEnd` as true.
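
A minimal C++ sketch of the `TrieNode` and `Trie` structures described above (memory cleanup is omitted for brevity):

```cpp
#include <string>
using namespace std;

struct TrieNode {
    bool isEnd;              // true if this node marks the end of a word
    TrieNode* children[26];  // one slot per lowercase English letter

    TrieNode() : isEnd(false) {
        for (int i = 0; i < 26; ++i) children[i] = nullptr;
    }
};

struct Trie {
    TrieNode* root;

    Trie() : root(new TrieNode()) {}

    // Insert a word as a path of nodes, one per character.
    void insert(const string& word) {
        TrieNode* node = root;
        for (char c : word) {
            int idx = c - 'a';
            if (node->children[idx] == nullptr)
                node->children[idx] = new TrieNode();
            node = node->children[idx];
        }
        node->isEnd = true;  // mark the final node as a complete word
    }
};
```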

##### `wordBreak` Function

- Create a `Trie` object.
- Insert all words from `wordDict` into the trie using the `insert` function.
- Initialize a map `dp` to store the results of subproblems.
- Iterate from the end of the string `s` to the beginning (in reverse order). For each starting index `startIdx`:
  - Initialize a vector `validSentences` to store valid sentences starting from `startIdx`.
  - Initialize a `currentNode` pointer to the root of the trie.
  - Iterate `endIdx` from `startIdx` to the end of the string. For each character `c`:
    - Calculate the index corresponding to `c`.
    - Check whether the child node at the calculated index exists in the trie. If it doesn't, break out of the inner
      loop: the current substring cannot form a valid word, so there is no need to check the remaining characters.
    - Move to the child node.
    - Check whether the current node's `isEnd` is true, indicating a valid word.
    - If a valid word is found:
      - Extract the current word from the string using `substr`.
      - If it's the last word in the sentence (`endIdx` is the last index), add the current word to `validSentences`.
      - If it's not the last word, retrieve the valid sentences formed by the remaining substring from
        `dp[endIdx + 1]`, combine the current word with each sentence, and add the result to `validSentences`.
  - Store the `validSentences` for the current `startIdx` in `dp`.
- Return the valid sentences stored in `dp[0]`, which represent the valid sentences formed from the entire string.
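
Putting the pieces together, the `wordBreak` function described above might look like this in C++. The trie is built
inline so the sketch is self-contained; treat it as an illustration of the tabulation idea, not a definitive
implementation (trie nodes are leaked for brevity):

```cpp
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

struct TrieNode {
    bool isEnd = false;
    TrieNode* children[26] = {};
};

vector<string> wordBreak(const string& s, const vector<string>& wordDict) {
    // Build the trie from the dictionary words.
    TrieNode* root = new TrieNode();
    for (const string& word : wordDict) {
        TrieNode* node = root;
        for (char c : word) {
            int idx = c - 'a';
            if (node->children[idx] == nullptr) node->children[idx] = new TrieNode();
            node = node->children[idx];
        }
        node->isEnd = true;
    }

    int n = s.size();
    unordered_map<int, vector<string>> dp;  // startIdx -> valid sentences from that index
    for (int startIdx = n - 1; startIdx >= 0; --startIdx) {
        vector<string> validSentences;
        TrieNode* currentNode = root;
        for (int endIdx = startIdx; endIdx < n; ++endIdx) {
            int idx = s[endIdx] - 'a';
            if (currentNode->children[idx] == nullptr) break;  // no dictionary word continues this prefix
            currentNode = currentNode->children[idx];
            if (currentNode->isEnd) {
                string currentWord = s.substr(startIdx, endIdx - startIdx + 1);
                if (endIdx == n - 1) {
                    validSentences.push_back(currentWord);  // last word in the sentence
                } else {
                    // Combine with sentences already computed for the rest of the string.
                    for (const string& sentence : dp[endIdx + 1])
                        validSentences.push_back(currentWord + " " + sentence);
                }
            }
        }
        dp[startIdx] = validSentences;
    }
    return dp[0];
}
```

As with the memoized version, `wordBreak("catsanddog", {"cat", "cats", "and", "sand", "dog"})` yields
`{"cat sand dog", "cats and dog"}`.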

#### Complexity Analysis

Let n be the length of the input string.

##### Time complexity: O(n⋅2^n)

Even though the trie-based approach uses an efficient data structure for word lookup, it still needs to explore all
possible ways to break the string into words. In the worst case, there are 2^n unique possible partitions, leading to
an exponential time complexity. O(n) work is performed for each partition, so the overall complexity is O(n⋅2^n).

##### Space complexity: O(n⋅2^n)

The trie itself uses space proportional to the total number of characters in the dictionary. The dominant cost is the
tabulation map, which can store up to 2^n sentences of size n in the worst case, resulting in an overall exponential
space complexity.

----

### Further Thoughts on Complexity Analysis

The complexity of this problem cannot be reduced below O(n⋅2^n); the worst case is still O(n⋅2^n). However, dynamic
programming (DP) is somewhat more efficient than plain backtracking overall because of the test case below.

Consider the input "aaaaaa", with wordDict = ["a", "aa", "aaa", "aaaa", "aaaaa", "aaaaaa"].

Every possible partition is a valid sentence, and there are 2^(n−1) such partitions. The algorithms cannot perform
better than this since they must generate all valid sentences. The cost of iterating over cached results will be
exponential, as every possible partition will be cached, resulting in the same runtime as regular backtracking.
Likewise, the space complexity will also be O(n⋅2^n) for the same reason: every partition is stored in memory.
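
A quick way to sanity-check the 2^(n−1) count: let count[i] be the number of sentences that can be formed from index i
to the end when every run of "a"s is a dictionary word; then count[i] is the sum of count[j] over all j > i, which
doubles at each step. A small sketch (the helper `countSentences` is hypothetical, purely for illustration):

```cpp
#include <vector>
using namespace std;

// Count the sentences formable from a string of n identical characters when
// every substring is a dictionary word: count[i] = sentences from index i on.
long long countSentences(int n) {
    vector<long long> count(n + 1, 0);
    count[n] = 1;  // empty suffix: exactly one (empty) sentence
    for (int i = n - 1; i >= 0; --i)
        for (int j = i + 1; j <= n; ++j)
            count[i] += count[j];  // first word spans indices [i, j)
    return count[0];
}
```

For n = 6 (the "aaaaaa" example) this yields 32 = 2^5 sentences.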

Another way to explain why the worst-case complexity is O(n⋅2^n) for all the algorithms is that, given a string of
length n, there are n−1 positions between adjacent characters where it can be split into two parts. Each position has
two choices: to split or not to split. In the worst case, we have to check all possibilities, which yields 2^(n−1)
partitions and, with O(n) work per partition, a time complexity of O(n⋅2^(n−1)), which simplifies to O(n⋅2^n). This
analysis is very similar to palindrome partitioning.

Overall, this question is interesting because of the nature of its complexity. In an interview setting, the most
expected solutions would be backtracking and the trie-based approach, as they are natural fits for the required
conditions and outputs.