最短长度编码字符串

Given a **non-empty **string, encode the string such that its encoded length is the shortest.

The encoding rule is:k[encoded_string], where the encoded_string inside the square brackets is being repeated exactly k times.

Note:

  1. k will be a positive integer and encoded string will not be empty or have extra space.
  2. You may assume that the input string contains only lowercase English letters. The string's length is at most 160.
  3. If an encoding process does not make the string shorter, then do not encode it. If there are several solutions, return any of them is fine.

Example 1:

Input: "aaa"
Output: "aaa"
Explanation: There is no way to encode it such that it is shorter than the input string, so we do not encode it.

Example 2:

Input: "aaaaa"
Output: "5[a]"
Explanation: "5[a]" is shorter than "aaaaa" by 1 character.

Example 3:

Input: "aaaaaaaaaa"
Output: "10[a]"
Explanation: "a9[a]" or "9[a]a" are also valid solutions, both of them have the same length = 5, which is the same as "10[a]".

Example 4:

Input: "aabcaabcd"
Output: "2[aabc]d"
Explanation: "aabc" occurs twice, so one answer can be "2[aabc]d".

Example 5:

Input: "abbbabbbcabbbabbbc"
Output: "2[2[abbb]c]"
Explanation: "abbbabbbc" occurs twice, but "abbbabbbc" can also be encoded to "2[abbb]c", so one answer can be "2[2[abbb]c]".
s=pattern*k <=等价于=> (s+s).find(s,1) < s.size()

设d=(s+s).find(s,1),则pattern=s.substr(0,d),k=s.size()/d。想象将s串平移一小段后仍与原串重叠,平移距离就是pattern长d。

string encode(string s) {
    // 用dp[i][L]记录s[i,i+L)的最短编码串
    const int N = s.size();
    vector<vector<string>> dp(N, vector<string>(N + 1)); // dp[i][0]=""
    for (int L = 1; L <= N; L++) {
        for (int i = 0; i + L <= N; i++) {
            dp[i][L] = collapse(s.substr(i, L), dp[i]); // 尝试pattern*k模式
            for (int k = 1; k < L; k++) { // 尝试分两段,第一段长k
                if (dp[i][k].size() + dp[i+k][L-k].size() < dp[i][L].size()) {
                    dp[i][L] = dp[i][k] + dp[i+k][L-k];
                }
            }
        }
    }
    return dp[0][N];
}

// si是s[i,i+L)子串,要看si有没有更短编码串,dpi是dp[i][<L]
string collapse(const string &si, vector<string> &dpi) {
    int d = (si + si).find(si, 1), L = si.size();
    if (d < L) { // si=pattern*k
        auto encoded = to_string(L / d) + "[" + dpi[d] + "]"; // dp[i][d]是子问题最优解
        if (encoded.size() < L) return encoded;
    }
    return si;
}