[math-fun] Finding a polynomial from an integer sequence
I was wondering how many strings there are of length n, with no instances of 4 consecutive identical letters, in a b-letter alphabet. (This was inspired by a recent discussion in the rec.arts.sf.fandom newsgroup about Arthur C. Clarke's 1953 short story, "The Nine Billion Names of God." The question arose as to whether that British author was using the old British definition of billion, 10^12, rather than 10^9. The conclusion was that there isn't enough information in the story to tell. For instance we're never told the number of letters in the special religious alphabet they use. I did discover that the alphabet would need at least 13 letters for 9*10^9 names, and at least 28 letters for 9*10^12 names.) When either b or n is more than about 10, counting them gets very slow, since there are so many such strings. By thinking about how many ways there are to get four consecutive identical letters, and about how many of those ways are duplicates, I came up with a polynomial that approximates the answer: b^n - (b-3)b^(n-3) + (b-4)b^(n-4). I decided to look at the error terms, i.e. the differences between the actual number of such strings and that polynomial, and at the differences of the differences of the differences between consecutive error terms. I recalled, from reading about Babbage's Difference Engine, that it was possible to find polynomials by pursuing such differences until they're all identical. I quickly figured out how to do that. The number of steps until the differences are identical is the order of the polynomial, and the coefficient of that power is that difference divided by the factorial of the order. For instance here I'm doing it for n=9. (Not in OEIS for n>4.) The first row is the number of strings of length 9 in an alphabet of b letters (b being the column number) without four identical consecutive letters. Each number after the first row is the difference between the number above and the number above and to the right. 0: 0 298 16572 242820 1875280 9837150 39732588 132809992 384528960 994502610 2348127100 1: 298 16274 226248 1632460 7961870 29895438 93077404 251718968 609973650 1353624490 2: 15976 209974 1406212 6329410 21933568 63181966 158641564 358254682 743650840 3: 193998 1196238 4923198 15604158 41248398 95459598 199613118 385396158 4: 1002240 3726960 10680960 25644240 54211200 104153520 185783040 5: 2724720 6954000 14963280 28566960 49942320 81629520 6: 4229280 8009280 13603680 21375360 31687200 7: 3780000 5594400 7771680 10311840 8: 1814400 2177280 2540160 9: 362880 362880 When 362880 is divided by 9!, I get 1, since 362880 *is* 9!, so the first term is simply b^9. Next I take my original series and subtract j^9 from the jth term, for all j. Then I repeat the process. Here's the next step in the above case: 0: -1 -214 -3111 -19324 -77845 -240546 -621019 -1407736 -2891529 -5497390 -9820591 1: -213 -2897 -16213 -58521 -162701 -380473 -786717 -1483793 -2605861 -4323201 2: -2684 -13316 -42308 -104180 -217772 -406244 -697076 -1122068 -1717340 3: -10632 -28992 -61872 -113592 -188472 -290832 -424992 -595272 4: -18360 -32880 -51720 -74880 -102360 -134160 -170280 5: -14520 -18840 -23160 -27480 -31800 -36120 6: -4320 -4320 -4320 -4320 -4320 When -4320 is divided by 6!, I get -6, so the next term is -6*b^6. I keep repeating that process until I have nothing but zeros left. Then I know I have the whole polynomial. The problem is that I need at least two more terms than the order of the polynomial (which I already know equals the size of the alphabet). Of course if I already know, or can guess, the first few terms of the polynomial, I can subtract them out upfront, and add them back at the end, reducing the number of terms I need. But I still run into problems pretty quickly. I've only made it to n=14: 1: b 2: b^2 3: b^3 4: b^4 - b 5: b^5 - 2b^2 + b 6: b^6 - 3b^3 + 2b^2 7: b^7 - 4b^4 + 3b^3 8: b^8 - 5b^5 + 4b^4 + b^2 - b 9: b^9 - 6b^6 + 5b^5 + 3b^3 - 4b^2 + b 10: b^10 - 7b^7 + 6b^6 + 6b^4 - 9b^3 + 3b^2 11: b^11 - 8b^8 + 7b^7 + 10b^5 - 16b^4 + 6b^3 12: b^12 - 9b^9 + 8b^8 + 15b^6 - 25b^5 + 10b^4 - b^3 + 2b^2 - b 13: b^13 - 10b^10 + 9b^9 + 21b^7 - 36b^6 + 15b^5 - 4b^4 + 9b^3 - 6b^2 + b 14: b^14 - 11b^11 + 10b^10 + 28b^8 - 49b^7 + 21b^6 - 10b^5 + 24b^4 - 18b^3 + 4b^2 15: b^15 - 12b^12 + 11b^11 + 36b^9 - 64b^8 + 28b^7 - 20b^6 + ? Once a column has a few terms in it, it's obvious how to continue it. The powers are always consecutive, and some coefficients are consecutive, some are triangular numbers, and some are squares. But What about 1,4,10? I'm guessing that those are tetrahedral numbers, and the next term is 20, but that's only a guess. And 2,9,24, or 1,6,18? Who knows? In five hours of CPU time, I've only been able to count the first six terms for b=15: 0, 11536, 10264752, 924213888, 28172320000, and 448316910000. But I need at least eight terms, assuming my guess for n=15 is correct through the b^6 term. Another issue is that there's no guarantee that the polynomial is even correct. All I really know is that it will reproduce the terms I counted. Of course the more identical differences I get, the more confident I am. For instance for b=9, I counted the first 25 terms, so I actually got 16 copies of 362880. Another approach would be to count more terms after coming up with the polynomial and see if it correctly predicts the next few terms. A moment's thought will show that these two confirmation methods are exactly equivalent (and equally difficult), and that neither of them constitutes a proof that the polynomial is correct. Or that *any* polynomial is correct -- who says that, except for the first few values of n where it's obvious, the sequence is a polynomial series at all? Another piece of evidence in favor of the polynomial being correct is that the identical differences are in fact divisible by the factorial of the order. For all but the tiniest orders, that's not very likely to happen by chance. In fact, I'm tempted to cheat, stopping one step too early, with a single difference on the bottom line, if it's divisible by the factorial of its row number. So, can anyone think of a better way to come up with polynomials from those sequences, or from sequences in general? Thanks.
Linear Algebra is the easiest way to handle this. Define a vector (m,n,o) where m is the number of strings with 0 identical chars at the end, n is the number of strings with 1 identical chars at the end, etc. for 2. Now, multiply [b,0,0] / b-1 1 0 0 \n-1 / 1 \ | b-1 0 1 0 | | 1 | \ b-1 0 0 0 / \ 1 / and you'll get your polynomial directly. For example, n=15: 10 b^3 - 40 b^4 + 50 b^5 - 20 b^6 + 28 b^7 - 64 b^8 + 36 b^9 + 11 b^11 - 12 b^12 + b^15 n=16: -b + 3 b^2 - 3 b^3 + 21 b^4 - 75 b^5 + 90 b^6 - 35 b^7 + 36 b^8 - 81 b^9 + 45 b^10 + 12 b^12 - 13 b^13 + b^16 n=100: -b + 24 b^2 - 276 b^3 + 4949 b^4 - 98376 b^5 + 1253454 b^6 - 10494946 b^7 + 64544004 b^8 - 333141246 b^9 + 1629362504 b^10 - 7743362781 b^11 + 33690304389 b^12 - 128361569791 b^13 + 431877830034 b^14 - 1335734640381 b^15 + 3921198460064 b^16 - 10910667625371 b^17 + 28242216808344 b^18 - 67609636734106 b^19 + 151761346502289 b^20 - 324745264800726 b^21 + 663861239350364 b^22 - 1285594459538946 b^23 + 2352482322522504 b^24 - 4102924280776426 b^25 + 6881300744575140 b^26 - 11088385467603795 b^27 + 17074913485076335 b^28 - 25161018937191765 b^29 + 35752128552773640 b^30 - 49157831352705825 b^31 + 65147509853197632 b^32 - 83035279078363161 b^33 + 102318979447450824 b^34 - 122532507616821120 b^35 + 142277279395289985 b^36 - 159561501185322196 b^37 + 173436645823826538 b^38 - 183830369499971922 b^39 + 189721421270504140 b^40 - 189683695071289830 b^41 + 184215027377426556 b^42 - 175005707268072248 b^43 + 162383563605716442 b^44 - 146198549327742180 b^45 + 128108797724019210 b^46 - 110189986037713902 b^47 + 92754836535922896 b^48 - 75759620870362744 b^49 + 60359718326464200 b^50 - 47403775155127440 b^51 + 36442526750394128 b^52 - 27134741763100314 b^53 + 19785640697556111 b^54 - 14293230094111815 b^55 + 10072774771049298 b^56 - 6855171690732507 b^57 + 4607028473165838 b^58 - 3079978077238992 b^59 + 1988147437359153 b^60 - 1239214156458694 b^61 + 775189336608027 b^62 - 480392112798297 b^63 + 282077923597328 b^64 - 162175523221569 b^65 + 95632032357576 b^66 - 54334036442868 b^67 + 28753029529893 b^68 - 15609396216192 b^69 + 8702745182866 b^70 - 4385661077430 b^71 + 2114853884460 b^72 - 1130080241250 b^73 + 571091985240 b^74 - 245181586875 b^75 + 116784029891 b^76 - 61305581337 b^77 + 25030635630 b^78 - 9645773995 b^79 + 5155191216 b^80 - 2231638857 b^81 + 666599976 b^82 - 320297166 b^83 + 164303979 b^84 - 41811092 b^85 + 13673355 b^86 - 9221565 b^87 + 2449370 b^88 - 356445 b^89 + 360450 b^90 - 121485 b^91 + 4278 b^92 - 8649 b^93 + 4371 b^94 + 96 b^96 - 97 b^97 + b^100 On Sat, Oct 27, 2018 at 12:56 PM Keith F. Lynch <kfl@keithlynch.net> wrote:
I was wondering how many strings there are of length n, with no instances of 4 consecutive identical letters, in a b-letter alphabet. (This was inspired by a recent discussion in the rec.arts.sf.fandom newsgroup about Arthur C. Clarke's 1953 short story, "The Nine Billion Names of God." The question arose as to whether that British author was using the old British definition of billion, 10^12, rather than 10^9. The conclusion was that there isn't enough information in the story to tell. For instance we're never told the number of letters in the special religious alphabet they use. I did discover that the alphabet would need at least 13 letters for 9*10^9 names, and at least 28 letters for 9*10^12 names.)
When either b or n is more than about 10, counting them gets very slow, since there are so many such strings. By thinking about how many ways there are to get four consecutive identical letters, and about how many of those ways are duplicates, I came up with a polynomial that approximates the answer: b^n - (b-3)b^(n-3) + (b-4)b^(n-4). I decided to look at the error terms, i.e. the differences between the actual number of such strings and that polynomial, and at the differences of the differences of the differences between consecutive error terms. I recalled, from reading about Babbage's Difference Engine, that it was possible to find polynomials by pursuing such differences until they're all identical.
I quickly figured out how to do that. The number of steps until the differences are identical is the order of the polynomial, and the coefficient of that power is that difference divided by the factorial of the order. For instance here I'm doing it for n=9. (Not in OEIS for n>4.) The first row is the number of strings of length 9 in an alphabet of b letters (b being the column number) without four identical consecutive letters. Each number after the first row is the difference between the number above and the number above and to the right.
0: 0 298 16572 242820 1875280 9837150 39732588 132809992 384528960 994502610 2348127100 1: 298 16274 226248 1632460 7961870 29895438 93077404 251718968 609973650 1353624490 2: 15976 209974 1406212 6329410 21933568 63181966 158641564 358254682 743650840 3: 193998 1196238 4923198 15604158 41248398 95459598 199613118 385396158 4: 1002240 3726960 10680960 25644240 54211200 104153520 185783040 5: 2724720 6954000 14963280 28566960 49942320 81629520 6: 4229280 8009280 13603680 21375360 31687200 7: 3780000 5594400 7771680 10311840 8: 1814400 2177280 2540160 9: 362880 362880
When 362880 is divided by 9!, I get 1, since 362880 *is* 9!, so the first term is simply b^9.
Next I take my original series and subtract j^9 from the jth term, for all j. Then I repeat the process. Here's the next step in the above case:
0: -1 -214 -3111 -19324 -77845 -240546 -621019 -1407736 -2891529 -5497390 -9820591 1: -213 -2897 -16213 -58521 -162701 -380473 -786717 -1483793 -2605861 -4323201 2: -2684 -13316 -42308 -104180 -217772 -406244 -697076 -1122068 -1717340 3: -10632 -28992 -61872 -113592 -188472 -290832 -424992 -595272 4: -18360 -32880 -51720 -74880 -102360 -134160 -170280 5: -14520 -18840 -23160 -27480 -31800 -36120 6: -4320 -4320 -4320 -4320 -4320
When -4320 is divided by 6!, I get -6, so the next term is -6*b^6.
I keep repeating that process until I have nothing but zeros left. Then I know I have the whole polynomial.
The problem is that I need at least two more terms than the order of the polynomial (which I already know equals the size of the alphabet). Of course if I already know, or can guess, the first few terms of the polynomial, I can subtract them out upfront, and add them back at the end, reducing the number of terms I need. But I still run into problems pretty quickly. I've only made it to n=14:
1: b 2: b^2 3: b^3 4: b^4 - b 5: b^5 - 2b^2 + b 6: b^6 - 3b^3 + 2b^2 7: b^7 - 4b^4 + 3b^3 8: b^8 - 5b^5 + 4b^4 + b^2 - b 9: b^9 - 6b^6 + 5b^5 + 3b^3 - 4b^2 + b 10: b^10 - 7b^7 + 6b^6 + 6b^4 - 9b^3 + 3b^2 11: b^11 - 8b^8 + 7b^7 + 10b^5 - 16b^4 + 6b^3 12: b^12 - 9b^9 + 8b^8 + 15b^6 - 25b^5 + 10b^4 - b^3 + 2b^2 - b 13: b^13 - 10b^10 + 9b^9 + 21b^7 - 36b^6 + 15b^5 - 4b^4 + 9b^3 - 6b^2 + b 14: b^14 - 11b^11 + 10b^10 + 28b^8 - 49b^7 + 21b^6 - 10b^5 + 24b^4 - 18b^3 + 4b^2 15: b^15 - 12b^12 + 11b^11 + 36b^9 - 64b^8 + 28b^7 - 20b^6 + ?
Once a column has a few terms in it, it's obvious how to continue it. The powers are always consecutive, and some coefficients are consecutive, some are triangular numbers, and some are squares. But What about 1,4,10? I'm guessing that those are tetrahedral numbers, and the next term is 20, but that's only a guess. And 2,9,24, or 1,6,18? Who knows?
In five hours of CPU time, I've only been able to count the first six terms for b=15: 0, 11536, 10264752, 924213888, 28172320000, and 448316910000. But I need at least eight terms, assuming my guess for n=15 is correct through the b^6 term.
Another issue is that there's no guarantee that the polynomial is even correct. All I really know is that it will reproduce the terms I counted. Of course the more identical differences I get, the more confident I am. For instance for b=9, I counted the first 25 terms, so I actually got 16 copies of 362880. Another approach would be to count more terms after coming up with the polynomial and see if it correctly predicts the next few terms. A moment's thought will show that these two confirmation methods are exactly equivalent (and equally difficult), and that neither of them constitutes a proof that the polynomial is correct. Or that *any* polynomial is correct -- who says that, except for the first few values of n where it's obvious, the sequence is a polynomial series at all?
Another piece of evidence in favor of the polynomial being correct is that the identical differences are in fact divisible by the factorial of the order. For all but the tiniest orders, that's not very likely to happen by chance. In fact, I'm tempted to cheat, stopping one step too early, with a single difference on the bottom line, if it's divisible by the factorial of its row number.
So, can anyone think of a better way to come up with polynomials from those sequences, or from sequences in general? Thanks.
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com https://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun
-- -- http://cube20.org/ -- http://golly.sf.net/ --
Oops; sorry! Drop the fourth column in the "square" matrix. On Sat, Oct 27, 2018 at 2:01 PM Tomas Rokicki <rokicki@gmail.com> wrote:
Linear Algebra is the easiest way to handle this.
Define a vector (m,n,o) where m is the number of strings with 0 identical chars at the end, n is the number of strings with 1 identical chars at the end, etc. for 2.
Now, multiply
[b,0,0] / b-1 1 0 0 \n-1 / 1 \ | b-1 0 1 0 | | 1 | \ b-1 0 0 0 / \ 1 /
and you'll get your polynomial directly.
For example, n=15:
10 b^3 - 40 b^4 + 50 b^5 - 20 b^6 + 28 b^7 - 64 b^8 + 36 b^9 + 11 b^11 - 12 b^12 + b^15
n=16:
-b + 3 b^2 - 3 b^3 + 21 b^4 - 75 b^5 + 90 b^6 - 35 b^7 + 36 b^8 - 81 b^9 + 45 b^10 + 12 b^12 - 13 b^13 + b^16
n=100:
-b + 24 b^2 - 276 b^3 + 4949 b^4 - 98376 b^5 + 1253454 b^6 - 10494946 b^7 + 64544004 b^8 - 333141246 b^9 + 1629362504 b^10 - 7743362781 b^11 + 33690304389 b^12 - 128361569791 b^13 + 431877830034 b^14 - 1335734640381 b^15 + 3921198460064 b^16 - 10910667625371 b^17 + 28242216808344 b^18 - 67609636734106 b^19 + 151761346502289 b^20 - 324745264800726 b^21 + 663861239350364 b^22 - 1285594459538946 b^23 + 2352482322522504 b^24 - 4102924280776426 b^25 + 6881300744575140 b^26 - 11088385467603795 b^27 + 17074913485076335 b^28 - 25161018937191765 b^29 + 35752128552773640 b^30 - 49157831352705825 b^31 + 65147509853197632 b^32 - 83035279078363161 b^33 + 102318979447450824 b^34 - 122532507616821120 b^35 + 142277279395289985 b^36 - 159561501185322196 b^37 + 173436645823826538 b^38 - 183830369499971922 b^39 + 189721421270504140 b^40 - 189683695071289830 b^41 + 184215027377426556 b^42 - 175005707268072248 b^43 + 162383563605716442 b^44 - 146198549327742180 b^45 + 128108797724019210 b^46 - 110189986037713902 b^47 + 92754836535922896 b^48 - 75759620870362744 b^49 + 60359718326464200 b^50 - 47403775155127440 b^51 + 36442526750394128 b^52 - 27134741763100314 b^53 + 19785640697556111 b^54 - 14293230094111815 b^55 + 10072774771049298 b^56 - 6855171690732507 b^57 + 4607028473165838 b^58 - 3079978077238992 b^59 + 1988147437359153 b^60 - 1239214156458694 b^61 + 775189336608027 b^62 - 480392112798297 b^63 + 282077923597328 b^64 - 162175523221569 b^65 + 95632032357576 b^66 - 54334036442868 b^67 + 28753029529893 b^68 - 15609396216192 b^69 + 8702745182866 b^70 - 4385661077430 b^71 + 2114853884460 b^72 - 1130080241250 b^73 + 571091985240 b^74 - 245181586875 b^75 + 116784029891 b^76 - 61305581337 b^77 + 25030635630 b^78 - 9645773995 b^79 + 5155191216 b^80 - 2231638857 b^81 + 666599976 b^82 - 320297166 b^83 + 164303979 b^84 - 41811092 b^85 + 13673355 b^86 - 9221565 b^87 + 2449370 b^88 - 356445 b^89 + 360450 b^90 - 121485 b^91 + 4278 b^92 - 8649 b^93 + 4371 b^94 + 96 b^96 - 97 b^97 + b^100
On Sat, Oct 27, 2018 at 12:56 PM Keith F. Lynch <kfl@keithlynch.net> wrote:
I was wondering how many strings there are of length n, with no instances of 4 consecutive identical letters, in a b-letter alphabet. (This was inspired by a recent discussion in the rec.arts.sf.fandom newsgroup about Arthur C. Clarke's 1953 short story, "The Nine Billion Names of God." The question arose as to whether that British author was using the old British definition of billion, 10^12, rather than 10^9. The conclusion was that there isn't enough information in the story to tell. For instance we're never told the number of letters in the special religious alphabet they use. I did discover that the alphabet would need at least 13 letters for 9*10^9 names, and at least 28 letters for 9*10^12 names.)
When either b or n is more than about 10, counting them gets very slow, since there are so many such strings. By thinking about how many ways there are to get four consecutive identical letters, and about how many of those ways are duplicates, I came up with a polynomial that approximates the answer: b^n - (b-3)b^(n-3) + (b-4)b^(n-4). I decided to look at the error terms, i.e. the differences between the actual number of such strings and that polynomial, and at the differences of the differences of the differences between consecutive error terms. I recalled, from reading about Babbage's Difference Engine, that it was possible to find polynomials by pursuing such differences until they're all identical.
I quickly figured out how to do that. The number of steps until the differences are identical is the order of the polynomial, and the coefficient of that power is that difference divided by the factorial of the order. For instance here I'm doing it for n=9. (Not in OEIS for n>4.) The first row is the number of strings of length 9 in an alphabet of b letters (b being the column number) without four identical consecutive letters. Each number after the first row is the difference between the number above and the number above and to the right.
0: 0 298 16572 242820 1875280 9837150 39732588 132809992 384528960 994502610 2348127100 1: 298 16274 226248 1632460 7961870 29895438 93077404 251718968 609973650 1353624490 2: 15976 209974 1406212 6329410 21933568 63181966 158641564 358254682 743650840 3: 193998 1196238 4923198 15604158 41248398 95459598 199613118 385396158 4: 1002240 3726960 10680960 25644240 54211200 104153520 185783040 5: 2724720 6954000 14963280 28566960 49942320 81629520 6: 4229280 8009280 13603680 21375360 31687200 7: 3780000 5594400 7771680 10311840 8: 1814400 2177280 2540160 9: 362880 362880
When 362880 is divided by 9!, I get 1, since 362880 *is* 9!, so the first term is simply b^9.
Next I take my original series and subtract j^9 from the jth term, for all j. Then I repeat the process. Here's the next step in the above case:
0: -1 -214 -3111 -19324 -77845 -240546 -621019 -1407736 -2891529 -5497390 -9820591 1: -213 -2897 -16213 -58521 -162701 -380473 -786717 -1483793 -2605861 -4323201 2: -2684 -13316 -42308 -104180 -217772 -406244 -697076 -1122068 -1717340 3: -10632 -28992 -61872 -113592 -188472 -290832 -424992 -595272 4: -18360 -32880 -51720 -74880 -102360 -134160 -170280 5: -14520 -18840 -23160 -27480 -31800 -36120 6: -4320 -4320 -4320 -4320 -4320
When -4320 is divided by 6!, I get -6, so the next term is -6*b^6.
I keep repeating that process until I have nothing but zeros left. Then I know I have the whole polynomial.
The problem is that I need at least two more terms than the order of the polynomial (which I already know equals the size of the alphabet). Of course if I already know, or can guess, the first few terms of the polynomial, I can subtract them out upfront, and add them back at the end, reducing the number of terms I need. But I still run into problems pretty quickly. I've only made it to n=14:
1: b 2: b^2 3: b^3 4: b^4 - b 5: b^5 - 2b^2 + b 6: b^6 - 3b^3 + 2b^2 7: b^7 - 4b^4 + 3b^3 8: b^8 - 5b^5 + 4b^4 + b^2 - b 9: b^9 - 6b^6 + 5b^5 + 3b^3 - 4b^2 + b 10: b^10 - 7b^7 + 6b^6 + 6b^4 - 9b^3 + 3b^2 11: b^11 - 8b^8 + 7b^7 + 10b^5 - 16b^4 + 6b^3 12: b^12 - 9b^9 + 8b^8 + 15b^6 - 25b^5 + 10b^4 - b^3 + 2b^2 - b 13: b^13 - 10b^10 + 9b^9 + 21b^7 - 36b^6 + 15b^5 - 4b^4 + 9b^3 - 6b^2 + b 14: b^14 - 11b^11 + 10b^10 + 28b^8 - 49b^7 + 21b^6 - 10b^5 + 24b^4 - 18b^3 + 4b^2 15: b^15 - 12b^12 + 11b^11 + 36b^9 - 64b^8 + 28b^7 - 20b^6 + ?
Once a column has a few terms in it, it's obvious how to continue it. The powers are always consecutive, and some coefficients are consecutive, some are triangular numbers, and some are squares. But What about 1,4,10? I'm guessing that those are tetrahedral numbers, and the next term is 20, but that's only a guess. And 2,9,24, or 1,6,18? Who knows?
In five hours of CPU time, I've only been able to count the first six terms for b=15: 0, 11536, 10264752, 924213888, 28172320000, and 448316910000. But I need at least eight terms, assuming my guess for n=15 is correct through the b^6 term.
Another issue is that there's no guarantee that the polynomial is even correct. All I really know is that it will reproduce the terms I counted. Of course the more identical differences I get, the more confident I am. For instance for b=9, I counted the first 25 terms, so I actually got 16 copies of 362880. Another approach would be to count more terms after coming up with the polynomial and see if it correctly predicts the next few terms. A moment's thought will show that these two confirmation methods are exactly equivalent (and equally difficult), and that neither of them constitutes a proof that the polynomial is correct. Or that *any* polynomial is correct -- who says that, except for the first few values of n where it's obvious, the sequence is a polynomial series at all?
Another piece of evidence in favor of the polynomial being correct is that the identical differences are in fact divisible by the factorial of the order. For all but the tiniest orders, that's not very likely to happen by chance. In fact, I'm tempted to cheat, stopping one step too early, with a single difference on the bottom line, if it's divisible by the factorial of its row number.
So, can anyone think of a better way to come up with polynomials from those sequences, or from sequences in general? Thanks.
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com https://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun
-- -- http://cube20.org/ -- http://golly.sf.net/ --
-- -- http://cube20.org/ -- http://golly.sf.net/ --
In general, the problem of determining a(n), the number of strings of length n, over a b-letter alphabet, having no instances of the particular string x as a contiguous subword (or more generally, no instances of any of the strings in some set S) is more or less completely solved. For a single word, there are the papers of Guibas and Odlyzko from the 1980's. Basically, the number of such words is known to be a generating function whose denominator is a polynomial of degree |x| (the length of x). The polynomial is specified by the "auto-correlation" of x (roughly speaking, the ways x can overlap x). The polynomial gives the recurrence and the zeroes govern the growth rate of a(n). More generally, for a set S, there is the Goulden-Jackson cluster method and Zeilberger has some free Maple software that can solve the problem; you just give it S and it produces the generating function. It is called DAVID_IAN and is available here: http://sites.math.rutgers.edu/~zeilberg/gj.html For your particular example the generating function can be computed as follows in Maple (once the DAVID_IAN package is loaded):
GJs({1,2,3},{[1,1,1,1],[2,2,2,2],[3,3,3,3]},x); 3 2 x + x + x + 1 - --------------------- 3 2 2 x + 2 x + 2 x - 1
GJs({1,2,3,4},{[1,1,1,1],[2,2,2,2],[3,3,3,3],[4,4,4,4]},x); 3 2 x + x + x + 1 - --------------------- 3 2 3 x + 3 x + 3 x - 1
and now you probably have enough to guess and prove the general form of the generating function. a(n) is not going to be a simply-expressible function of powers of b, because it's going to depend on the zeroes of the denominator polynomial, which will not include b. Jeffrey Shallit On 10/27/18 3:56 PM, Keith F. Lynch wrote:
I was wondering how many strings there are of length n, with no instances of 4 consecutive identical letters, in a b-letter alphabet. (This was inspired by a recent discussion in the rec.arts.sf.fandom newsgroup about Arthur C. Clarke's 1953 short story, "The Nine Billion Names of God." The question arose as to whether that British author was using the old British definition of billion, 10^12, rather than 10^9. The conclusion was that there isn't enough information in the story to tell. For instance we're never told the number of letters in the special religious alphabet they use. I did discover that the alphabet would need at least 13 letters for 9*10^9 names, and at least 28 letters for 9*10^12 names.)
When either b or n is more than about 10, counting them gets very slow, since there are so many such strings. By thinking about how many ways there are to get four consecutive identical letters, and about how many of those ways are duplicates, I came up with a polynomial that approximates the answer: b^n - (b-3)b^(n-3) + (b-4)b^(n-4). I decided to look at the error terms, i.e. the differences between the actual number of such strings and that polynomial, and at the differences of the differences of the differences between consecutive error terms. I recalled, from reading about Babbage's Difference Engine, that it was possible to find polynomials by pursuing such differences until they're all identical.
I quickly figured out how to do that. The number of steps until the differences are identical is the order of the polynomial, and the coefficient of that power is that difference divided by the factorial of the order. For instance here I'm doing it for n=9. (Not in OEIS for n>4.) The first row is the number of strings of length 9 in an alphabet of b letters (b being the column number) without four identical consecutive letters. Each number after the first row is the difference between the number above and the number above and to the right.
0: 0 298 16572 242820 1875280 9837150 39732588 132809992 384528960 994502610 2348127100 1: 298 16274 226248 1632460 7961870 29895438 93077404 251718968 609973650 1353624490 2: 15976 209974 1406212 6329410 21933568 63181966 158641564 358254682 743650840 3: 193998 1196238 4923198 15604158 41248398 95459598 199613118 385396158 4: 1002240 3726960 10680960 25644240 54211200 104153520 185783040 5: 2724720 6954000 14963280 28566960 49942320 81629520 6: 4229280 8009280 13603680 21375360 31687200 7: 3780000 5594400 7771680 10311840 8: 1814400 2177280 2540160 9: 362880 362880
When 362880 is divided by 9!, I get 1, since 362880 *is* 9!, so the first term is simply b^9.
Next I take my original series and subtract j^9 from the jth term, for all j. Then I repeat the process. Here's the next step in the above case:
0: -1 -214 -3111 -19324 -77845 -240546 -621019 -1407736 -2891529 -5497390 -9820591 1: -213 -2897 -16213 -58521 -162701 -380473 -786717 -1483793 -2605861 -4323201 2: -2684 -13316 -42308 -104180 -217772 -406244 -697076 -1122068 -1717340 3: -10632 -28992 -61872 -113592 -188472 -290832 -424992 -595272 4: -18360 -32880 -51720 -74880 -102360 -134160 -170280 5: -14520 -18840 -23160 -27480 -31800 -36120 6: -4320 -4320 -4320 -4320 -4320
When -4320 is divided by 6!, I get -6, so the next term is -6*b^6.
I keep repeating that process until I have nothing but zeros left. Then I know I have the whole polynomial.
The problem is that I need at least two more terms than the order of the polynomial (which I already know equals the size of the alphabet). Of course if I already know, or can guess, the first few terms of the polynomial, I can subtract them out upfront, and add them back at the end, reducing the number of terms I need. But I still run into problems pretty quickly. I've only made it to n=14:
1: b 2: b^2 3: b^3 4: b^4 - b 5: b^5 - 2b^2 + b 6: b^6 - 3b^3 + 2b^2 7: b^7 - 4b^4 + 3b^3 8: b^8 - 5b^5 + 4b^4 + b^2 - b 9: b^9 - 6b^6 + 5b^5 + 3b^3 - 4b^2 + b 10: b^10 - 7b^7 + 6b^6 + 6b^4 - 9b^3 + 3b^2 11: b^11 - 8b^8 + 7b^7 + 10b^5 - 16b^4 + 6b^3 12: b^12 - 9b^9 + 8b^8 + 15b^6 - 25b^5 + 10b^4 - b^3 + 2b^2 - b 13: b^13 - 10b^10 + 9b^9 + 21b^7 - 36b^6 + 15b^5 - 4b^4 + 9b^3 - 6b^2 + b 14: b^14 - 11b^11 + 10b^10 + 28b^8 - 49b^7 + 21b^6 - 10b^5 + 24b^4 - 18b^3 + 4b^2 15: b^15 - 12b^12 + 11b^11 + 36b^9 - 64b^8 + 28b^7 - 20b^6 + ?
Once a column has a few terms in it, it's obvious how to continue it. The powers are always consecutive, and some coefficients are consecutive, some are triangular numbers, and some are squares. But What about 1,4,10? I'm guessing that those are tetrahedral numbers, and the next term is 20, but that's only a guess. And 2,9,24, or 1,6,18? Who knows?
In five hours of CPU time, I've only been able to count the first six terms for b=15: 0, 11536, 10264752, 924213888, 28172320000, and 448316910000. But I need at least eight terms, assuming my guess for n=15 is correct through the b^6 term.
Another issue is that there's no guarantee that the polynomial is even correct. All I really know is that it will reproduce the terms I counted. Of course the more identical differences I get, the more confident I am. For instance for b=9, I counted the first 25 terms, so I actually got 16 copies of 362880. Another approach would be to count more terms after coming up with the polynomial and see if it correctly predicts the next few terms. A moment's thought will show that these two confirmation methods are exactly equivalent (and equally difficult), and that neither of them constitutes a proof that the polynomial is correct. Or that *any* polynomial is correct -- who says that, except for the first few values of n where it's obvious, the sequence is a polynomial series at all?
Another piece of evidence in favor of the polynomial being correct is that the identical differences are in fact divisible by the factorial of the order. For all but the tiniest orders, that's not very likely to happen by chance. In fact, I'm tempted to cheat, stopping one step too early, with a single difference on the bottom line, if it's divisible by the factorial of its row number.
So, can anyone think of a better way to come up with polynomials from those sequences, or from sequences in general? Thanks.
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com https://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun
participants (3)
-
Jeffrey Shallit -
Keith F. Lynch -
Tomas Rokicki