Re: [math-fun] optimal advance tournament scheduling
On June 19th I wrote:
I'm thinking out loud (silently, since this is typing, not talking) about how to program the "optimal advance tournament scheduling" problem.
I finally wrote, tested, and ran my program.
First, I've decided not to bother excluding non-connected graphs, as there are so few that it would take more time to exclude them than getting rid of them would save. ...
What about 6 and 2? Yes, but there's only one such graph; it consists of the complete graphs of 6 and 2 vertices. And the score of that is 16, regardless of which players are in which positions, since we have complete information about the ordering of the 6 and complete information about the ordering of the other 2, but no information that relates any of the former to either of the latter, and 28 - 2*6 = 16. Knowing the score of that arrangement will be a useful test of my program.
Confirmed. That K6 + K2 disconnected graph has the worst average score of all 1312 graphs, 16.
What about 7 and 1? Yes. There are at most (21 choose 16)*8 labeled graphs. That's 162792, a very small proportion of the 30421755 total labeled graphs, about 1/187. (Both numbers are actually smaller, as I'm ignoring symmetries.)
Both numbers are indeed much smaller. 22 disconnected graphs out of 1312 total graphs. About 1/60. (As always in this subject, the universe of discourse is restricted to graphs with exactly 8 vertices and exactly 16 edges.)
All graphs will always be represented by an 8x8 matrix with a 1 at both A,B and B,A meaning that A and B play a game, and 0 in both those locations meaning that they don't. There will be 16 1s and 12 0s in each half of the (symmetric) matrix, excluding the main diagonal (whose contents are irrelevant).
Or, if it's a directed graph, with a 1 at A,B and a 0 at B,A meaning that A won a game against B, a 1 at B,A and a 0 at A,B meaning that B won a game against A, a 0 in both locations if they didn't play each other, and a 1 in both locations if there's a bug in my program.
Since I number the vertices, hence also the rows and columns, in order of the player rankings, that means the directed who-beat-whom graph is always upper triangular.
I can see two main approaches. I can generate all 30-million+ labeled graphs. In each case each vertex number will be the same as the player rank number. I need to solve each of them, by placing an arrow on each edge, from the stronger (lower numbered) to the weaker (higher numbered) player. I then raise that directed-graph non-symmetric matrix to every power, 1 through 7, add those 7 matrices together, and subtract the number of 0s in the sum matrix (other than those on the main diagonal) from 28 to get the score.
This is the approach I took. Except that I didn't need to add those 7 matrices together, since if I put 1s on the main diagonal the successive powers give me not the number of paths of length exactly N, but the number of paths of length N or shorter, which is what I want. And I don't need to raise anything to the 7th power. I stop after the Nth power matrix equals the N-1 power matrix. The highest power I ever needed to raise any matrix to was 5, and that was rare. Also, I'm not exactly multiplying matrices or raising them to powers, since instead of adding and multiplying matrix elements, I'm ORing and ANDing them. That means every matrix element is always 0 or 1, with a 1 representing a connection, rather than a number that counts the number of distinct paths (which I don't care about).
After doing that with all 30 million+, I average the scores over the equivalent graphs, i.e. those which are identical except for the labels on the vertices, i.e. those whose non-directed matrices are identical under a permutation of the rows combined with an identical permutation of the columns.
Determining which were the equivalent graphs was the hard part. The brute force approach would have been to try all 8! = 40320 vertex label permutations (i.e. permuting the rows and identically permuting the columns) for all 28-choose-16 = 30421755 matrices, for a total of more than a 16 trillion computations. Much to my surprise, a timing test showed that this would take only about a week to run. I was tempted. But that's not the approach I took. Instead my approach was to do two pseudo-random permutations of each of the 30+ million matrices to create a large random directed graph whose vertices were these matrices representing graphs. For instance 03FDE78 5024 15504 077BE78 15301 678 077DD78 16363 17819 means that, for instance, a random permutation of graph 03FDE78, the first graph in my table, was identical to the 5024th graph in the same table, and another random permutation of the same graph was identical to the 15504th graph in the same table. I would then turn that into a non-directed graph by including the reverse links. Those three lines became: 03FDE78 5024 15504 12562 16057 -1 077BE78 15301 678 607 16018 17460 077DD78 16363 17819 8584 15750 -1 That indicates that, for instance, one of the two random permutations of the 12562nd graph in the table was identical to the 1st graph in the table, and that one of the two random permutations of the 607th graph in the table was identical to the second graph in the table. The -1 entries represent unused values. I only allocated enough space for three reverse links, and in only one of the three cases I list above did there happen to be that many. Then what? I obviously couldn't create the adjacency matrix of this giant random graph, as it would have more than 900 trillion 28-bit entries, and I don't have access to a computer with that much memory. Even if I did, raising such a matrix to successive powers would take far too long. So I used a depth-first search, starting with each 8x8 matrix in numerical order, labeling each node with the identity of the starting matrix, and not following any node that's already labeled. Such a search has to terminate even though the large graph isn't a tree, since there are only finitely many nodes to label. I thought that would result in every node being labeled with the identity of the smallest-numbered node that's a permutation of it. And it mostly worked, but occasionally a new tree would grow in the middle of an existing but not yet fully labeled tree. When that happens, it's a simple matter to check for such conflicts and fix them. The only problem was that this meant having seven numbers for each of the 30+ million 8x8 matrices, and my computer didn't have that much memory available. So I split it by degree sequences. For instance 44444444 to indicate that every vertex links to four other vertices. The largest degree sequence set is 65544332, with 3543120 graphs, about 12% of the total, so that's the size of my arrays. (65555510 is the smallest, with just 336.) The digits of every degree sequence sum to 32. There are 97 degree sequences in total, of which 84 represent connected graphs, and 14 represent disconnected graphs. Those numbers don't add up, you say? That's because one of them, 55555511, represents both. It includes the disconnected K6 + K2 graph, and it also includes a single (except for vertex renumbering) connected graph too. (That was another bug that I wasted time trying to fix, before I realized that it wasn't a bug.) The degree sequence with the most non-isomorphic graphs (i.e. equivalence classes) is 55444433, with 117. 27 degree sequences have just one non-isomorphic graph, of which 19 are connected and 8 disconnected.
While I'm averaging the scores of the equivalent graphs, I can also keep track of whether the maximum score of any of them is always 28, (and if it's not, whether it's because I've stumbled into one of the few non-connected graphs), and what the minimum score is, and any other statistic anyone is interested in, within reason.
I did so. Of the 1312 non-isomorphic graphs, 647 had a minimum of 16, 589 had 17, and 76 had 18. Those 76 are tied for Rawlsian best. Of the 76, the one with the highest average score is: 05F7D78 54444443 n:20160 min:18 max:28 0,0,668,1134,1786,2078,2856,3100,2806,2452,1866,1008,406 22.9565 Of the 1290 connected non-isomorphic graphs, 626 had a minimum of 16, 588 had 17, and 76 had 18. Of the 22 disconnected non-isomorphic graphs, 21 had a minimum of 16, and 1 had a minimum of 17. Here's the one with 17: 0558FBF 66444440 n:2016 min:17 max:21 0,192,528,600,384,312,0,0,0,0,0,0,0 19.0476 Of the 1312 non-isomorphic graphs, 1 had a maximum of 16 (the disconnected K6 + K2 graph), 21 (all of which were disconnected) had 21, 9 (all of which were connected) had 27, and 1281 (all of which were connected) had 28. The overall highest average score for any non-isomorphic graph is 03FDE78 44444444 n:35 min:16 max:28 1,0,0,2,1,0,4,4,7,3,9,3,1 24.0000 This is, unsurprisingly, K4,4. Interestingly, it has the second- lowest n (number of isomorphic graphs, 35. The lowest n, 28, is held by the non-isomorphic graph with the *worst* average score, K6 + K2: 2DDEFC0 55555511 n:28 min:16 max:16 28,0,0,0,0,0,0,0,0,0,0,0,0 16.0000 The highest n is 40320, shared by 437 non-isomorphic graphs, of which all are connected. There are 20 different values of n, all of which are divisors of 8! = 40320. The quotient is the number of ways to renumber the vertices without changing what's connected to what, i.e. the size of the graph's homeomorphism group. The average n is 30421755 / 1312 = 23187.3... That's the total number of graphs divided by the number of non-isomorphic graphs.
So, how can I find only one of each set of equivalent graphs? I can put the 12 0s and 16 1s in the upper right half of the (non-directional, symmetric) matrix in order, left to right, top to bottom, to make a unique 28-bit number, and define the canonical one as being the smallest in the equivalence class, ...
That is indeed what I did.
I tried, with pen and paper, to find a connected graph without a non-looping path of length 7. Perhaps a stellate graph, in which several short loops emanate from one very busy player? I wasn't able to find one. But then I didn't try very hard.
I've since done an exhaustive search. The longest such path has a length of 5. In other words, it's possible to have players A and F such that the only way we know that A would beat F is because A beat B, B beat C, C beat D, D beat E, and E beat F. But it's not possible to have a longer such path without knowing the same fact by a shorter route. An example of such a graph is 36630FF. I'll use it as an illustration of the meaning of those hexadecimal graph numbers: Those numbers are always 28 bits, of which exactly 16 are 1s. I put the bits in the upper right triangle of the matrix, left to right, top to bottom, *least* significant bit first: The leading 3 represents the 0 at the end up the next to last row, the 1 0 at the end of the row above that, and the final 1 at the end of the row above that. The final F yielded the first four 1s on the first row (right to left). 0 1 1 1 1 1 1 1 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 That triangle represents the directed who-beat-whom graph. It can be mirrored on the other side of the diagonal to create the symmetrical non-directed adjacency matrix for the original graph. If you then count the number of 1-bits in each row (or in each column), and sort those eight numbers in descending order, you'll get the degree sequence, 75444332. The above directed graph represents the following 16 games: 0: 1 2 3 4 5 6 7 -- player 0 played and beat all other players 1: 2 7 (0) -- player 1 beat players 2 and 7 (and was beaten by player 0) 2: 3 7 (0 1) -- player 2 beat players 3 and 7 (and was beaten by 0 and 1) 3: 4 7 (0 2) -- player 4 beat 4 and 7 (and was beaten by 0 and 2) 4: 5 7 (0 3) -- player 4 beat 5 and 7 (and was beaten by 0 and 3) 5: 6 (0 4) -- player 5 beat 6 (and was beaten by 0 and 4) 6: (0 5) -- player 6 beat nobody (and was beaten by 0 and 5) 7: (0 1 2 3 4) -- player 7 beat nobody (and was beaten by 0 through 4) We know that player 1 can beat player 6 only because each of players 1 through 5 beat the next-higher-numbered player. The score is 26, since we don't know whether 5 can beat 7 or whether 6 can beat 7. (Well, in a sense we do know, because we defined them as being numbered in order of playing ability, but we can't deduce it from the games actually played.) This graph has an equivalence class of 40320, i.e. every possible way of permuting the rows of the symmetrical adjacency matrix (and identically permuting the columns) results in a matrix with 1s in different places. By contrast, the graph with the smallest n (28), the disconnected K6 + K2, 2DDEFC0, has this symmetry-rich adjacency matrix: 0 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0 0 1 0 1 1 1 1 0 0 1 1 0 1 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 1 0 1 0 0 1 1 1 1 1 0 0 1 0 0 0 0 0 0 0 Note that any of the 6! = 720 permutation of the middle six rows (together with the identical permutation of the middle six columns) will result in the identical matrix. So will swapping the top and bottom rows and also swapping the leftmost and rightmost columns. So the size of the homeomorphism group is 6! * 2! = 720 * 2 = 1440. And 1440 * 28 is 40320. The fact that all the values of n are divisors of 8! = 40320 is one reason I'm confident of my results, even though they came from a pseudo-random process. Also, I got the same results when running it again with different pseudo-random numbers. Also, my averages for the six 44444444 non-isomorphic graphs agree with Dan's averages in his June 19th email (which addressed only those six graphs). Also, Sloane's A054924 "Triangle read by rows: T(n,k) = number of non-isomorphic unlabeled connected graphs with n nodes and k edges" agrees that I should indeed have exactly 1290 non-isomorphic connected graphs. The list of all 1312 non-isomorphic graphs (i.e. of all equivalence classes) can be found at http://KeithLynch.net/chess.txt They are sorted by degree sequence and within each degree sequence by the "name" of the equivalence class, i.e. the lowest number of any graph in the class, in hexadecimal. The first field is the name, the second is the degree sequence, the third is n, the fourth is the worst score of any graph in the equivalence class, the fifth is the highest score of any graph in the equivalence class, the sixth is a comma- delimited list of how many graphs in the equivalence class had each score from 16 through 28 (the total should add up to n), and the seventh is the average score. For instance: 03FDE78 44444444 n:35 min:16 max:28 1,0,0,2,1,0,4,4,7,3,9,3,1 24.0000 The numbers in the comma-delimited list aren't the same as Dan's, since they're divided by the size of the homeomorphism group. In Dan's lists they always add up, not to n, but to 8! = 40320. The disconnected graphs are the ones whose degree sequence ends with 0, and the one with "max:16." The average score over all 30+ million graphs is about 22.019. The average score over all 1312 non-isomorphic graphs is about 21.929 (not the same since on average larger equivalence classes tend to have slightly higher scores). The average score over all 1290 non-isomorphic connected graphs is about 21.982. The average score over all 22 non-isomorphic disconnected graphs is about 18.832.
participants (1)
-
Keith F. Lynch