I was able to steal away some time to work on Word Chain because I finished the July milestone for Train a couple days early. I prioritized getting the back end to generate word chains so that I can actually enjoy the game. Up until now I hand wrote every word chain so I was easily able to complete all the chains.
I think I was completely successful. I wrote two python scripts and use two json scripts to generate the chains. The first python script, wordPairInput.py, allows easy input of lots of word pairs. The pairs are stored in wordPairs.json. The data can be modelled as a directed graph, possibly cyclic. The chains will be paths in the graph of a constant length. The pairs are stored in a dictionary of lists of strings.
Here’s what an excerpt looks like:
"quitting": [ "time" ], "day": [ "break", "off", "time" ],
The key in the dictionary is the first word in the pair. The list contains all possible second words. This is like an adjacency list. It allows for very quick lookup of the second words for a given first word including quick lookup of whether a full pair has already been entered.
The second python script, generateChains.py uses wordPairs.json to generate a list of chains in wordLists.json. I thought I would need a graph processing library to generate the chains in a reasonable time but I decided to try a very naive approach to start. I run through all the possible first words, then for each word, recursively try to add on all the possible second words from its entry in the pair dictionary. It ends the recursion once the chain has 6 letters in it.
This process is probably best defined in code. partialChain is a list of strings representing the chain so far. reList is list of finished chains and pair dictionary is the dictionary described by wordPairs.json
def appendWordsToChain(partialChain, retList, pairDictionary): lastWord = partialChain[-1] if lastWord not in pairDictionary: return nextWords = pairDictionary[lastWord] for nextWord in nextWords: if nextWord in partialChain: continue newChain = copy.copy(partialChain) newChain.append(nextWord) if len(newChain) < chainLength: appendWordsToChain(newChain,retList,shortList,pairDictionary) else: retList.append(newChain)
I was able to quickly get the basics of this algorithm down and only ran into a few snags. In my first iteration I mistakenly had “return” instead of “continue” after the check for repeat words in the partial chain. This prevented further options in a partial chain from being tried. I also had to refresh myself on how python works with passing by value/reference. Turns out it passes by object reference which barely makes sense to me. To replicate C style passing by value I used copy.copy. There may have been a more pythonic way of doing this but I’m used to C style recursion algorithms so I just replicated it here.
This algorithm worked surprisingly well. It is able to generate about 900 chains now nearly instantly on my laptop. As far as I can tell, I don’t think I’ll need to do any more complex graph processing unless it slows down greatly as wordPairs.json gains more and more entries.
Once I had it working I aimed to the chain generator output information that would let me easily add more chains. Currently, I have it store every failed chain in a list. After generating all the chains it prints all the failed chains that were one word short of being completed.
['muzzle', 'flash', 'back', 'space', 'station'] ['muzzle', 'flash', 'back', 'seat', 'cushion'] ['muzzle', 'flash', 'back', 'pack', 'mule'] ['part', 'time', 'share', 'crop', 'circles'] ['money', 'back', 'side', 'burn', 'notice'] ['money', 'back', 'side', 'kick', 'stand'] ['now', 'boarding', 'house', 'fire', 'works'] ['chock', 'full', 'time', 'sink', 'hole'] ['chock', 'full', 'house', 'fire', 'works'] ['quitting', 'time', 'share', 'crop', 'circles'] ['body', 'bag', 'lady', 'bug', 'zapper'] ['quarter', 'note', 'book', 'case', 'worker']
This makes it very easy to look over the printed list, find a word at the start or end of the list that I think I could find a pair for. For example I could look at the output above and try to come up with a pair ending in muzzle because I know it will create at least 3 more chains. Achieving this required also computing a reversed pair dictionary. This was to check whether or not a failed list could be extended from the front. I may end up serializing the reversed pair dictionary while entering pairs as well.
With just these simple scripts and the output of the one word short failures I was able to go from 6 chains to over 900 in about a day.
Next I’d like to also print out any word pairs that are not connected to any others. I could then try to come up with pairs to connect those islands to the other pairs. I’d also like to output some general analysis data such as the number of pairs, the average number of second words for a first word, the average number of first words for a second word (using the reversed dictionary), the number of single pair islands and the longest nonrepeating chain.
Doing some some sort of auomated cross checking with a known Engish dictionary to check for typos would also be valuable. I just noticed that I entered “done dry” instead of “bone dry”.
I’d also love to be able to visualize the graph of the pairs. Being able to see it visually seems like it would be really really cool. 😀