The intention was to use Erlang’s successor, the mighty potion of youth, to create alternative movie titles: ones gone bad.
The idea was quickly rendered obsolete, though, upon the realization that fetching and processing the original titles took about 30 seconds in total with Python, in roughly as many lines of code as seconds elapsed.
Phase one: Collect underpants
Learning a new language through arbitrary challenges such as implementing sorting algorithms, doing simple IO or fetching web data is a shiny equivalent of writing hello world, compile, run… It doesn’t give you the right feeling of wielding a new programming asset.
So the first part of the let’s-have-a-more-interesting-learning-experience plan was collecting and normalizing all available movie titles from the web. Before going straight to Elixir, the data was scraped with Python, as reference code to build from.
The list of movies was crawled from Wikipedia, from the year 1900 onward, with only “English speaking originated” films targeted. Python’s BeautifulSoup and threading module scraped all existing titles, ever, in about 20 seconds, which made implementing this step in Elixir trivial.
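A rough idea of what such a threaded scrape can look like is sketched below. It is not the original script: to stay dependency-free it swaps BeautifulSoup for the standard library's `html.parser`, and uses `concurrent.futures` (a thin layer over threads) instead of the raw `threading` module. The names `extract_titles`, `fetch` and `collect_titles` are made up for illustration, the list of page URLs is left to the caller, and the assumption that titles sit inside `<i>` elements is just that, an assumption about the page markup.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.request import urlopen


class ItalicText(HTMLParser):
    """Collect the text found inside <i> elements.

    Assumption: film titles on the list pages are italicized.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many <i> elements we are currently inside
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "i":
            if self.depth == 0:
                self.titles.append("")  # start a new title
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "i" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.titles[-1] += data


def extract_titles(html):
    """Return the non-empty italicized strings of one page, in page order."""
    parser = ItalicText()
    parser.feed(html)
    parser.close()
    return [t.strip() for t in parser.titles if t.strip()]


def fetch(url):
    """Download one page as text (hypothetical helper, no retries)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def collect_titles(urls, fetch=fetch, workers=16):
    """Fetch all pages on a thread pool and return the sorted unique titles."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = pool.map(fetch, urls)
        return sorted({t for page in pages for t in extract_titles(page)})
```

Threads are a good fit here because the work is almost entirely waiting on the network, so the Global Interpreter Lock barely matters. Passing `fetch` as a parameter also makes it easy to try the parsing on saved pages before hitting the real site.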
Bad-word dictionaries, on the other hand, can be retrieved from multiple sources. In this case a slightly cleansed version of badwords-list was used, with the entries containing digits or special characters removed.
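The cleansing step could be as small as the sketch below. It assumes the list arrives as one entry per line and reads "digits and special characters removed" as "keep only entries made of plain letters"; the helper name `cleanse` is hypothetical.

```python
import re

# Assumption: a "clean" entry consists of ASCII letters only.
_LETTERS_ONLY = re.compile(r"[a-z]+")


def cleanse(words):
    """Return lowercase, de-duplicated entries that contain letters only."""
    seen = set()
    kept = []
    for raw in words:
        word = raw.strip().lower()
        if _LETTERS_ONLY.fullmatch(word) and word not in seen:
            seen.add(word)
            kept.append(word)
    return kept
```

Multi-word entries are dropped too, since a space counts as a non-letter; that suits word-by-word matching against titles.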
The second and, frankly, more prominent task was to go through all the words in all movie titles, find permutations and variations close to words from the bad dictionary, and create new movie titles lightning fast. Elixir is awesome when it comes to CPU utilization, so running independent entries through a custom algorithm for finding close characters looked promising.
Again, for the POC, a few lines of Python were written, since having pseudo-code, or preferably a working algorithm, in a familiar language makes moving to a new programming environment much easier.
The “problem” arose when naively optimized Python code processed all titles in just a few seconds, even with the Global Interpreter Lock, delivering fine results.
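The custom algorithm itself isn't shown here, so the sketch below stands in with a different, off-the-shelf technique: the similarity ratio from the standard library's `difflib`. The function name `twist_title` and the `cutoff` value are assumptions picked for illustration.

```python
import difflib


def twist_title(title, bad_words, cutoff=0.6):
    """Return variants of `title` with one word swapped for a similar bad word.

    Uses difflib's similarity ratio as a stand-in for a custom
    "close characters" algorithm; `cutoff` is an arbitrary threshold.
    """
    words = title.split()
    variants = []
    for i, word in enumerate(words):
        matches = difflib.get_close_matches(
            word.lower(), bad_words, n=1, cutoff=cutoff
        )
        if not matches or matches[0] == word.lower():
            continue  # nothing close enough, or the word is already "bad"
        # Keep the capitalization of the word being replaced.
        swap = matches[0].capitalize() if word[0].isupper() else matches[0]
        variants.append(" ".join(words[:i] + [swap] + words[i + 1:]))
    return variants
```

With a one-entry dictionary, `twist_title("The Wolf of Wall Street", ["balls"])` swaps only "Wall", because "wall" and "balls" share the run "all" and land just above the 0.6 cutoff. Each title is processed independently of the others, which is exactly the shape of work that would spread nicely over Elixir processes.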
The Wolf of Balls Street
Implementing all this in Elixir now felt like forcing oneself to watch all episodes of Parks and Recreation just to catch the scenes with Ron Swanson. Hence the new plan: bootstrap a Karpathy RNN to spin up ideas for real-life use cases to learn Elixir on.