Becoming a Pinter, Garner, Pepytatil Nurse, Chief Infospation Officer or Tealhinator, as a career choice, seems to be completely fine in the eyes of a Recurrent Neural Network
The context? Generating new occupations based on existing ones
Wiki fetch
To get existing job titles, Wikipedia was consulted
At first, the plan was to write a crawler, but the idea was dropped in favor of more “cavalier” code that uses the MediaWiki API
To retrieve all known professions, a walk through the URLs of the initial lists of occupations is required. Titles from the second level are then fetched, and the special ones (e.g. Category:, Talk:) and references to nested lists are removed
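The fetch-and-filter step could be sketched roughly as below. This is not the original code: the helper names, the list of special prefixes and the "List of" check for nested lists are assumptions made for illustration. Only the `action=query` / `prop=links` parameters and the response layout come from the MediaWiki API itself.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"

# Assumed set of "special" prefixes; the post only names Category: and Talk:
SPECIAL_PREFIXES = ("Category:", "Talk:", "Wikipedia:", "Template:", "Help:", "Portal:", "File:")


def build_links_query(page_title, plcontinue=None):
    """Build a MediaWiki API URL that lists the links found on one page."""
    params = {
        "action": "query",
        "prop": "links",
        "titles": page_title,
        "pllimit": "max",
        "format": "json",
    }
    if plcontinue:
        # Continuation token returned by the API when more links remain
        params["plcontinue"] = plcontinue
    return API_URL + "?" + urlencode(params)


def extract_titles(response):
    """Collect link titles from a decoded prop=links response."""
    titles = []
    for page in response.get("query", {}).get("pages", {}).values():
        for link in page.get("links", []):
            titles.append(link["title"])
    return titles


def is_job_title(title):
    """Reject special pages and references to nested lists (assumed heuristic)."""
    if title.startswith(SPECIAL_PREFIXES):
        return False
    if title.startswith(("List of ", "Lists of ")):
        return False
    return True
```

Filtering a decoded response then comes down to `[t for t in extract_titles(response) if is_job_title(t)]`.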
Handling exceptions (i.e. JSONDecodeError, {URL,HTTP}Error) is barbarically neglected
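For completeness, a minimal sketch of what handling those three exceptions might look like, using only the standard library. The `fetch_json` helper and its `opener` parameter are hypothetical and not part of the original code, which skips this step entirely.

```python
import json
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_json(url, opener=urlopen, timeout=10):
    """Return the decoded JSON body of `url`, or None if the fetch fails."""
    try:
        with opener(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except HTTPError as err:
        # HTTPError subclasses URLError, so it has to be caught first
        print(f"HTTP {err.code} for {url}")
    except URLError as err:
        print(f"Could not reach {url}: {err.reason}")
    except json.JSONDecodeError as err:
        print(f"Malformed JSON from {url}: {err}")
    return None
```

Returning `None` lets the caller skip a failed page and continue with the rest of the list instead of aborting the whole run.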
The output? The first-level salvage returns 14 categories (scientific, healthcare, artistic…), while the second one, the final set of all known job titles, barely gets to 865
The RNN
To mash things up and model a probability distribution, the Recurrent Neural Network implementation from Andrej Karpathy (director of AI @ Tesla) was used
Since we had already prepared code to bootstrap Karpathy’s RNN, the next step was merely gluing it to the above-mentioned Python code that gets the input data
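The glue might look something like the sketch below, assuming the trainer reads a single plain-text file with one title per line. The function names and the `input.txt` file name are illustrative assumptions, not taken from the original code.

```python
from pathlib import Path


def format_training_text(titles):
    """Return unique, non-empty titles, sorted, one per line."""
    cleaned = sorted({t.strip() for t in titles if t.strip()})
    return "\n".join(cleaned) + "\n"


def write_training_file(titles, data_dir, filename="input.txt"):
    """Write the formatted titles to data_dir/filename and return the path."""
    # `input.txt` is an assumed default; use whatever name the trainer expects
    target = Path(data_dir) / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(format_training_text(titles), encoding="utf-8")
    return target
```

Deduplicating and stripping whitespace matters more than usual here, since with so few records every repeated line skews what the network sees.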
Based on previous experience, the Amazon Linux distribution was used (~7-11x faster processing than the other default EC2 distros), now in the updated flavor of AL2
The ∑
Machine learning and neural networks thrive on large input datasets, while we’re dealing with fewer than 1k records here. Hence, to make the most of it, we’re forced to heavily tweak and try various arrangements of hidden layers, dropouts, RNN sizes and batch sizes
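Enumerating such arrangements can be sketched as a plain grid over the four parameters. The candidate values are an assumption: they are simply the values that appear in the configurations reported in this post, since the full search space that was tried is not stated.

```python
from itertools import product

# Assumed search space, built from the values of the reported configurations
SEARCH_SPACE = {
    "rnn_size": [128, 256],
    "num_layers": [3, 4, 5],
    "batch_size": [1, 2],
    "dropout": [0.5, 0.6],
}


def iter_configs(space):
    """Yield one dict per combination of the listed hyperparameter values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


configs = list(iter_configs(SEARCH_SPACE))
print(len(configs), "configurations, e.g.", configs[0])
```

Each dict can then be handed to one training run, for instance one per VM.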
After testing multiple training setups on more than a dozen t2.medium instances running Amazon Linux 2, three configurations yielded the most:
rnn_size | num_layers | batch_size | dropout |
---|---|---|---|
256 | 5 | 1 | 0.5 |
128 | 4 | 2 | 0.5 |
128 | 3 | 2 | 0.6 |
Automated job title generation on a new VM, with the last row’s parameters for the RNN, can be triggered via:
… producing output of:
In all variations, however, the generated titles were mostly random. Roughly 10% are meaningful, which is quite expected given the dreadfully terse input:
- Runter
- Uthoratist
- Pimicityer
- Newtor
- Phailic Onsinter
- Chestacge Corinner
- Stogh Cliter
- Orertricion Director
- Staeler
- Meddermaderog
- Elergogoot Nursing
- Werhine Pance Thayerist
- Amcrofiin Araryst
- Mashor
- Perecoty Danncer
- Fomiatrilian
- Futhion Suncyror
- Rodugion Mesigner
- Makeniatar
- Chief Buctunel Officer
- Cledus Tanagerer
- Photlac
- Garner
- Pecsion Monicer
- Totrulical Technician
- Ipestralocist
- Foltinist
- Partiol Apant
- Twipo
- Corpossere
- Rushyr
- Sealoet
- Mathor Crale
- Remianecetist
- Pepytatil Nurse
- Dontor Operation
- Dytor Padmer
- Chenm Efficistrator
- Soal Stater
- Tant Daster
- Chief Vart Officer
- Dontore Designer
- Foafer
- Pinter
- Stile Enminderent Aderator
- Gereuttapher
- Aptoroteg
- Bare Telhanicalisirsitc
- Bathuder
- Mirk-tomkar
- Srelhor
- Chief Rimensiol Officer
- Olmhicastor
Still, the author’s LinkedIn job position is now updated with one from the headline. Who knows, Tealhinator might become a highly demanded position in the near future