100% Austen, notes on method
Over the weekend somebody asked me to describe my method. It’s been a long time since I wrote the text generator, but this is what I recall.
First you gather the model writing into a database. Gutenberg.org provided ASCII transcriptions of the complete works of Jane Austen, which I strung end to end in one large text file. Let’s call this the corpus. Natural language processing (NLP) software, like that used by stylometrists, analyzes a corpus to deduce a model of the author’s style, in this case Jane Austen’s. This software produces a dictionary of vocabulary and patterns of contiguous word usage.
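As a rough illustration of what "a dictionary of vocabulary and patterns of contiguous word usage" could look like, here is a minimal sketch in Python. It is not the software I used; the function names `tokenize` and `build_model` and the choice of word pairs (bigrams) are assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase the text and pull out words, keeping internal apostrophes.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def build_model(corpus, n=2):
    # Hypothetical helper: returns word counts and counts of n contiguous words.
    words = tokenize(corpus)
    vocabulary = Counter(words)
    # zip over n staggered copies of the word list yields every run of n words.
    ngrams = Counter(zip(*(words[i:] for i in range(n))))
    return vocabulary, ngrams

sample = ("The family of Dashwood had long been settled in Sussex. "
          "The family was large.")
vocabulary, ngrams = build_model(sample)
```

A real stylometric model tracks far more than this (sentence length, function-word frequencies and so on), but counting contiguous words is the core idea.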
New sentences were created by remixing Sense & Sensibility. The remix was performed procedurally by advancing through the corpus. Candidate sentences are formed by starting with the first word of the novel and combining it with subsequent words until a complete sentence is formed. Each candidate is tested for grammatical correctness. If it passes, it is submitted to the stylometrics software to test for Austen-ness. If the software reports a 100% match, the sentence is added to the new book, the book called 100% Austen.
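The loop described above might be sketched like this. This is one possible reading of the procedure, reconstructed from memory, not the original program. The names `candidate_sentences` and `remix` are made up for the example, and `is_grammatical` and `austen_score` are hypothetical stand-ins for the grammar checker and the stylometrics software, which the caller has to supply.

```python
def candidate_sentences(words, max_len=40):
    # From each starting word, keep adding the following words until one
    # ends in sentence-final punctuation (a crude test: "Mr." would fool it).
    for start in range(len(words)):
        candidate = []
        for word in words[start:start + max_len]:
            candidate.append(word)
            if word.endswith((".", "!", "?")):
                yield " ".join(candidate)
                break

def remix(corpus, is_grammatical, austen_score):
    # is_grammatical and austen_score are assumed callables, not real APIs.
    book = []
    for sentence in candidate_sentences(corpus.split()):
        # Only a perfect score (1.0 standing in for "100%") gets into the book.
        if is_grammatical(sentence) and austen_score(sentence) == 1.0:
            book.append(sentence)
    return book

# Toy stand-ins so the sketch runs: "grammatical" means starts with a capital,
# and every sentence scores as fully Austen.
book = remix("She was happy. He was not.",
             is_grammatical=lambda s: s[0].isupper(),
             austen_score=lambda s: 1.0)
```

With these toy checks the candidates that start mid-sentence are thrown out and only the two capitalised sentences survive.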