Brand New Facts
@brand_new_facts is my Twitter bot that creates new facts by merging sentences from two different Wikipedia pages.
Here are a couple of its tweets:
animals can be easily confused with poetic symbolism
— Brand New Facts (@brand_new_facts) October 3, 2015
The 2016 model year Prius Eco produces mucus, but so does every other kind of gastropod
— Brand New Facts (@brand_new_facts) May 27, 2016
The key tool that makes this work is part-of-speech tagging (POS tagging): the process of marking the grammatical function each word serves in a sentence.
Consider these two sentences:
"Every bag of Nestlé chocolate chips sold in North America has a variation (butter vs. margarine is now a stated option) of her original recipe printed on the back."
(from the Wikipedia entry on chocolate chip cookies)
"A follower believes Satan to be a supernatural being or force that may be contacted or supplicated to."
(from an outdated version of the Wikipedia entry on Theistic Satanism)
If we run the second sentence through the part-of-speech tagger in the Python Natural Language Toolkit (NLTK), we get:
[('A', 'DT'), ('follower', 'NN'), ('believes', 'VBZ'), ('Satan', 'NNP'), ('to', 'TO'), ('be', 'VB'), ('a', 'DT'), ('supernatural', 'JJ'), ('being', 'VBG'), ('or', 'CC'), ('force', 'VB'), ('that', 'IN'), ('may', 'MD'), ('be', 'VB'), ('contacted', 'VBN'), ('or', 'CC'), ('supplicated', 'VBN'), ('to', 'TO'), ('.', '.')]
The first item in a tuple such as ('Satan', 'NNP') is the original word from the sentence, and the second item is its part-of-speech tag; here NNP means "Proper noun, singular."
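To make the tuple structure concrete, here is a minimal Python sketch that unpacks a few of the (word, tag) pairs shown above and looks up what each tag means. The `TAG_MEANINGS` dictionary is a hand-written excerpt of Penn Treebank tag descriptions and the `describe` helper is hypothetical; neither comes from the bot's code or from NLTK.

```python
# A few (word, tag) pairs copied from the tagger output above.
tagged = [("A", "DT"), ("follower", "NN"), ("believes", "VBZ"), ("Satan", "NNP")]

# Hand-written excerpt of Penn Treebank tag descriptions (illustrative, not exhaustive).
TAG_MEANINGS = {
    "DT": "Determiner",
    "NN": "Noun, singular or mass",
    "NNP": "Proper noun, singular",
    "VBZ": "Verb, 3rd person singular present",
}


def describe(pairs):
    """Return one 'word: tag (meaning)' line per tagged word."""
    lines = []
    for word, tag in pairs:  # each tuple unpacks into the word and its tag
        meaning = TAG_MEANINGS.get(tag, "unknown tag")
        lines.append(f"{word}: {tag} ({meaning})")
    return lines


for line in describe(tagged):
    print(line)
# prints:
# A: DT (Determiner)
# follower: NN (Noun, singular or mass)
# believes: VBZ (Verb, 3rd person singular present)
# Satan: NNP (Proper noun, singular)
```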
The @brand_new_facts script detects the first verb (believes, tagged VBZ, in the example above) in both sentences:
"Every bag of Nestlé chocolate chips sold in North America..."
and
"A follower believes Satan to be a supernatural being or force that may be contacted or supplicated to."
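The verb-detection step can be sketched in a few lines of Python. This is a minimal version that assumes "verb" means any Penn Treebank tag beginning with `VB` (VB, VBD, VBG, VBN, VBP, VBZ); the script in the repository may use a different rule, and the helper name `first_verb_index` is hypothetical.

```python
from typing import List, Optional, Tuple

TaggedSentence = List[Tuple[str, str]]


def first_verb_index(tagged: TaggedSentence) -> Optional[int]:
    """Return the index of the first token whose tag starts with 'VB', or None."""
    for i, (_word, tag) in enumerate(tagged):
        if tag.startswith("VB"):
            return i
    return None  # no verb found, so this sentence can't be spliced


# Tagger output for the Theistic Satanism sentence, copied from above.
satanism = [
    ('A', 'DT'), ('follower', 'NN'), ('believes', 'VBZ'), ('Satan', 'NNP'),
    ('to', 'TO'), ('be', 'VB'), ('a', 'DT'), ('supernatural', 'JJ'),
    ('being', 'VBG'), ('or', 'CC'), ('force', 'VB'), ('that', 'IN'),
    ('may', 'MD'), ('be', 'VB'), ('contacted', 'VBN'), ('or', 'CC'),
    ('supplicated', 'VBN'), ('to', 'TO'), ('.', '.'),
]

i = first_verb_index(satanism)
print(i, satanism[i][0])  # prints: 2 believes
```

Note that the modal "may" (tagged MD) does not count as a verb under this rule, since its tag doesn't start with `VB`.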
Now it's simply a matter of taking the first sentence up to its first verb and combining it with the second sentence from its first verb onward:
Every bag of Nestlé chocolate chips believes Satan to be a supernatural being or force that may be contacted or supplicated to
— Brand New Facts (@brand_new_facts) May 14, 2016
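The splice itself can be sketched as follows. This is a hypothetical reconstruction, not the code from the fact-join repository: `merge_facts`, `first_verb_index`, and `detokenize` are illustrative names, verbs are assumed to be any tag starting with `VB`, and the tags for the chocolate chip sentence are hand-assigned for the example rather than copied from NLTK's output.

```python
from typing import List, Optional, Tuple

TaggedSentence = List[Tuple[str, str]]
PUNCTUATION = {".", ",", "!", "?", ";", ":"}


def first_verb_index(tagged: TaggedSentence) -> Optional[int]:
    """Index of the first token tagged VB, VBD, VBG, VBN, VBP or VBZ, else None."""
    for i, (_word, tag) in enumerate(tagged):
        if tag.startswith("VB"):
            return i
    return None


def detokenize(tokens: List[str]) -> str:
    """Join tokens with spaces, attaching punctuation to the preceding word."""
    text = ""
    for tok in tokens:
        if not text or tok in PUNCTUATION:
            text += tok
        else:
            text += " " + tok
    return text


def merge_facts(first: TaggedSentence, second: TaggedSentence) -> Optional[str]:
    """First sentence up to its first verb + second sentence from its first verb on."""
    i = first_verb_index(first)
    j = first_verb_index(second)
    if i is None or j is None:
        return None  # one of the sentences has no verb to split on
    words = [w for w, _ in first[:i]] + [w for w, _ in second[j:]]
    return detokenize(words)


# Hand-assigned tags for the start of the chocolate chip sentence (illustrative).
chocolate = [
    ("Every", "DT"), ("bag", "NN"), ("of", "IN"), ("Nestlé", "NNP"),
    ("chocolate", "NN"), ("chips", "NNS"), ("sold", "VBN"), ("in", "IN"),
    ("North", "NNP"), ("America", "NNP"),
]

# Tagger output for the Theistic Satanism sentence, copied from earlier in the post.
satanism = [
    ('A', 'DT'), ('follower', 'NN'), ('believes', 'VBZ'), ('Satan', 'NNP'),
    ('to', 'TO'), ('be', 'VB'), ('a', 'DT'), ('supernatural', 'JJ'),
    ('being', 'VBG'), ('or', 'CC'), ('force', 'VB'), ('that', 'IN'),
    ('may', 'MD'), ('be', 'VB'), ('contacted', 'VBN'), ('or', 'CC'),
    ('supplicated', 'VBN'), ('to', 'TO'), ('.', '.'),
]

print(merge_facts(chocolate, satanism))
# prints: Every bag of Nestlé chocolate chips believes Satan to be a supernatural
# being or force that may be contacted or supplicated to.
```

This sketch keeps the second sentence's closing period, whereas the tweet above ends without one.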
The code is online here: https://github.com/mouse-reeve/fact-join
Enjoy!