Brand New Facts

@brand_new_facts is my Twitter bot that creates new facts by merging sentences from two different Wikipedia pages.

These are a couple of its tweets.

The key tool that makes this work is part-of-speech tagging (POS tagging): the process of marking the grammatical function each word serves in a sentence.

Consider these two sentences:

"Every bag of Nestlé chocolate chips sold in North America has a variation (butter vs. margarine is now a stated option) of her original recipe printed on the back."

(from the Wikipedia entry on chocolate chip cookies)

"A follower believes Satan to be a supernatural being or force that may be contacted or supplicated to."

(from an outdated version of the Wikipedia entry on Theistic Satanism)

If we run the second sentence through the POS tagger in the Python Natural Language Toolkit (NLTK) library, we get:

[('A', 'DT'), ('follower', 'NN'), ('believes', 'VBZ'), ('Satan', 'NNP'), ('to', 'TO'), ('be', 'VB'), ('a', 'DT'), ('supernatural', 'JJ'), ('being', 'VBG'), ('or', 'CC'), ('force', 'VB'), ('that', 'IN'), ('may', 'MD'), ('be', 'VB'), ('contacted', 'VBN'), ('or', 'CC'), ('supplicated', 'VBN'), ('to', 'TO'), ('.', '.')]

The first item in a tuple such as ('Satan', 'NNP') is the original word from the sentence, and the second item is the part of speech tag, where NNP means "Proper noun, singular."
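NLTK's default tagger uses the Penn Treebank tag set. As a reading aid, here is a minimal sketch that annotates the tagged output above with plain-English descriptions. The `TAG_NAMES` table and the `describe` helper are hypothetical names made up for this illustration, and the table only covers the tags that appear in this one example.

```python
# Tagged output copied from the example above.
tagged = [('A', 'DT'), ('follower', 'NN'), ('believes', 'VBZ'), ('Satan', 'NNP'),
          ('to', 'TO'), ('be', 'VB'), ('a', 'DT'), ('supernatural', 'JJ'),
          ('being', 'VBG'), ('or', 'CC'), ('force', 'VB'), ('that', 'IN'),
          ('may', 'MD'), ('be', 'VB'), ('contacted', 'VBN'), ('or', 'CC'),
          ('supplicated', 'VBN'), ('to', 'TO'), ('.', '.')]

# Plain-English names for the Penn Treebank tags used in this example only.
TAG_NAMES = {
    'DT': 'determiner',
    'NN': 'noun, singular or mass',
    'NNP': 'proper noun, singular',
    'VB': 'verb, base form',
    'VBG': 'verb, gerund or present participle',
    'VBN': 'verb, past participle',
    'VBZ': 'verb, 3rd person singular present',
    'JJ': 'adjective',
    'CC': 'coordinating conjunction',
    'IN': 'preposition or subordinating conjunction',
    'MD': 'modal',
    'TO': 'to',
    '.': 'sentence-final punctuation',
}


def describe(tagged_words):
    """Return (word, tag, description) triples for a tagged sentence."""
    return [(word, tag, TAG_NAMES.get(tag, 'unknown tag'))
            for word, tag in tagged_words]


for word, tag, name in describe(tagged):
    print(f'{word:12} {tag:4} {name}')
```

Note that the tagger is not always right: here "force" comes back as a base-form verb even though it is a noun in this sentence.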

The @brand_new_facts script detects the first verb (VBZ in the example above) in both sentences:

"Every bag of Nestlé chocolate chips sold in North America..."
and
"A follower believes Satan to be a supernatural being or force that may be contacted or supplicated to."

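Now it's simply a matter of taking the first sentence up to the verb and combining it with the second sentence from its first verb onward:

"Every bag of Nestlé chocolate chips sold in North America believes Satan to be a supernatural being or force that may be contacted or supplicated to."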
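The joining step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the bot's actual code: the function names `first_index_with_tag` and `join_facts` are hypothetical, the sketch looks specifically for the VBZ tag because that is the tag named above, and the tags for the first sentence are written by hand (and cut off after the verb, since the rest is discarded) rather than taken from real tagger output.

```python
# Hand-written tags for the start of the first sentence (an assumption, not
# real tagger output). Everything after the verb is omitted because the join
# throws it away anyway.
first = [('Every', 'DT'), ('bag', 'NN'), ('of', 'IN'), ('Nestlé', 'NNP'),
         ('chocolate', 'NN'), ('chips', 'NNS'), ('sold', 'VBN'), ('in', 'IN'),
         ('North', 'NNP'), ('America', 'NNP'), ('has', 'VBZ')]

# Tagged output for the second sentence, copied from the example above.
second = [('A', 'DT'), ('follower', 'NN'), ('believes', 'VBZ'), ('Satan', 'NNP'),
          ('to', 'TO'), ('be', 'VB'), ('a', 'DT'), ('supernatural', 'JJ'),
          ('being', 'VBG'), ('or', 'CC'), ('force', 'VB'), ('that', 'IN'),
          ('may', 'MD'), ('be', 'VB'), ('contacted', 'VBN'), ('or', 'CC'),
          ('supplicated', 'VBN'), ('to', 'TO'), ('.', '.')]


def first_index_with_tag(tagged_words, tag='VBZ'):
    """Return the index of the first word carrying `tag`, or None."""
    for index, (_, word_tag) in enumerate(tagged_words):
        if word_tag == tag:
            return index
    return None


def join_facts(first_tagged, second_tagged, tag='VBZ'):
    """Join the first sentence before its verb to the second from its verb on."""
    i = first_index_with_tag(first_tagged, tag)
    j = first_index_with_tag(second_tagged, tag)
    if i is None or j is None:
        return None  # one of the sentences has no matching verb
    words = [w for w, _ in first_tagged[:i]] + [w for w, _ in second_tagged[j:]]
    text = ' '.join(words)
    # Naive detokenizing: drop the space that the join put before punctuation.
    for mark in '.,;:!?':
        text = text.replace(' ' + mark, mark)
    return text


print(join_facts(first, second))
```

Matching on a single tag keeps the two halves grammatically compatible, at the cost of skipping sentence pairs where either side has no verb with that tag.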

The code is online here: https://github.com/mouse-reeve/fact-join

Enjoy!