Mousetext

It's ok to be confused

Brand New Facts

@brand_new_facts is my Twitter bot that creates new facts by merging sentences from two different Wikipedia pages.

These are a couple of its tweets.

The key tool to make this work is Part of Speech tagging (POS tagging) - the process of marking what function each word is serving in a sentence.

Consider these two sentences:

"Every bag of Nestlé chocolate chips sold in North America has a variation (butter vs. margarine is now a stated option) of her original recipe printed on the back."

(from the Wikipedia entry on chocolate chip cookies)

"Satan is a supernatural being or force that individuals may contact and supplicate to."

(from an outdated version of the Wikipedia entry on Theistic Satanism)

If we run the second sentence through the Python Natural Language Toolkit (NLTK) library's POS tagger, we get:

[('Satan', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), 
 ('supernatural', 'JJ'), ('being', 'VBG'), ('or', 'CC'), 
 ('force', 'VB'), ('that', 'IN'), ('individuals', 'NNS'), 
 ('may', 'MD'), ('contact', 'VB'), ('and', 'CC'), 
 ('supplicate', 'VB'), ('to', 'TO'), ('.', '.')]

The first item in the tuple ('Satan', 'NNP') is the original word from the sentence, and the second item is the part of speech tag, where NNP means "Proper noun, singular."
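For anyone who wants to poke at a tagging like the one above, here is a minimal sketch. It assumes NLTK is installed along with its tokenizer and tagger data, and the tagger's output can differ between NLTK versions. The names tag_sentence and words_with_tag_prefix are made up for this example. The tagged list from above is copied in as plain data, so the filtering part runs without NLTK.

```python
def tag_sentence(sentence):
    """Tokenize and POS-tag a sentence with NLTK.

    Assumes nltk is installed and its tokenizer and tagger data have
    already been fetched with nltk.download(). Not called below.
    """
    import nltk
    return nltk.pos_tag(nltk.word_tokenize(sentence))


# The tagged sentence shown above, copied in as plain data.
TAGGED = [
    ('Satan', 'NNP'), ('is', 'VBZ'), ('a', 'DT'),
    ('supernatural', 'JJ'), ('being', 'VBG'), ('or', 'CC'),
    ('force', 'VB'), ('that', 'IN'), ('individuals', 'NNS'),
    ('may', 'MD'), ('contact', 'VB'), ('and', 'CC'),
    ('supplicate', 'VB'), ('to', 'TO'), ('.', '.'),
]


def words_with_tag_prefix(tagged, prefix):
    """Return the words whose tag starts with the given prefix."""
    return [word for word, tag in tagged if tag.startswith(prefix)]


# Every verb tag in this tag set starts with "VB".
verbs = words_with_tag_prefix(TAGGED, 'VB')
print(verbs)
```

Note that the tagger counted "being" and "force" as verbs here, even though both are nouns in this sentence. POS tagging is a best guess, which is part of the fun.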

The @brand_new_facts script detects the first verb in each sentence (VBZ in the example above):

"Every bag of Nestlé chocolate chips sold in North America..."
and
"Satan is a supernatural being or force that individuals may contact and supplicate to."

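Now it's simply a matter of taking the first sentence up to the verb, and combining it with the second sentence from its first verb onward:

"Every bag of Nestlé chocolate chips sold in North America is a supernatural being or force that individuals may contact and supplicate to."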
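The splice can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the bot's actual code (that lives in the repository linked below). The first sentence is a shortened copy with tags I assigned by hand, the choice of VBZ, VBP and VBD as the tags that count as a "first verb" is my guess, and first_verb_index, detokenize and join_facts are hypothetical names.

```python
# Assumption: only these finite verb tags count as the "first verb".
FINITE_VERB_TAGS = {'VBZ', 'VBP', 'VBD'}
PUNCTUATION = {'.', ',', '!', '?', ';', ':'}

# Shortened first sentence, tagged by hand (not real tagger output).
FIRST = [
    ('Every', 'DT'), ('bag', 'NN'), ('of', 'IN'), ('Nestlé', 'NNP'),
    ('chocolate', 'NN'), ('chips', 'NNS'), ('sold', 'VBN'), ('in', 'IN'),
    ('North', 'NNP'), ('America', 'NNP'), ('has', 'VBZ'), ('a', 'DT'),
    ('variation', 'NN'), ('of', 'IN'), ('her', 'PRP$'),
    ('original', 'JJ'), ('recipe', 'NN'), ('printed', 'VBN'),
    ('on', 'IN'), ('the', 'DT'), ('back', 'NN'), ('.', '.'),
]

# Second sentence, using the tagger output from the post.
SECOND = [
    ('Satan', 'NNP'), ('is', 'VBZ'), ('a', 'DT'),
    ('supernatural', 'JJ'), ('being', 'VBG'), ('or', 'CC'),
    ('force', 'VB'), ('that', 'IN'), ('individuals', 'NNS'),
    ('may', 'MD'), ('contact', 'VB'), ('and', 'CC'),
    ('supplicate', 'VB'), ('to', 'TO'), ('.', '.'),
]


def first_verb_index(tagged, verb_tags=FINITE_VERB_TAGS):
    """Return the position of the first verb, or None if there is none."""
    for index, (_, tag) in enumerate(tagged):
        if tag in verb_tags:
            return index
    return None


def detokenize(tokens):
    """Join tokens with spaces, keeping punctuation attached to the word before it."""
    text = ''
    for token in tokens:
        if text and token not in PUNCTUATION:
            text += ' '
        text += token
    return text


def join_facts(first, second):
    """Splice the first sentence's subject onto the second sentence's predicate."""
    i = first_verb_index(first)
    j = first_verb_index(second)
    if i is None or j is None:
        return None  # no verb found, so there is nothing to splice
    head = [word for word, _ in first[:i]]    # everything before the verb
    tail = [word for word, _ in second[j:]]   # the verb and everything after
    return detokenize(head + tail)


fact = join_facts(FIRST, SECOND)
print(fact)
```

Skipping the VBN tag is what keeps "sold" from being treated as the first verb, so the cut lands just before "has". A different tag set would cut the sentence somewhere else.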

The code is online here: https://github.com/mouse-reeve/fact-join

Enjoy!

Mouse Reeve

Mouse is a software engineer at the Internet Archive, a historical occultism, and a true believer in nonsense. Nonsense is important.

San Francisco @tripofmice