We Need More Science: How to Teach Heartless Computers to Really Get What We’re Feeling

The leaderboard for Opposite Worlds. Screenshot via SyFy

For its flagship new reality show, Opposite Worlds, the Syfy channel wanted to let the audience “remote control” the show via social media. I worked with Syfy to create what ultimately became its real-time “Twitter Popularity Index.”

The Index combines the intensity of conversation around each character, the number of unique discussants, and the emotion of that discussion, measured by a new sentiment engine built on over 1.6 million words, phrases, common misspellings, and colloquial expressions. Using our Index, Opposite Worlds set records across the board in Twitter engagement for a cable television series.

Sentiment mining is a hot emerging field, yet the underlying technology has changed little since the first computerized sentiment mining system, the General Inquirer, was created in 1961. The field still treats emotion measurement as a purely technical problem. This has yielded a stream of pioneering technical achievements focused on algorithms rather than on the actual goal: measuring tone online more accurately.

Kalev Leetaru

Kalev Leetaru is the Yahoo! Fellow in Residence of International Values, Communications Technology & the Global Internet at Georgetown University. His work centers on the application of high performance computing and “big data” to grand challenge problems.

To build the first sentiment engine that could actually understand real-time tweets, we had to start from scratch, asking the question: how can big data combine with human insight to change the way we interact with our world? In the process, we identified 16 limitations to current sentiment mining approaches. Here’s how we got around them:

Letter Expansions: Social media has popularized the use of repeated letters within words. Instead of saying “I love SyFy,” it is common to see “I loooooooooooove SyFy” or “I looooooovvvvvveeeeeeee SyFy.” Current systems compile a list of the most common expansions, such as “looove” or “cooool,” but this misses the vast majority of expressions. Instead, our system collapses the repeated letters in each word so that any letter expansion of any word matches its dictionary entry.
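
A minimal sketch of one way such collapsing could work, assuming a toy dictionary (`TONE`) and helper names (`collapse_candidates`, `lookup`) that are ours, not the system's. Because legitimate words contain doubled letters (“cool”), squeezing every run down to a single letter is not enough; the sketch instead tries each run as a single or a double letter and keeps the first spelling the dictionary recognizes.

```python
import itertools
import re
from typing import Optional

# Hypothetical toy tonal dictionary (word -> score); not the real 1.6-million-entry list.
TONE = {"love": 2.0, "cool": 1.5, "good": 1.0, "sad": -1.5}


def collapse_candidates(token: str) -> list:
    """Return every spelling in which each run of a repeated letter is cut to one or two copies."""
    runs = [m.group(0) for m in re.finditer(r"(.)\1*", token.lower())]
    # A run of length >= 2 may stand for a single letter ("love") or a double one ("cool").
    options = [[r[0]] if len(r) == 1 else [r[0], r[0] * 2] for r in runs]
    return ["".join(parts) for parts in itertools.product(*options)]


def lookup(token: str) -> Optional[float]:
    """Score of the first collapsed spelling found in the dictionary, or None."""
    # Candidates come out single-letter-first; a real system would need a tie-break
    # for ambiguous collapses such as "god" versus "good".
    for candidate in collapse_candidates(token):
        if candidate in TONE:
            return TONE[candidate]
    return None


print(lookup("loooooooooooove"), lookup("looooooovvvvvveeeeeeee"), lookup("cooool"))
# prints: 2.0 2.0 1.5
```

The number of candidates doubles with every repeated-letter run, which stays small for real words.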

Misspellings: One of the hallmarks of social media posts is that they are written quickly, often on mobile devices with small keyboards, and are rarely spell-checked. Typographical errors abound on Twitter, and few algorithms attempt to correct for them beyond using a list of the most frequent errors. Our system uses a set of algorithms based on models of human typing to encode most conceivable misspellings of each word in the tonal dictionary, resulting in a final database of over 1.6 million entries that capture the majority of recognizable misspellings.
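
The typing models themselves are not described, so the following is only an illustrative stand-in: it generates single-error variants of each dictionary word (a dropped letter, a neighboring key on the same QWERTY row, two swapped letters) and maps them back to the correct spelling. The adjacency model and the names `typo_variants` and `expand_dictionary` are assumptions.

```python
# Rows of a standard QWERTY keyboard; keys beside each other in a row count as neighbors.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
NEIGHBORS = {}
for row in ROWS:
    for i, ch in enumerate(row):
        NEIGHBORS[ch] = set(row[max(i - 1, 0):i]) | set(row[i + 1:i + 2])


def typo_variants(word: str) -> set:
    """Single-error misspellings of `word` under a crude model of thumb typing."""
    variants = set()
    for i in range(len(word)):
        # Dropped letter: "lve" for "love".
        variants.add(word[:i] + word[i + 1:])
        # Neighboring key hit instead: "lobe" for "love".
        for n in NEIGHBORS.get(word[i], ()):
            variants.add(word[:i] + n + word[i + 1:])
        # Two adjacent letters typed in the wrong order: "lvoe" for "love".
        if i + 1 < len(word):
            variants.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    variants.discard(word)
    return variants


def expand_dictionary(tone: dict) -> dict:
    """Map each correct spelling and each generated typo to its canonical dictionary word."""
    expanded = {word: word for word in tone}  # correct spellings are inserted first...
    for word in tone:
        for variant in typo_variants(word):
            expanded.setdefault(variant, word)  # ...so a typo never overrides a real entry
    return expanded
```

Precomputing variants turns misspelling handling into an ordinary lookup at tweet time, which is one way a table of over a million entries could still be queried in real time. Collisions between typos of two different words are resolved first-come here; a production system would need a better policy.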

Hashtags: Emotion is increasingly expressed through hashtags such as “john #ilovehim” or even just “#ilovejohn.” Current sentiment systems handle hashtags by compiling a list of the most common hashtags and assigning a tonal score to each. This catches “#ilovehim” but misses the less common “#ilovejohn.” To score hashtags accurately, a system must unpack them into the sequence of words they encode, an area of natural language processing known as “compound word expansion.” We created an optimized algorithm, based on Twitter language use, that does this in real time.
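
The optimized algorithm is not described, so here is a generic dynamic-programming segmenter as a hedged illustration of compound word expansion. It assumes a small hypothetical vocabulary (`VOCAB`) and picks the split with the fewest words, a simple stand-in for the frequency-weighted scoring a real segmenter would likely use.

```python
from functools import lru_cache
from typing import List, Optional

# Hypothetical vocabulary; a real segmenter would use a large, frequency-weighted word list.
VOCAB = {"i", "love", "him", "her", "john", "kate", "so", "much"}
MAX_WORD_LEN = 20


def segment_hashtag(tag: str) -> Optional[List[str]]:
    """Split a hashtag into vocabulary words, preferring the split with the fewest words."""
    text = tag.lstrip("#").lower()

    @lru_cache(maxsize=None)
    def best(start: int):
        # Fewest-word segmentation of text[start:] as a tuple, or None if none exists.
        if start == len(text):
            return ()
        result = None
        for end in range(start + 1, min(len(text), start + MAX_WORD_LEN) + 1):
            word = text[start:end]
            if word in VOCAB:
                rest = best(end)
                if rest is not None and (result is None or len(rest) + 1 < len(result)):
                    result = (word,) + rest
        return result

    words = best(0)
    return list(words) if words is not None else None
```

For example, `segment_hashtag("#ilovejohn")` returns `["i", "love", "john"]`, which can then be scored like ordinary text.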

Phrases: Current sentiment systems largely assign tonal scores to single words, but many words depend heavily on context for their emotional meaning. Our system supports matching both words and phrases of up to four words, allowing it to recognize “ace up his sleeve” or “go break a leg” as having a positive connotation.
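
A minimal sketch of phrase matching, assuming a lexicon keyed by word tuples (`LEXICON`, hypothetical) and a greedy longest-match scan, which is a common approach but not necessarily the one this system used. Trying longer phrases first means “break a leg” is matched as a positive idiom before “break” alone can contribute a negative score.

```python
# Hypothetical lexicon keyed by word tuples; multi-word phrases sit beside single words.
LEXICON = {
    ("ace", "up", "his", "sleeve"): 1.0,
    ("break", "a", "leg"): 1.0,
    ("thumbs", "up"): 1.0,
    ("break",): -1.0,
}
MAX_PHRASE = 4  # the four-word phrase limit mentioned in the text


def match_phrases(tokens: list) -> list:
    """Greedy longest-match scan; returns (matched words, score) pairs in reading order."""
    i, matches = 0, []
    while i < len(tokens):
        # Try the longest possible phrase starting at position i first.
        for n in range(min(MAX_PHRASE, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in LEXICON:
                matches.append((key, LEXICON[key]))
                i += n
                break
        else:
            i += 1  # nothing in the lexicon starts here, move on
    return matches


print(match_phrases("go break a leg".split()))
```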


Colloquial Expressions: Day-to-day speech relies heavily on colloquial expressions, which ultimately find their way into social media. Few systems today can distinguish between “his life is on the line” and “he is on the level with us” or between “go to hell” and “hell yeah.” At the same time, many common phrases like “thumbs up,” “red faced,” or “jumped the shark” have no surface emotional meaning (a thumb pointing up, a flushed face, hopping over an aquatic animal), but have widespread emotional connotation. We assembled an extensive dictionary of the most common such phrases.

Alternative Social Usage: Social media often uses words in ways that differ from formal speech and change their emotional connotation. The phrases “too hot,” “a killer” or “blowing up” would likely be used negatively in formal writing. Online, however, these phrases appear as “he is way too hot for words,” “she has a killer smile,” or “the song has been blowing up the charts.”

Entity Separation: Tweets can sometimes mention multiple characters, such as “I like X, hate Y, and love love Z.” Many systems simply score the sentence as a whole and assign that score to all three characters, but ours picks out the three characters and assigns each its own tone.
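
One simple way to approximate this, sketched under our own assumptions: split the tweet into clauses at punctuation and at the conjunctions “and” and “but,” score each clause, and credit that score only to the character names the clause mentions. The names `ENTITIES`, `TONE`, and `tone_by_entity` are illustrative.

```python
import re

# Hypothetical character names and word scores, for illustration only.
ENTITIES = {"x", "y", "z", "john", "kate"}
TONE = {"like": 1.0, "love": 2.0, "hate": -2.0}

# Clause boundaries: punctuation plus the conjunctions "and" / "but".
CLAUSE_BREAK = re.compile(r"[,;.!?]|\b(?:and|but)\b")


def tone_by_entity(tweet: str) -> dict:
    """Score each clause separately and credit its tone only to the names it mentions."""
    scores = {}
    for clause in CLAUSE_BREAK.split(tweet.lower()):
        words = re.findall(r"[a-z']+", clause)
        clause_tone = sum(TONE.get(w, 0.0) for w in words)
        for name in (w for w in words if w in ENTITIES):
            scores[name] = scores.get(name, 0.0) + clause_tone
    return scores


print(tone_by_entity("I like X, hate Y, and love love Z"))
```

With the toy scores this yields X: 1.0, Y: -2.0, Z: 4.0, and because the split does not depend on the comma, “I hate john and I love kate” separates the same way. It will, however, wrongly split idioms such as “rock and roll” and ignores pronouns, both of which a real system must handle.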

Vocabulary: While the word “obsequious” is rare on Twitter, it does appear several times a day and is missed by many systems. Surprisingly, even words like “unhappy,” “saddened,” and “miserable,” which appear quite often on Twitter, are absent from many systems. To create our tone dictionaries, a human scorer examined the entire English dictionary four times in randomized order, tagging each word, and then went back over words appearing semi-frequently in books, news, and social media and scored them again. Finally, the complete list was scored in context to catch words whose connotation varies with their surroundings. While no list could ever exhaustively chronicle the emotional connotations of the entire English language, the underlying dictionaries here are among the most extensive ever created.


Creative Social-Only Words: Social media has led to the creation of many new words not found in any dictionary, particularly when it comes to profanity and vulgar terms, but also positive words like “awesomenessly.”

Extensive Conjugations: Many tonal words are verbs and can have multiple, potentially irregular, conjugations. Tonal dictionaries usually record only the root word and a few common conjugations, which misses many mentions. For our system, each tonal word was cross-referenced against all accepted and common alternative conjugations to ensure maximal coverage.
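
To illustrate the cross-referencing step, here is a hedged sketch that copies each tonal verb's score onto its inflected forms, using a small hypothetical irregular-verb table and rough regular-verb rules. The names `IRREGULAR`, `regular_forms`, and `expand_tone` are ours, and the rules are deliberately incomplete.

```python
VOWELS = set("aeiou")

# Hypothetical irregular-verb table; a real system would draw on a full morphological lexicon.
IRREGULAR = {
    "win": ["wins", "won", "winning"],
    "lose": ["loses", "lost", "losing"],
    "break": ["breaks", "broke", "broken", "breaking"],
}


def regular_forms(verb: str) -> list:
    """Rough English rules for the -s, -ed and -ing forms of a regular verb."""
    if verb.endswith("e"):
        return [verb + "s", verb + "d", verb[:-1] + "ing"]  # love -> loved, loving
    if verb.endswith("y") and len(verb) > 1 and verb[-2] not in VOWELS:
        return [verb[:-1] + "ies", verb[:-1] + "ied", verb + "ing"]  # cry -> cries, cried
    if verb.endswith(("s", "sh", "ch", "x", "z")):
        return [verb + "es", verb + "ed", verb + "ing"]  # stress -> stresses
    # Consonant doubling ("stop" -> "stopped") is not handled; such verbs need table entries.
    return [verb + "s", verb + "ed", verb + "ing"]


def expand_tone(tone: dict, verbs: set) -> dict:
    """Copy each tonal verb's score onto all of its conjugated forms."""
    expanded = dict(tone)
    for root in verbs & set(tone):
        forms = IRREGULAR.get(root) or regular_forms(root)
        for form in forms:
            expanded.setdefault(form, tone[root])
    return expanded
```

Only words listed in `verbs` are expanded, so adjectives like “good” do not acquire bogus forms such as “goods.”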

Model Expansion: Many words and phrases do not exist outside of social media. A series of automated models was used to construct a collection of tonal dictionaries from a diverse array of starting points; these were then merged and manually reviewed in a sequence of passes to identify novel social media uses and connotations.
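
The merge step could look something like this sketch, which is our own illustration rather than the system's procedure. It assumes each automated model produces a word-to-score dictionary, keeps entries that every model proposes with the same polarity, and routes everything else to a queue that flags where the manual passes matter most.

```python
from statistics import mean


def merge_candidates(candidates: list) -> tuple:
    """Merge word->score dictionaries from several automated models.

    Terms that every model proposes with the same polarity are averaged; terms that
    only some models propose, or whose polarity the models disagree on, are queued
    for manual review.
    """
    merged, review = {}, []
    for term in sorted(set().union(*candidates)):
        scores = [c[term] for c in candidates if term in c]
        polarities = {(s > 0) - (s < 0) for s in scores}  # +1, 0 or -1
        if len(scores) == len(candidates) and len(polarities) == 1:
            merged[term] = mean(scores)
        else:
            review.append((term, scores))
    return merged, review


# Hypothetical outputs from two models seeded differently.
from_emoticons = {"sick": 1.0, "sad": -1.0, "lit": 1.5}
from_seed_words = {"sick": -1.0, "sad": -2.0}
merged, review = merge_candidates([from_emoticons, from_seed_words])
print(merged)  # {'sad': -1.5}
print(review)  # [('lit', [1.5]), ('sick', [1.0, -1.0])]
```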

Negation: Surprisingly, few systems understand the concept of negation (“not liked” versus “liked”), and those that do use a small dictionary of “negation terms” like “don’t” and “not” and simply invert the tone of the following word. Our system instead assigns a preliminary score to each word and then reads the entire tweet in order to follow the flow of the text, examining sequences of positive and negative words so that “a spectacular waste of time” is coded correctly, while the colloquial engine described above ensures that “he is a horrible flirt” is also coded correctly.
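
A minimal sketch of reading-order scoring with a negation window, under our own assumptions: a negator flips the polarity of the next tonal word within three tokens (so “not a good movie” works, unlike a rule that inverts only the very next word), and a positive word directly followed by a negative one is treated as amplifying it. The word lists, scores, window size, and amplification rule are all hypothetical.

```python
NEGATORS = {"not", "never", "no", "don't", "didn't", "isn't"}
TONE = {"liked": 1.0, "good": 1.0, "bad": -1.0, "spectacular": 2.0, "waste": -2.0}


def score_in_order(tokens: list, window: int = 3) -> float:
    """Score a tokenized tweet left to right, tracking negation and word sequences."""
    total = 0.0
    negate_left = 0  # tokens left in which a preceding negator still applies
    prev = 0.0       # score of the immediately preceding token (0.0 if it had no tone)
    for tok in tokens:
        if tok in NEGATORS:
            negate_left, prev = window, 0.0
            continue
        score = TONE.get(tok, 0.0)
        if negate_left:
            if score:
                score, negate_left = -score, 0  # negation is used up by the first tonal word
            else:
                negate_left -= 1
        if prev > 0 and score < 0:
            # Positive word directly before a negative one ("spectacular waste"):
            # read it as amplifying the negative instead of cancelling it out.
            total -= prev
            score -= prev
        total += score
        prev = score
    return total
```

With these toy scores, “a spectacular waste of time” comes out at -4.0 rather than the 0.0 a simple sum of +2 and -2 would give.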

Booster Words: Some words don’t have their own emotional connotation, but they do boost that of the following word. Saying you “really love” a show means more than saying you “loved” it, while a “spectacular screwup” is likely worse than a mere “screwup.” Once again, the tweet is processed in reading order to follow its underlying narrative flow, with emotions boosted as needed.
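
Boosting can be sketched as a multiplier carried forward to the next tonal word. The multipliers are invented for illustration, and letting “spectacular” act as a booster when followed by a tonal word (while scoring on its own otherwise) is our assumption about how a word can play both roles.

```python
# Hypothetical multipliers and scores, for illustration only.
BOOSTERS = {"really": 1.5, "so": 1.5, "very": 1.5, "spectacular": 2.0}
TONE = {"love": 2.0, "loved": 2.0, "screwup": -2.0, "spectacular": 2.0}


def score_with_boosters(tokens: list) -> float:
    """Sum word scores in reading order, multiplying each by any boosters just before it."""
    total = 0.0
    multiplier = 1.0
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # A booster only boosts when a tonal word (or another booster) follows it.
        if tok in BOOSTERS and nxt is not None and (nxt in TONE or nxt in BOOSTERS):
            multiplier *= BOOSTERS[tok]
            continue
        total += TONE.get(tok, 0.0) * multiplier
        multiplier = 1.0
    return total
```

With these values, “really love” scores 3.0 against 2.0 for “loved,” and “spectacular screwup” scores -4.0 against -2.0 for a plain “screwup.”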

Robust Grammar: A number of recent sentiment systems have incorporated part of speech tagging and complex grammatical parsing. However, they require pristine English text that is correctly spelled and grammatically perfect, with a single misplaced comma, misspelled word, or inverted clause wreaking havoc. One system correctly codes “I hate john, but I love kate,” but fails when presented with “I hate john and I love kate,” making this approach unsuitable for the informal speech of social media.

Language Evolution: Many systems still use tonal dictionaries built decades ago when “cool” meant cold and emoticons had yet to be invented. Language is in a state of constant change and tonal dictionaries must be constantly updated and calibrated to today’s connotations.

Human Oversight: Automated generation of tonal dictionaries is increasingly popular, using large batches of tweets containing emoticons as training data for machine learning algorithms that can achieve 70-percent accuracy out of the box. However, these algorithms are often right for the wrong reasons. One system scores “dumb” as positive and “so” as very negative, meaning that “you so dumb” is coded as negative, but “you are dumb” is coded positively and “you are so beautiful” is coded negatively. Personal names like “Michael” and “Emily” frequently have predefined emotional scores. The words “Monday,” “treadmills,” “airports,” “economists,” “hospitals,” “orthodontists,” “doctors” and “dentists” are all highly negative in many tools, which makes sense until “Monday night football” receives a strongly negative score. You don’t want to bias the very thing you are measuring. For our system, we applied such algorithms to expand our dictionaries, but then manually reviewed their entire output.
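
The failure described here is easy to reproduce with a naive additive scorer. The learned scores are made up solely to match the behavior the paragraph reports (“dumb” positive, “so” strongly negative), and the reviewed copy shows the kind of corrections a human pass would make; none of these numbers come from a real system.

```python
# Made-up "learned" scores that reproduce the errors described in the article.
LEARNED = {"dumb": 0.5, "so": -2.0, "beautiful": 1.0, "monday": -1.5, "emily": 0.8}

# A human review pass: fix the wrong polarity of "dumb" and zero out words that carry
# no tone of their own (function words, personal names, days of the week).
REVIEWED = dict(LEARNED, dumb=-1.0, so=0.0, monday=0.0, emily=0.0)


def score(text: str, lexicon: dict) -> float:
    """Naive bag-of-words score: the sum of each word's lexicon value."""
    return sum(lexicon.get(word, 0.0) for word in text.lower().split())


for phrase in ["you so dumb", "you are dumb", "you are so beautiful", "Monday night football"]:
    print(f"{phrase!r}: learned={score(phrase, LEARNED)}, reviewed={score(phrase, REVIEWED)}")
```

With `LEARNED`, “you are so beautiful” scores -1.0 and “Monday night football” scores -1.5; after review they score 1.0 and 0.0.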

These 16 insights reflect one of the deeper truths of big data that is largely absent from today’s breathless marketing hype: the underlying algorithms that power big data analysis have mostly been built by computer scientists who emphasize technological prowess over a deeper understanding of how complex human behavior really is.

To create the “Opposite Worlds” sentiment mining system, we made extensive use of highly sophisticated language models, filtering tools, natural language processing algorithms, and machine learning systems. Yet, these computer science approaches were combined with laborious human review, manual compilation of tens of thousands of terms, and a deep background in human emotion. We brought with us no preconceived notions of how emotion “should” be expressed online, focusing instead on what we learned from poring over how Twitter is actually being used today. The increasing use of hashtags to express emotion, the heavy reliance on emotionally laden colloquial expressions, and the critical importance of tolerance towards typographical errors required combining automated computer-assisted compilation with extensive manual labor. In the end, this marriage of human and machine yielded what we believe is one of the most sophisticated sentiment analysis systems ever built for social media.

Whether you are measuring views towards a television show, or charting changing consumer views towards a brand of running shoes, the technology-centric approach of current sentiment systems will greatly impede the accuracy and comprehensiveness of your results. By blending human and machine, the system that ultimately powered “Opposite Worlds” uncovered a deeper need within the sentiment analysis field to engage more closely with disciplinary scholars and to focus less on algorithms and more on outcomes. For big data to mature beyond marketing hype towards truly transformative solutions, it must “grow up” out of the computer science labs that gave birth to it and spend more time on understanding the domain-specific algorithms and data it is applied to than on the computing algorithms that operationalize them.

“Opposite Worlds” represented the first time sentiment technology had been used as an integral part of a television series to actually impact events on the screen in real time. This combination of social media and sentiment mining places a show’s fans in the driver’s seat for the first time since the creation of television over three quarters of a century ago.

Editor: Emily_Dreyfuss@wired.com
