Analysis of 100 billion tweets provides new insights into linguistic patterns — ScienceDaily

An investigation of Twitter messages reveals new insights and tools for studying how people use stretched words, such as “duuuuude,” “heyyyyy,” or “noooooooo.” Tyler Gray and colleagues at the University of Vermont in Burlington present these findings in the open-access journal PLOS ONE on May 27, 2020.

In spoken and written language, stretched words can modify the meaning of a word. For instance, “suuuuure” can imply sarcasm, while “yeeessss” may indicate excitement. Stretched words are rare in formal writing, but the rise of social media has opened up new opportunities to study them.

Gray and colleagues have now completed the most comprehensive study to date of “stretchable” words in social media. They developed a new, more thorough strategy for identifying stretched words in tweets and used it to analyze a randomly selected dataset of about 10 percent of all tweets generated between September 2008 and December 2016 — totaling about 100 billion tweets.

The researchers identified thousands of “stretchable” words in the tweets, including “ha” (e.g., “hahaha” or “haaahaha”), “awesome” (e.g., “awesssssommmmmeeeeee”) and “goal” (e.g., “ggggoooooaaaaallllll”).
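To make the idea of mapping a stretched word back to its base form concrete, here is a minimal Python sketch. It is not the authors' method, which the article does not describe in detail; the function names `collapse` and `group_by_collapsed_form` are hypothetical. It simply reduces every run of a repeated character to one character, so it would not reduce “hahaha” to “ha,” and it would wrongly shorten ordinary double letters (“cool” becomes “col”).

```python
import re
from collections import defaultdict


def collapse(token: str) -> str:
    """Reduce every run of a repeated character to a single character."""
    return re.sub(r"(.)\1+", r"\1", token.lower())


def group_by_collapsed_form(tokens):
    """Group tokens that share the same collapsed form (illustrative only)."""
    groups = defaultdict(list)
    for token in tokens:
        groups[collapse(token)].append(token)
    return dict(groups)


if __name__ == "__main__":
    sample = ["ggggoooooaaaaallllll", "goooal", "duuuuude", "awesssssommmmmeeeeee"]
    for base, variants in group_by_collapsed_form(sample).items():
        print(base, variants)
```

A more thorough approach would also need to handle repeated multi-letter units and words whose standard spelling already contains doubled letters.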

They also identified two key ways of measuring the characteristics of stretchable words: balance and stretch. Balance refers to the degree to which different letters tend to be repeated. For instance, “ha” has a high degree of balance because when it is stretched, the “h” and the “a” tend to be repeated just about equally. “Goal” is less balanced, with “o” repeated more than any other letter in the word.

Stretch refers to how long a word tends to be stretched. For instance, short words or sounds like “ha” have a high degree of stretch because people often repeat them many times (e.g., “hahahahahahahaha”). Meanwhile, regular words like “infinity” have lower stretch, often with just one letter repeated: “infinityyyy.”
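The two measures can be illustrated with a rough Python sketch. The article does not give the researchers' formulas, so the scores below are stand-ins chosen for illustration: `extra_letters` counts repeated letters beyond one per run as a crude proxy for stretch, and `balance_score` uses a normalized entropy of where those extra letters fall as a crude proxy for balance. Both function names are hypothetical, and both score a single token, whereas the study characterizes each word across many tweets.

```python
from itertools import groupby
from math import log


def run_lengths(token: str):
    """Return (letter, run length) pairs, e.g. 'gooal' -> [('g', 1), ('o', 2), ('a', 1), ('l', 1)]."""
    return [(ch, len(list(run))) for ch, run in groupby(token.lower())]


def extra_letters(token: str) -> int:
    """Crude stretch proxy: letters beyond one per run."""
    return sum(n - 1 for _, n in run_lengths(token))


def balance_score(token: str) -> float:
    """Crude balance proxy: 1.0 if extra letters are spread evenly over all
    runs, 0.0 if they all fall on one letter (or there are none)."""
    extras = [n - 1 for _, n in run_lengths(token)]
    total = sum(extras)
    if total == 0 or len(extras) < 2:
        return 0.0
    probs = [e / total for e in extras if e > 0]
    entropy = sum(-p * log(p) for p in probs)
    return entropy / log(len(extras))


if __name__ == "__main__":
    for word in ["ggggoooooaaaaallllll", "goooooal", "infinityyyy"]:
        print(word, extra_letters(word), round(balance_score(word), 2))
```

Under these stand-in definitions, a token stretched on a single letter, such as “infinityyyy,” scores zero for balance, while one stretched roughly evenly across all its letters scores close to one.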

For this analysis, the researchers developed various tools and methods that could be used in future research on stretchable words, such as investigations of mistypings and misspellings. The tools could also be applied to improve natural language processing, search engines, and spam filters.

The authors add: “We were able to comprehensively collect and count stretched words like ‘gooooooaaaalll’ and ‘hahahaha’, and map them across the two dimensions of overall stretchiness and balance of stretch, while developing new tools that will also aid in their continued linguistic study, and in other areas, such as language processing, augmenting dictionaries, improving search engines, analyzing the construction of sequences, and more.”

Story Source:

Materials provided by PLOS. Note: Content may be edited for style and length.


Written by sortiwa
