
Festival Chatter (Part 4) – Some Simple Sentiment Analysis


I think this post will probably conclude my Festival Chatter series on analyzing Bonnaroo tweets in Python (part 1, part 2, part 3). I’ve had a lot of fun messing around with this dataset, but I think it’s time to move on to playing with something else. For this last post, though, I’ll show some simple sentiment analysis of the collected tweets. There are a whole bunch of issues with this method of sentiment analysis, and I’ll mention some of them after presenting the findings.

Words to Numbers

The sentiment analysis used here consists of associating a number with an English word, where the value of the number corresponds to how “positive” or “negative” the word seems. Presumably I could go through the dictionary and subjectively assign numbers to words, but luckily somebody else has less of a life than I do (maybe). So, if you click here, you can download a tab-separated text file called AFINN-111.txt that contains a dictionary of 2477 words and their associated integer sentiment scores. Larger positive (negative) values correspond to more positive (negative) sentiment. On a total side note, I often see versions of my previous sentence in scientific papers, where one attempts to describe two generally opposing concepts and puts the antonyms in parentheses in order to save space. I wonder if there is a name for this “literary device”? Anyway, nice words like “love” get a value of +3 on the sentiment scale, while “hate” is worth -3.
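Each line of the file is just a term, a tab character, and an integer score. Using the two words mentioned above, the relevant lines of AFINN-111.txt look like this:

love	3
hate	-3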

I wrote a small script, calculate_sentiment.py, that reads in AFINN-111.txt and calculates the sentiment score of each tweet. This score is simply the sum of the sentiment scores for all words in the tweet that are also in the dictionary (so a tweet containing both “love” (+3) and “hate” (-3) would net out to zero). I originally wrote this script for a homework assignment from the Coursera Introduction to Data Science course. Things like separating the scores into “positive” and “negative” buckets are unnecessary now and are just left over from that assignment.

"""
Calculate tweet sentiment from tweet DataFrame
"""

import pandas as pd
from pandas import DataFrame, Collection


def get_sentiment_dict():
    """ Load in sentiment file AFINN-111 as dictionary """

    sent_file = open('AFINN-111.txt')
    sentiment_dict = {}
    for line in sent_file:
      time period, rating  = line.cut up("t")
      sentiment_dict[term] = int(rating)

    return sentiment_dict

def get_tweet_sentiment(tweet_df):
    """
    Calculate sentiment rating for each tweet in DataFrame tweet_df
    """
    sentiment_dict = get_sentiment_dict()

    apply_fun  =  lambda x: sentiment_count(x, sentiment_dict)
    tweet_sents = tweet_df['tokens'].apply(apply_fun, sentiment_dict)

    return pd.Collection(tweet_sents, identify='text_sentiment')


def sentiment_count(tokens, sentiment_dict):
    """
    Calculate sentiment rating for checklist of "tokens".
    """
    # Initialize
    sent_score = 0.
    word_count = 0.
    sent_buck = {}
    sent_buck['positive'] = 0.
    sent_buck['negative'] = 0.

    for phrase in tokens:
        if sentiment_dict.has_key(phrase):
            if sentiment_dict[word]>0:
                sent_buck['positive'] += float(sentiment_dict[word])
            elif sentiment_dict[word]<0:
                sent_buck['negative'] += float(sentiment_dict[word])
        word_count += 1.

    if word_count == 0:
        sent_score = 0
    else:
        sent_score = (sent_buck['positive']+sent_buck['negative'])

    return sent_score

“But how did it make you feel?”

With this machinery in hand, we can now take a look at the Bonnaroo dataset.
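Before scoring the real tweets, here is a minimal toy sketch (not part of the original analysis) of the input that get_tweet_sentiment() expects: a DataFrame with a 'tokens' column holding each tweet’s list of word tokens, which is the column bandPop carries from earlier in this series. It assumes AFINN-111.txt and calculate_sentiment.py sit in the working directory.

import pandas as pd

import calculate_sentiment

# Three fake tweets, already tokenized like the 'tokens' column of bandPop
toy = pd.DataFrame({'tokens': [['i', 'love', 'bonnaroo'],
                               ['i', 'hate', 'lines'],
                               ['just', 'arrived']]})

# Expected scores: +3, -3, 0 -- only words present in AFINN-111 contribute
print calculate_sentiment.get_tweet_sentiment(toy)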

import calculate_sentiment

sents = calculate_sentiment.get_tweet_sentiment(bandPop)

print '------------------------'
print '| Tweet sentiment head |'
print '------------------------'
print sents.head()
------------------------
| Tweet sentiment head |
------------------------
created_at
2014-06-11 09:24:57-05:00    3
2014-06-11 09:24:57-05:00    2
2014-06-11 09:25:01-05:00    0
2014-06-11 09:25:05-05:00    0
2014-06-11 09:25:06-05:00    0
Name: text_sentiment, dtype: float64

We now have a single text_sentiment number associated with each tweet.

print '\n-------------------------------'
print '| Tweet sentiment description |'
print '-------------------------------'
print sents.describe()
-------------------------------
| Tweet sentiment description |
-------------------------------
count    66424.000000
mean         0.422844
std          2.344547
min        -20.000000
25%          0.000000
50%          0.000000
75%          1.000000
max         24.000000
Name: text_sentiment, dtype: float64

We can see that there is a positive average sentiment, but the range is quite large (-20, +24). Let’s take a look at those minimum and maximum tweets:

bandPop_sents = pd.concat([bandPop, sents], axis=1)
print '\n-----------------------'
print '| Most negative tweet |'
print '-----------------------'
print bandPop_sents[bandPop_sents['text_sentiment']==bandPop_sents['text_sentiment'].min()]['text'].values[0]
print '\n-----------------------'
print '| Most positive tweet |'
print '-----------------------'
print bandPop_sents[bandPop_sents['text_sentiment']==bandPop_sents['text_sentiment'].max()]['text'].values[0]
-----------------------
| Most negative tweet |
-----------------------
Mother fuckers quit sending me your bonnaroo shit. I couldn't give a fuck about some shit ass music festival

-----------------------
| Most positive tweet |
-----------------------
I love strangers. I love dance parties. I  love music. I love my friends. I love pizza. I love camping. I love high fives. I love bonnaroo.

As you can see, the most negative tweet is no surprise. Although our sentiment scoring algorithm captured this tweet’s sentiment fairly well, you will see in a bit how the algorithm often fails because “inflammatory language” always gets scored so negatively. Meanwhile, the frequent use of the word “love” in the most positive tweet makes it clear why that tweet ranked so high on the sentiment scale.

Temporal Vibes

Now that we have a sentiment score for each tweet, I thought it would be interesting to look at how the average sentiment changed with time. For this, I employ an approach similar to the last post and resample the data. I also resample at different rates, which reveals the tradeoff between temporal resolution and noise. Lastly, I’ve switched from using prettyplotlib to seaborn. Prettyplotlib was abandoned by its creator, and seaborn seems to be quite popular and easy to use.

import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

# Time series of tweet sentiment

pal = sns.dark_palette("skyblue", 3, reverse=True)

# Different resampling rates
x5 = bandPop_sents['text_sentiment'].resample('5t', how='mean')
x20 = bandPop_sents['text_sentiment'].resample('20t', how='mean')
x60 = bandPop_sents['text_sentiment'].resample('60t', how='mean')

x5.plot(color=pal[0], lw=3, alpha=.5)
x20.plot(color=pal[1], lw=3, alpha=.75)
x60.plot(color=pal[2], lw=1.5)

fig = plt.gcf()
ax = plt.gca()

# Labels
plt.xlabel('Date', fontsize=20)
plt.ylabel('Text Sentiment Score', fontsize=20)
plt.title('Average Tweet Sentiment', fontsize=20)
# Legend
leg = plt.legend(['5 min', '20 min', '60 min'], fontsize=12, title='Resampling Rate')
plt.setp(leg.get_title(), fontsize='15')
# Axes
plt.setp(ax.get_xticklabels(), fontsize=14, family='sans-serif')
plt.setp(ax.get_yticklabels(), fontsize=18, family='sans-serif')
plt.tight_layout()

For the high frequency resampling rate (5 min), you can see that the signal gets quite noisy near the morning of June 13th. It turns out that there were very few tweets during this time, so any single tweet with a sizeable sentiment score would dominate the 5 minute bin over which I was averaging.
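As a quick sanity check, one could count the tweets falling in each 5 minute bin around that morning (a minimal sketch using the same old-style resample API as above, not something from the original analysis):

# Tweets per 5 minute bin; sparse bins explain the noisy 5 min average
counts_5min = bandPop_sents['text_sentiment'].resample('5t', how='count')
print counts_5min['2014-6-13 06:00':'2014-6-13 12:00'].describe()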

By decreasing the sampling rate to 60 minute bins, you can see a clear upturn in the average sentiment starting around midnight on June 15th, which was the last night of the festival. I decided to look at some of the very positive tweets during this time.

festival_end = bandPop_sents['2014-6-16':]
for tweet in festival_end[festival_end['text_sentiment']>10]['text'][:6]:
    print tweet + '\n'
If you're a stoner an you love good food and good music you need to go to bonnaroo period lol

I am so happy I came to Bonnaroo.  Had so much fun and saw so many amazing bands.

High Fives and Free Hugs, 90,000 awesome people, over 125+ amazing performances, 5 stages/tents... #Bonnaroo2014 #Bonnaroo

@KKvspr @Bonnaroo lol it was awesome, so trippy! Lol xo hope ur well! ;-) n that band awesome as well.

Soooooo many people. Wow I love my job. I had an amazing time working Bonnaroo.  Music is so powerful,… http://t.co/QTxJinI8xs

made it home, had the best shower ever. what amazing bonnaroo (as usual)! met lovely people, got lots of free alcohol, and (duh) the music!

This is pretty cool – it seems like many of these tweets are people looking back fondly on their time at Bonnaroo and talking about how much they enjoyed it. It would be interesting to see how long this “after-glow” lasts; one rough way to measure it is sketched below.
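This is just a sketch (assuming the collection window extends some days past the festival, which I haven't verified): track the daily mean sentiment after the final night and watch for it to decay back toward the overall average.

# Daily mean sentiment after the festival's final night
post_fest = bandPop_sents['2014-6-16':]['text_sentiment']
print post_fest.resample('D', how='mean')

# Compare against the overall average sentiment of the whole dataset
print 'Overall mean: %.3f' % bandPop_sents['text_sentiment'].mean()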

Festival Chatter Scatter

Finally, to get back to some insight about the artists, I decided to create a scatterplot with the average sentiment for each of the 40 most popular artists on the x-axis and the standard deviation of that sentiment on the y-axis. The idea was that maybe the standard deviation would indicate “controversial” artists.

top_40 = band_hist.index.tolist()[:40]
sent_stats = {}
for band in top_40:
    mn = bandPop_sents[bandPop_sents[band]==True]['text_sentiment'].mean()
    sd = bandPop_sents[bandPop_sents[band]==True]['text_sentiment'].std()
    sent_stats[band] = [mn, sd]

fig, ax = plt.subplots()

for k, v in sent_stats.iteritems():
    plt.scatter(v[0], v[1], s=200, alpha=0.5)
    plt.annotate(k, (v[0], v[1]), xytext=(5,5), textcoords='offset points')

# Axes
plt.axis([-1.2, 1.7, 1.5, 3.5])
plt.setp(ax.get_xticklabels(), fontsize=18, family='sans-serif')
plt.setp(ax.get_yticklabels(), fontsize=18, family='sans-serif')
# Labels
plt.ylabel('Sentiment Standard Deviation', fontsize=25)
plt.xlabel('Sentiment Mean', fontsize=25)
plt.tight_layout()

Scatter Sentiment

I wasn’t quite sure what to make of this chart. You can see that Broken Bells is far positive in terms of mean sentiment and low on the controversy axis, which is probably not surprising. I was interested in the far negative outlier, A$AP Ferg. I’ve never listened to him, so I had no intuition, but looking at some of the more negative A$AP Ferg tweets, we again run into the limitations of this sentiment analysis:

for tweet in bandPop_sents[(bandPop_sents['A$AP Ferg']==True) & (bandPop_sents['text_sentiment']<-5)]['text']:
    print tweet
Holy shit the surprise guests dont stop.  A$AP Ferg just hit the stage with a cover of The Notorious B.I.G.'s "Juicy". @Bonnaroo

Zedd's on drums and A$AP Ferg is climbing shit and the @Skrillex #superjam is officially the craziest thing I've seen in awhile #bonnaroo

A$AP FERG KILLED IT! Shit was crazy! #bonnaroo

A$AP Ferg just pulled the most ridiculous shit of all time at Bonnaroo right now.

I think I might have lost 5lbs during the A$AP Ferg show today at Bonnaroo. Just constantly moshing around. Fucking crazy.

People clearly loved this show, but their language just is not well suited to being captured by AFINN-111. I’m sure there are many ways to attempt to alleviate this issue, though I’ll worry about that properly another day (one crude idea is sketched below). For now, all of the scripts used in this series are on my GitHub.
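For instance, here is a minimal sketch of one such fix (hypothetical, not used in this analysis): zero out a few profane terms that this crowd clearly uses with enthusiasm, then rescore with the tweaked dictionary.

import calculate_sentiment

sent_dict = calculate_sentiment.get_sentiment_dict()

# Neutralize some hand-picked profanity that reads as positive at a festival
for term in ['shit', 'crazy']:
    if term in sent_dict:   # guard: the term may not appear in AFINN-111
        sent_dict[term] = 0

# Rescore every tweet with the tweaked dictionary
rescored = bandPop_sents['tokens'].apply(
    lambda toks: calculate_sentiment.sentiment_count(toks, sent_dict))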
