Sentiment Analysis in Python: TextBlob vs VADER vs Flair vs Custom Models

Sentiment analysis is one of the most widely known natural language processing (NLP) tasks. This article aims to give the reader a clear picture of sentiment analysis and the different ways it can be implemented in NLP, so let's dive in.

The NLP field has grown enormously over the past five years, and open-source packages such as spaCy and TextBlob now provide ready-made functionality for tasks like sentiment analysis. There are so many freely available packages that it can be hard to decide which one to use for your application.

In this article, I will discuss the most popular NLP sentiment analysis packages:
1. TextBlob
2. VADER
3. Flair
4. Custom models
Finally, I will also compare their performance on a common dataset.

What is sentiment analysis?

Sentiment analysis is the task of determining the emotional value of a given expression in natural language.

It is essentially a multiclass text classification problem, in which a given input text is classified into positive, neutral, or negative sentiment. The number of classes can vary depending on the nature of the training dataset.

For example, it is sometimes framed as a binary classification problem, with 1 as the positive-sentiment label and 0 as the negative-sentiment label.

Applications of sentiment analysis

Sentiment analysis has applications in many areas, including analyzing user reviews and tweet sentiment. Here are a few examples:

Movie reviews: analyzing online movie reviews to gain insight into what audiences think of a film.
News sentiment analysis: analyzing news coverage of an organization to gain insights.
Online food reviews: analyzing the sentiment of food reviews from user feedback.

Sentiment analysis in Python

In Python, many packages perform sentiment analysis using different approaches. In the next sections, we will cover some of the most popular approaches and packages.

Rule-based sentiment analysis

Rule-based sentiment analysis is one of the most basic ways to compute text sentiment. It requires minimal up-front work and the idea is simple: no machine learning is used to compute the sentiment. For example, we could estimate the sentiment of a tweet by counting the number of times the user used the word "sad" in it.
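The word-counting idea can be sketched in a few lines. This is a toy illustration only (the word lists are made up, and no real library works exactly like this):

```python
# Toy rule-based scorer: count matches against small hand-made word lists.
POSITIVE_WORDS = {"good", "great", "happy", "love"}
NEGATIVE_WORDS = {"bad", "sad", "terrible", "hate"}

def rule_based_sentiment(text):
    tokens = text.lower().split()
    score = sum(t in POSITIVE_WORDS for t in tokens) \
          - sum(t in NEGATIVE_WORDS for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(rule_based_sentiment("i am sad so sad today"))  # negative
```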

Now, let's look at some Python packages that use this approach.

TextBlob

TextBlob is a simple Python library that offers API access to different NLP tasks such as sentiment analysis, spelling correction, and more.

The TextBlob sentiment analyzer returns two properties for a given input sentence:

Polarity is a float in the range [-1, 1], where -1 indicates negative sentiment and +1 indicates positive sentiment.
Subjectivity is also a float, in the range [0, 1]. Subjective sentences generally refer to personal opinions, emotions, or judgments.

Let's see how to use TextBlob:

from textblob import TextBlob

testimonial = TextBlob("The food was great!")
print(testimonial.sentiment)

 Sentiment(polarity=1.0, subjectivity=0.75)

TextBlob ignores words it does not recognize; it takes the words and phrases it can assign a polarity to and averages them to get the final score.
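The averaging idea can be sketched with a toy lexicon (the polarity values below are invented for illustration; TextBlob's real lexicon is far larger):

```python
# Toy polarity lexicon; values are made up for illustration.
LEXICON = {"great": 0.8, "good": 0.7, "awful": -1.0}

def average_polarity(text):
    # Unknown words are skipped; known ones are averaged.
    scores = [LEXICON[t] for t in text.lower().split() if t in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(average_polarity("the xylophone was great"))  # 0.8 (only 'great' counts)
print(average_polarity("good but awful"))           # ~ -0.15
```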

VADER Sentiment

Valence Aware Dictionary and sEntiment Reasoner (VADER) is another popular rule-based sentiment analyzer.

It computes text sentiment using a list of lexical features (such as words) that are labeled positive or negative according to their semantic orientation.

VADER returns the probability of a given input sentence being positive, negative, and neutral.

For example:

"The food was great!"
Positive: 99%
Negative: 1%
Neutral: 0%

These three probabilities add up to 100%.

Let's see how to use VADER:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer


analyzer = SentimentIntensityAnalyzer()
sentence = "The food was great!" 
vs = analyzer.polarity_scores(sentence)
print("{:-<65} {}".format(sentence, str(vs)))

{'compound': 0.6588, 'neg': 0.0, 'neu': 0.406, 'pos': 0.594}

VADER is optimized for social media data and can give great results when used with data from Twitter, Facebook, and the like.

The main drawback of rule-based sentiment analysis is that the method only cares about individual words and completely ignores the context in which they are used.

For example, "the party was savage" will read as negative under any token-based algorithm, even when "savage" is meant as slang for something great.
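This context blindness is easy to demonstrate with a sketch (toy lexicon, invented values): a token-level score cannot see the negation in "not good".

```python
# Token-level scoring ignores context such as negation.
LEXICON = {"good": 1, "great": 1, "bad": -1}

def token_score(text):
    return sum(LEXICON.get(t, 0) for t in text.lower().split())

print(token_score("the food was not good"))  # 1 -> read as positive, wrongly
```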

Embedding-based models

Text embeddings are a form of word representation in NLP, in which words with similar meanings are represented by similar vectors; when those vectors are plotted in n-dimensional space, they lie close to one another.
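The "similar words get similar vectors" idea can be illustrated with made-up 3-dimensional vectors and cosine similarity (real embeddings are learned and have hundreds of dimensions):

```python
import math

# Invented 3-d vectors for illustration; real embeddings are learned.
VECTORS = {
    "happy": [0.9, 0.1, 0.0],
    "glad":  [0.85, 0.15, 0.05],
    "car":   [0.0, 0.2, 0.95],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine(VECTORS["happy"], VECTORS["glad"]))  # close to 1 (synonyms)
print(cosine(VECTORS["happy"], VECTORS["car"]))   # much smaller
```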


Embedding-based Python packages use this form of text representation to predict text sentiment. It yields better text representations in NLP and, in turn, better model performance.

Flair is one such package.

Flair

Flair is a simple-to-use framework for state-of-the-art NLP.

It offers various features, such as:

1. Pre-trained sentiment analysis models
2. Text embeddings
3. NER (named entity recognition)
and more.
Let's see how simple and efficient sentiment analysis can be with Flair.

Flair's pre-trained sentiment analysis model was trained on the IMDB dataset. To load it and make predictions, simply do:

from flair.models import TextClassifier
from flair.data import Sentence

classifier = TextClassifier.load('en-sentiment')
sentence = Sentence('The food was great!')
classifier.predict(sentence)

# print sentence with predicted labels
print('Sentence above is: ', sentence.labels)

[POSITIVE (0.9961)]

If you would like a sentiment analyzer customized to your own domain, you can train a classifier on your own dataset using Flair.
The drawback of Flair's pre-trained model is that it was trained on IMDB data, so it may not generalize well to data from other domains, such as tweets.

Building a sentiment analysis model from scratch

In this section, you will learn when and how to build a sentiment analysis model from scratch using TensorFlow. So, let's see how to do it.

Why a custom model?

First, let's understand when you would need a custom sentiment analysis model: for example, when you have an application-specific requirement, such as analyzing the sentiment of airline reviews.

By building a custom model, you also get more control over its output.

TFhub

TensorFlow Hub is a repository of trained machine learning models, ready to be fine-tuned and deployed anywhere.

For our purpose, we will use the Universal Sentence Encoder, which encodes text into high-dimensional vectors. You could also use any text representation model you like, such as GloVe, fastText, or word2vec.

Model

Since we use the Universal Sentence Encoder to vectorize the input text, we don't need an embedding layer in the model; you would only need one if you planned to use another embedding model, such as GloVe. Here, I will build a simple model that suits our purpose.

Dataset

For our example, I will use the Twitter sentiment analysis dataset (Sentiment140) from Kaggle, which contains 1.6 million labeled tweets.

You can download the dataset here:

https://www.codefrees.com/ymxz/2020-10-19/689.html

 

Example: Twitter sentiment analysis with Python

1. Import the required packages

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense,Input
from sklearn.utils import shuffle
import tensorflow_hub as hub
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import numpy as np
import os
import pandas as pd
DATASET_ENCODING = "ISO-8859-1"

2. Download and preprocess the Twitter sentiment dataset

!kaggle datasets download -d kazanova/sentiment140
sentiment140.zip: Skipping, found more recently modified local copy (use --force to force download)
df = pd.read_csv("/content/sentiment140.zip",encoding=DATASET_ENCODING)
df= df.iloc[:,[0,-1]]
df.columns = ['sentiment','tweet']
df = pd.concat([df.query("sentiment==0").sample(20000),df.query("sentiment==4").sample(20000)])
df.sentiment = df.sentiment.map({0:0,4:1})
df =  shuffle(df).reset_index(drop=True)

df,df_test = train_test_split(df,test_size=0.2)
df.head(5)

sentiment    tweet
29039    1    @tommcfly *------* u wrote a song in brazil, w...
6839    0    @amazingphoebe it is until you actually start ...
36081    1    @SeasonedWTime better have a strong cup of cof...
15445    0    @MNMorgan I want to go! But I have to work all...
22537    1    had a fantastic weekend

3. Load the Universal Sentence Encoder from TFhub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
embed(['hi samuels, this is our project']).numpy().shape
WARNING:tensorflow:5 out of the last 8 calls to <function recreate_function.<locals>.restored_function_body at 0x7fd115ab5950> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
(1, 512)
def vectorize(df):
    embeded_tweets = embed(df['tweet'].values.tolist()).numpy()
    targets = df.sentiment.values
    return embeded_tweets,targets

embeded_tweets,targets = vectorize(df)

4. Model

model = Sequential()
model.add(Input(shape=(512,),dtype='float32'))
model.add(Dense(128, activation = 'relu'))
model.add(Dense(64, activation = 'relu'))
model.add(Dense(1, activation = 'sigmoid'))

model.compile(loss='binary_crossentropy', 
              optimizer='adam',
              metrics=['acc'])
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_3 (Dense)              (None, 128)               65664
_________________________________________________________________
dense_4 (Dense)              (None, 64)                8256
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 65
=================================================================
Total params: 73,985
Trainable params: 73,985
Non-trainable params: 0
_________________________________________________________________

5. Training and evaluation

num_epochs = 10
batch_size = 32   ## 2^x

history = model.fit(embeded_tweets, 
                    targets, 
                    epochs=num_epochs, 
                    validation_split=0.1, 
                    shuffle=True,
                    batch_size=batch_size)
Epoch 1/10
900/900 [==============================] - 1s 1ms/step - loss: 0.4821 - acc: 0.7686 - val_loss: 0.4384 - val_acc: 0.7891
Epoch 2/10
900/900 [==============================] - 1s 1ms/step - loss: 0.4450 - acc: 0.7911 - val_loss: 0.4305 - val_acc: 0.7962
Epoch 3/10
900/900 [==============================] - 1s 1ms/step - loss: 0.4168 - acc: 0.8107 - val_loss: 0.4326 - val_acc: 0.7928
Epoch 4/10
900/900 [==============================] - 1s 1ms/step - loss: 0.3798 - acc: 0.8312 - val_loss: 0.4525 - val_acc: 0.7837
Epoch 5/10
900/900 [==============================] - 1s 1ms/step - loss: 0.3381 - acc: 0.8537 - val_loss: 0.4688 - val_acc: 0.7803
Epoch 6/10
900/900 [==============================] - 1s 1ms/step - loss: 0.2948 - acc: 0.8771 - val_loss: 0.5120 - val_acc: 0.7725
Epoch 7/10
900/900 [==============================] - 1s 1ms/step - loss: 0.2500 - acc: 0.8988 - val_loss: 0.6094 - val_acc: 0.7638
Epoch 8/10
900/900 [==============================] - 1s 1ms/step - loss: 0.2105 - acc: 0.9161 - val_loss: 0.6598 - val_acc: 0.7594
Epoch 9/10
900/900 [==============================] - 1s 1ms/step - loss: 0.1699 - acc: 0.9331 - val_loss: 0.8014 - val_acc: 0.7563
Epoch 10/10
900/900 [==============================] - 1s 1ms/step - loss: 0.1378 - acc: 0.9467 - val_loss: 0.8478 - val_acc: 0.7556


plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy') 
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')

6. Compare the results

from sklearn.metrics import accuracy_score

7. Custom model

embed_test, targets = vectorize(df_test)
# Threshold the sigmoid outputs at 0.5 to get 0/1 labels; casting the
# raw probabilities straight to int would truncate almost all of them to 0.
predictions = (model.predict(embed_test) > 0.5).astype(int)
accuracy_score(predictions, targets)*100

8. TextBlob

!pip install -q textblob 
from textblob import TextBlob

def text_sentiment(text):
    testimonial = TextBlob(text)
    return int(testimonial.sentiment.polarity>0.5)

predictions = df_test.tweet.map(lambda x :  text_sentiment(x))
accuracy_score(predictions,targets)

9. VADER

!pip install -q vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

def text_sentiment_vader(text):
    vs = analyzer.polarity_scores(text)
    return int(vs.get("compound") > 0)

predictions = df_test.tweet.map(lambda x: text_sentiment_vader(x))
accuracy_score(predictions.values, targets)

10. Flair

!pip install -q flair 
from flair.models import TextClassifier
from flair.data import Sentence
classifier = TextClassifier.load('en-sentiment')

def text_sentiment_flair(text):
    sentence = Sentence(text)
    classifier.predict(sentence)
    # Map the predicted label value to 1/0. (Rounding the confidence
    # score instead would return 1 for almost every prediction, since
    # the score is the confidence of the predicted label, not its sign.)
    return int(sentence.labels[0].value == 'POSITIVE')

predictions = df_test.tweet.map(lambda x: text_sentiment_flair(x))
accuracy_score(predictions.values, targets)
sentence=Sentence("The food was great!")
classifier.predict(sentence)
sentence.labels
[POSITIVE (0.9961)]

I have now implemented all of the algorithms we discussed above.

Comparing the results

Now, let's compare the results of the above algorithms:
Algorithm    Accuracy
TextBlob     56%
VADER        56%
Flair        50%
USE model    77.5%

As you can see, our custom model produced the best results without any hyperparameter tuning.

Final thoughts

In this article, I discussed sentiment analysis and different ways to implement it in Python.
I also compared their performance on a common dataset.
I hope you will find them useful in some of your projects.