
How to determine the number of negation words per sentence

Stack Overflow Asked by user14289862 on January 3, 2021

I would like to know how to count the negation words (no, not) and the contraction n't, both per sentence and in the whole text.
To get the number of sentences I am using the following:

df["sent"] = df['text'].str.count(r'[\w][.!?]')

However, this only gives me the number of sentences in a text. I need the number of negation words in each sentence and in the whole text.
Can you please give me some tips?

The expected output for the text column is shown below:

text                                   sent     count_n_s     count_tot
I haven't tried it yet                  1          1              1
I do not like it. What do you think?    2         0.5             1
It's marvellous!!!                      1          0              0
No, I prefer the other one.             2          1              1

count_n_s is the total number of negation words in the text divided by the number of sentences.
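As a rough illustration of that definition, the arithmetic for the second row of the table can be written out as below. The `per_sentence` list is filled in by hand here (it is not computed), purely to show how `count_tot` and `count_n_s` relate.

```python
# Hand-written per-sentence negation counts for
# "I do not like it. What do you think?"
per_sentence = [1, 0]  # "not" in the first sentence, none in the second

count_tot = sum(per_sentence)              # total negations in the text
count_n_s = count_tot / len(per_sentence)  # negations per sentence

print(count_tot, count_n_s)
```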

I tried

split_w = re.split(r"\W+", df['text'])

neg_words = ['no', 'not', "n't"]
words = [w for i,w in enumerate(split_w) if i and (split_w[i-1] in neg_words)]

2 Answers

This would get a count of total negations in the text (not for individual sentences):

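import re

# Standalone "no"/"not", or any token containing "n't".
# Case-insensitive with word boundaries, so "No," is counted as well.
NEG_RE = re.compile(r"\b(?:no|not)\b|n't", re.IGNORECASE)

def get_count(words):
    # Count the tokens that contain a negation.
    return sum(1 for word in words if NEG_RE.search(word))

# Split each text on whitespace, then count the negations among its tokens.
df['text_list'] = df['text'].apply(lambda x: x.split())
df['count'] = df['text_list'].apply(get_count)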

Correct answer by Darcey BM on January 3, 2021

To count negations for individual lines, use the code below. Contractions such as haven't are matched by their n't suffix: stripping a token down to its leading word characters would turn haven't into haven, which no longer looks like a negation.

import re

str1 = '''I haven't tried it yet
I do not like it. What do you think?
It's marvellous!!!
No, I prefer the other one.'''

neg_words = ['no', 'not']
for text in str1.split('\n'):
    # split on whitespace; split() with no argument skips empty tokens
    split_w = text.lower().split()
    # to get rid of surrounding special characters such as the comma in 'No,'
    # use the substitution below (the apostrophe inside a word is kept)
    split_w = [re.sub(r"^\W+|\W+$", "", w) for w in split_w]
    # count the negation words plus any contraction ending in n't
    words = [w for w in split_w if w in neg_words or w.endswith("n't")]
    print(len(words))

Answered by Aaj Kaal on January 3, 2021
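Neither answer produces all three columns from the question at once, so here is a minimal sketch of one way to combine them. It rests on two assumptions that are not part of either answer: a sentence ends at `.`, `!` or `?` followed by whitespace, and a negation is the standalone word no or not, or any n't contraction. The names `negation_stats`, `NEG_RE` and `SENT_SPLIT_RE` are made up for this example. Because the sentence splitter is so simple, its `sent` value can differ from the table in the question (the last row comes out as one sentence here).

```python
import re

# Assumption: negations are the standalone words "no"/"not" or any "n't" contraction.
NEG_RE = re.compile(r"\b(?:no|not)\b|n't", re.IGNORECASE)
# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
SENT_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")

def negation_stats(text):
    """Return the sentence count, total negations and negations per sentence."""
    sentences = [s for s in SENT_SPLIT_RE.split(text.strip()) if s]
    per_sentence = [len(NEG_RE.findall(s)) for s in sentences]
    count_tot = sum(per_sentence)
    sent = len(sentences)
    return {
        "sent": sent,
        "count_tot": count_tot,
        # Guard against empty text, which has no sentences at all.
        "count_n_s": count_tot / sent if sent else 0.0,
    }

texts = [
    "I haven't tried it yet",
    "I do not like it. What do you think?",
    "It's marvellous!!!",
    "No, I prefer the other one.",
]
for t in texts:
    print(t, negation_stats(t))
```

With pandas, something along the lines of `df.join(df['text'].apply(negation_stats).apply(pd.Series))` should expand the returned dictionaries into columns, assuming `pd` is the usual pandas import. Typographic apostrophes (’) are not handled by this pattern and would need to be normalised first.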
