TransWikia.com

Is there any NLP library or package that can help add commas, punctuation, and newlines appropriately to text?

Data Science Asked on March 7, 2021

I have a movie transcript without commas, punctuation, or newlines. Is there any NLP technique that can help to implement this?

One Answer

This can be solved with "text segmentation". NLP libraries provide code for breaking a given text into:

  • Sentences
  • Phrases
  • Words

With this, you can break the text into sentences and append a . or ? to each one. Similarly, the dependency tree can help with inserting some (but not all) of the other punctuation marks.

Example (breaking text into sentences):

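import spacy

# Load the small English pipeline (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')
text = "I was expecting a surplus of cute close-ups but Burton does surprisingly little to win us over He's never been big on treacle but a bit more warmth in this chilly movie which barely follows the outline of the 1941 original would have gone a long way"
# Running the pipeline parses the text; sentence boundaries come from the parse
text_sentences = nlp(text)
# Print each detected sentence on its own line
for sentence in text_sentences.sents:
    print(sentence.text)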

Output:

I was expecting a surplus of cute close-ups but Burton does surprisingly little to win us over

and

He's never been big on treacle but a bit more warmth in this chilly movie which barely follows the outline of the 1941 original would have gone a long way
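Once the text is split, the "insert . or ? for each sentence" step could look roughly like the sketch below. It is only an illustration: `add_terminal_punctuation` and the `QUESTION_STARTERS` word list are hypothetical names made up here, not part of spaCy, and the first-word rule is a crude assumption that will mislabel some sentences. The third input sentence is invented to show the question case.

```python
# Hypothetical helper: the word list and the rule are illustrative assumptions,
# not part of spaCy or of the answer above.
QUESTION_STARTERS = {
    "who", "what", "when", "where", "why", "how", "which",
    "is", "are", "was", "were", "do", "does", "did",
    "can", "could", "will", "would", "should",
}


def add_terminal_punctuation(sentences):
    """Capitalize each sentence and append '?' or '.' based on its first word."""
    punctuated = []
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue  # skip empty segments
        first_word = sentence.split()[0].lower()
        mark = "?" if first_word in QUESTION_STARTERS else "."
        punctuated.append(sentence[0].upper() + sentence[1:] + mark)
    return punctuated


sentences = [
    "I was expecting a surplus of cute close-ups but Burton does surprisingly little to win us over",
    "He's never been big on treacle but a bit more warmth in this chilly movie which barely follows the outline of the 1941 original would have gone a long way",
    "where did the warmth go",  # invented example of a question
]

# Joining with newlines also puts each sentence on its own line
result = "\n".join(add_terminal_punctuation(sentences))
print(result)
```

With spaCy, the input list could be built from the example above as `[s.text for s in text_sentences.sents]`. A trained punctuation-restoration model would likely do better than this rule; the sketch only shows where the marks and newlines go once sentence boundaries are known.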

More details: https://spacy.io/usage/linguistic-features

Answered by Shamit Verma on March 7, 2021
