TransWikia.com

Split a long string into full sentences in R

Stack Overflow Asked by JeniFav on November 12, 2021

I have string data I’ve pulled from the internet. I want to parse it into its full sentences.

So, for example:

library(RXKCD)
library(stringr)

searchXKCD("health")

getXKCD(574)
tweets <- getXKCD(574)

tweets$transcript  # This is the string I want to parse.

cols <- str_extract_all(tweets$transcript, "[A-Za-z]+") # I know how to pull out the individual words, but that's not what I want to do.

# just because
freq <- table(cols)

plot(freq)

Ultimately, I want to end up with:

[image: desired output]

One Answer

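This is just a case of parsing the string and cutting it into the appropriate segments: split on the {{ and }} markers, take the third piece, then split that piece on newlines and drop the empty first element: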

strsplit(strsplit(tweets$transcript, "(\\}\\})|(\\{\\{)")[[1]][3], "\n")[[1]][-1]
#> [1] "SKEEVE37: Oh God I ate pork yesterday before I knew about swine flu!"                                                                 
#> [2] "HANNELOREEC: Without duct tape I can't seal the door to keep out swine flu but I can't get duct tape without going outside! Help!"    
#> [3] "PAULYSHOREFAN: How long until the swine flu reaches me here in Madagascar?"                                                           
#> [4] "CRACKMONKEY74: Swine flu is God's punishment for the ACLU and lesbians and 9"                                                         
#> [5] "11 and nanobots!"                                                                                                                     
#> [6] "TWILIGHT7531: I fell down the stairs and there was a crack and a jagged white thing is sticking out of my arm guys is this swine flu?"
#> [7] "WIGU: @UNTOWARD: No, that sounds like syphilis, not swine flu. What did you say you did with a pig?"                                  
#> [8] "2011SENIORSRULE: My Dad said flu vaccines are linked to autism, so to be safe from swine flu I'm trying to lick an autistic kid."  
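
To make the nested call easier to follow, here is a minimal sketch that unpacks it step by step. It runs on a made-up string rather than the real transcript, so the layout of `transcript` (text, a `{{...}}` section, the tweets on separate lines, another `{{...}}` section) is an assumption, and the names `pieces`, `tweet_block`, `tweet_lines` and `sentences` are purely illustrative. The last step is not part of the answer above: it is one possible way to break each line into sentences, by splitting on whitespace that follows `.`, `!` or `?`.

```r
# Hypothetical stand-in for tweets$transcript (the layout is an assumption).
transcript <- "Intro {{header}}\nUSER1: First tweet!\nUSER2: Second tweet? Yes.\n{{footer}}"

# Step 1: split on the literal "{{" and "}}" markers.
# Braces are regex metacharacters, so each one is escaped with "\\".
pieces <- strsplit(transcript, "(\\}\\})|(\\{\\{)")[[1]]

# Step 2: the third piece is the text after the first {{...}} section.
tweet_block <- pieces[3]

# Step 3: split on newlines. The block starts with a newline, which
# produces an empty first element, so [-1] drops it.
tweet_lines <- strsplit(tweet_block, "\n")[[1]][-1]

# Optional step (an addition, not from the answer): split each line into
# sentences on whitespace preceded by ., ! or ? (needs perl = TRUE for
# the lookbehind).
sentences <- strsplit(tweet_lines, "(?<=[.!?])\\s+", perl = TRUE)

print(tweet_lines)
print(sentences)
```

The sentence pattern is deliberately simple: it will also split after abbreviations such as "Dr." or "e.g.", so text with those would probably need a more careful rule or a dedicated tokenizer.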

Answered by Allan Cameron on November 12, 2021
