Using the keyword_directory() function: it does not count matches correctly in R

Stack Overflow Asked by David Perea on December 19, 2020

I am using the keyword_directory() function to extract the paragraphs containing the phrase "Artificial Intelligence" from PDF documents. However, it does not extract all the paragraphs containing that phrase; I get far fewer than expected. In particular, it fails to extract those near the end of the document.


dirct <- directory_path
result <- keyword_directory(dirct, keyword = 'Artificial Intelligence', split_pdf = TRUE, surround_lines = 0, full_names = TRUE)

For example, in this file:

I only get 22 mentions, but there are about 40 mentions of this keyword ("Artificial Intelligence") in the file.

Why is this happening?

One Answer

You might want to try grepl. Example for a data frame:

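library(dplyr)  # provides mutate()

data_frame <- read.csv2(...)

# Add an indicator column, initialised to 0
data_frame <- mutate(data_frame, columx = 0)

# Set the indicator to 1 where the text matches, ignoring case.
# "text_col" is a hypothetical name for the column that holds the text.
data_frame$columx[grepl("artificial intelligence", data_frame$text_col, ignore.case = TRUE)] <- 1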

As has been pointed out, you should also consider intra-word dashes and similar variations.
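A minimal sketch of what handling those variations could look like in base R. The `page` string is a hypothetical stand-in for text extracted from one PDF page, and the pattern is only one possible choice. It also shows that grepl reports whether a text matches at all, while gregexpr can be used to count every mention. This might explain part of an undercount, but it is not a statement about how keyword_directory works internally.

```r
# Hypothetical text standing in for one extracted PDF page.
page <- paste0(
  "Artificial Intelligence is growing. Work on artificial\n",
  "intelligence and Artificial-Intelligence tools continues; ",
  "artificial intelligence is everywhere."
)

# Allow any run of whitespace (including line breaks) or hyphens
# between the two words.
pattern <- "artificial[[:space:]-]+intelligence"

# grepl() only says whether the text matches at least once.
has_match <- grepl(pattern, page, ignore.case = TRUE)

# gregexpr() returns the start position of every match,
# so several mentions in the same text are all counted.
count_matches <- function(pat, text) {
  m <- gregexpr(pat, text, ignore.case = TRUE)[[1]]
  if (m[1] == -1) 0L else length(m)
}

n_tolerant <- count_matches(pattern, page)                   # whitespace/hyphen tolerant
n_strict   <- count_matches("artificial intelligence", page) # single space only
```

With this sample text the strict pattern misses the mention split across a line break and the hyphenated one, so `n_strict` is smaller than `n_tolerant`.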

If your source file is a PDF, try creating a corpus (VCorpus) and transforming it into a document-term matrix (DocumentTermMatrix).
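A rough sketch of that approach with the tm package. It assumes the PDF text has already been extracted into a character vector (the `texts` vector below is hypothetical sample data), and it uses a bigram tokenizer so that the two-word phrase appears as a single term. Treat it as an illustration under those assumptions rather than a complete solution.

```r
library(tm)  # also attaches NLP, which provides ngrams() and words()

# Hypothetical texts standing in for the text extracted from two PDFs.
texts <- c(
  "Artificial intelligence is everywhere. Artificial Intelligence matters.",
  "Nothing relevant here."
)

# Build the corpus and normalise case and punctuation.
corpus <- VCorpus(VectorSource(texts))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)

# Tokenize into two-word terms so the phrase becomes one column of the matrix.
bigram_tokenizer <- function(x) {
  unlist(lapply(ngrams(words(x), 2), paste, collapse = " "), use.names = FALSE)
}

dtm <- DocumentTermMatrix(corpus, control = list(tokenize = bigram_tokenizer))

# Per-document counts of the phrase.
phrase_counts <- as.matrix(dtm)[, "artificial intelligence"]
```

Each row of the resulting matrix is a document and each column a bigram, so `phrase_counts` holds the number of mentions per document.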

Answered by arndtupb on December 19, 2020
