Data Science Asked by R Sorek on October 24, 2020
I’m using sklearn's TfidfVectorizer to extract features from text for text classification.
I believe the information I need tends to be at the beginning of the document, so I would like to capture the offset of each feature per document (either the offset of its first appearance, or the mean offset over all appearances).
Is there a vectorizer that can do that? Or is there some other method of extracting this information efficiently?
Thank you!
One approach is to build a second matrix that stores this information. Scikit-learn stores text features in a document-by-token matrix; the second matrix would have the same shape, but each cell would hold the token's position in the document (for example, the index of its first occurrence) instead of a count or weight. This matrix could then be used as additional features during modeling.
It would require writing a custom vectorizer, similar to scikit-learn's CountVectorizer implementation.
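As a rough illustration of the idea, here is a minimal sketch in plain Python rather than an actual scikit-learn subclass. The function names (`tokenize`, `first_position_matrix`) are hypothetical, and the tokenization only mimics CountVectorizer's documented default token pattern with lowercasing. Positions are counted in tokens and are 1-based, so that 0 can mean "token absent"; that convention is an assumption made here, not something scikit-learn prescribes.

```python
import re

# Mirrors CountVectorizer's documented default token pattern:
# runs of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    """Lowercase the text and split it into tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def first_position_matrix(documents):
    """Return (vocabulary, matrix), where matrix[d][j] is the 1-based
    position of the first occurrence of token j in document d,
    or 0 if the token does not occur in that document."""
    tokenized = [tokenize(doc) for doc in documents]
    # Sort the vocabulary so the column order is deterministic.
    all_tokens = sorted({tok for toks in tokenized for tok in toks})
    vocabulary = {tok: j for j, tok in enumerate(all_tokens)}

    matrix = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for position, tok in enumerate(toks, start=1):
            col = vocabulary[tok]
            if row[col] == 0:  # keep only the first occurrence
                row[col] = position
        matrix.append(row)
    return vocabulary, matrix


docs = ["the cat sat on the mat", "the dog chased the cat"]
vocabulary, positions = first_position_matrix(docs)
print(vocabulary)
print(positions)
```

The resulting rows could be stacked next to the TF-IDF features before fitting a classifier. Note that a model may read 0 ("absent") as "very early", so it may be worth rescaling the positions or encoding absence differently; recording the mean offset instead would only need a running sum and count per token.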
Answered by Brian Spiering on October 24, 2020