
How can I disable the file token type in PostgreSQL full text search?

Database Administrators Asked by jiamo on November 23, 2021

select to_tsvector('english_nostop', 'u,s') @@ to_tsquery('english_nostop', 'u<->s');

returns t, while

select to_tsvector('english_nostop', 'u.s') @@ to_tsquery('english_nostop', 'u<->s');

returns f.

Debugging with ts_debug shows:

SELECT alias, token, lexemes FROM ts_debug('english_nostop', 'u.s');
alias | token | lexemes
-------+-------+---------
file  | u.s   | {u.s}

and

SELECT alias, token, lexemes FROM ts_debug('english_nostop', 'u,s');
alias     | token | lexemes
-----------+-------+---------
asciiword | u     | {u}
blank     | ,     |
asciiword | s     | {s}

How can I disable this behavior, so that . is treated like ,? The parser seems to treat u.s as a special token of type file.

Here is my configuration:

CREATE TEXT SEARCH DICTIONARY english_stem_nostop (
    Template = snowball,
    Language = english
);

CREATE TEXT SEARCH CONFIGURATION public.english_nostop ( COPY = pg_catalog.english );
ALTER TEXT SEARCH CONFIGURATION public.english_nostop
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, hword, hword_part, word WITH english_stem_nostop;
ALTER TEXT SEARCH CONFIGURATION public.english_nostop DROP MAPPING FOR asciihword, hword;

We can't simply drop the mapping for file either, because that discards the whole token:

ALTER TEXT SEARCH CONFIGURATION public.english_nostop DROP MAPPING FOR file;
SELECT alias, token, lexemes FROM ts_debug('english_nostop', 'u.s');
alias | token | lexemes
-------+-------+---------
file  | u.s   |

That is wrong too.
And if we ALTER MAPPING for file instead, we get

SELECT alias, token, lexemes FROM ts_debug('english_nostop', 'u.s');
alias | token | lexemes
-------+-------+---------
file  | u.s   | {u.}

That is still not what I want.

2 Answers

You would have to write your own parser. This is not easy to do. You would also have to decide which features of the default parser to keep and which to throw away. It isn't clear to me that the file token type is the only one that will give you problems. There is a test module, test_parser, which could serve as an example of creating your own, but it is much too simple, while the default parser is much too complex.

create extension test_parser ;
CREATE TEXT SEARCH CONFIGURATION testcfg ( PARSER = testparser );
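Before deciding what a custom parser would have to keep or drop, it may help to see which token types the default parser can emit and how a few period-containing inputs are classified. The following is only a sketch: the sample strings are arbitrary illustrations (only 'u.s' comes from the question), and it assumes the english_nostop configuration defined above.

```sql
-- List every token type the built-in parser can produce.
SELECT alias, description
FROM ts_token_type('default');

-- Check how some inputs containing '.' are tokenized.
-- The sample values other than 'u.s' are made up for illustration.
SELECT t.sample, d.alias, d.token, d.lexemes
FROM (VALUES ('u.s'), ('3.14'), ('a@b.com'), ('example.com')) AS t(sample),
     LATERAL ts_debug('english_nostop', t.sample) AS d;
```

Any alias in the second result other than the plain word types is a candidate for the same kind of surprise as file.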

A more pragmatic solution might be to just push the text through replace(col_name,'.',' ') before passing it to to_tsvector. If you were using plainto_tsquery, etc., you would also want to pass the query text through the same replace, but since you are just using to_tsquery, you would not need to do that.
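A minimal sketch of the replace approach, assuming the english_nostop configuration from the question. The table docs and its column title are hypothetical names used only for illustration.

```sql
-- With the period replaced by a space, 'u.s' is parsed as two words,
-- so this should return t, as with the comma input in the question.
SELECT to_tsvector('english_nostop', replace('u.s', '.', ' '))
       @@ to_tsquery('english_nostop', 'u<->s');

-- Hypothetical table: index the same expression that queries will use.
CREATE TABLE docs (title text);

CREATE INDEX docs_title_fts_idx ON docs
    USING gin (to_tsvector('english_nostop', replace(title, '.', ' ')));

-- The query must repeat the indexed expression for the index to be usable.
SELECT title
FROM docs
WHERE to_tsvector('english_nostop', replace(title, '.', ' '))
      @@ to_tsquery('english_nostop', 'u<->s');
```

Note that this replaces every period, so text such as decimal numbers, host names, and email addresses will also be split into separate words.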

Answered by jjanes on November 23, 2021

The simplest answer may be:

to_tsvector('english_nostop', array_to_string(string_to_array(title, '.'), ', '))
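To see what the expression above feeds into to_tsvector, the inner part can be run on its own with a literal in place of the title column. This is just an illustration using the 'u.s' input from the question.

```sql
-- Split on '.', then rejoin with ', ' so the parser sees a comma and a space.
SELECT array_to_string(string_to_array('u.s', '.'), ', ');
-- returns: u, s

-- For this input, replace gives the same string with a single function call.
SELECT replace('u.s', '.', ', ');
-- returns: u, s
```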

Answered by jiamo on November 23, 2021
