TransWikia.com

How to grep one file against another without running out of memory?

Unix & Linux Asked by Renan41 on December 3, 2021

I have two large text files, about 30 MB each, and I would like to grep one against the other, as in grep -f "file01.txt" "file02.txt" > file03.txt.

Doing so returns a "memory exhausted" error.

How can these files be compared without regard to alphabetical order?

2 Answers

Unless your file01.txt contains actual regular expressions, try:

grep -Ff "file01.txt" "file02.txt" > file03.txt

-F tells grep to treat file01.txt as fixed strings, not regular expressions. This will both greatly increase the speed and greatly reduce the memory requirements.
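To see the difference -F makes, here is a small sketch on made-up sample data. The file names patterns.txt and data.txt are illustrative assumptions, not the files from the question:

```shell
#!/bin/sh
# Illustrative sample data; these files are hypothetical.
workdir=$(mktemp -d)
printf '%s\n' 'a.c' > "$workdir/patterns.txt"
printf '%s\n' 'abc' 'a.c' 'xyz' > "$workdir/data.txt"

# Without -F, "a.c" is a regex: "." matches any character, so "abc" matches too.
regex_hits=$(grep -f "$workdir/patterns.txt" "$workdir/data.txt")

# With -F, "a.c" is a literal string: only the line containing "a.c" matches.
fixed_hits=$(grep -Ff "$workdir/patterns.txt" "$workdir/data.txt")

printf 'regex:\n%s\nfixed:\n%s\n' "$regex_hits" "$fixed_hits"
rm -rf "$workdir"
```

Both forms match substrings of a line. If each line of file01.txt should match only a complete line of file02.txt, adding -x (grep -Fxf) would be the likely choice.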

Regular Expressions

Alternatively, if your file01.txt really does contain regular expressions, you can split it into parts and apply grep to each part separately:

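split -d -n l/10 "file01.txt" ./tmp-file01.
for f in ./tmp-file01.*; do grep -f "$f" "file02.txt"; done >file03.txt

The above splits file01.txt into 10 parts without breaking any line (that is what the l/ prefix of GNU split's -n option does; a plain -n 10 splits by bytes and can cut a pattern in half). Depending on your available memory, you may need more than that. Note that a line of file02.txt that matches patterns in several parts is written to file03.txt once per part.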

If file01.txt does not have regexes, then use -F in the second line:

for f in ./tmp-file01.*; do grep -Ff "$f" "file02.txt"; done >file03.txt
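The chunking idea can also be sketched as a small script. This is one possible variant, not the exact commands above: it uses split -l (a fixed number of lines per chunk) so that no pattern is cut in half, and it uses grep -n plus sort -u to drop lines that matched in more than one chunk. The sample data and file names are made up for illustration:

```shell
#!/bin/sh
# Sketch only: sample data and file names below are hypothetical.
workdir=$(mktemp -d)
printf '%s\n' '^foo' 'bar$' 'ba[rz]' 'qux' > "$workdir/patterns.txt"
printf '%s\n' 'foo one' 'two bar' 'nothing' 'baz three' > "$workdir/data.txt"

# Split the pattern file into chunks of 2 lines each (whole lines only).
split -l 2 "$workdir/patterns.txt" "$workdir/chunk."

# Run grep once per chunk. -n prefixes each hit with its line number in
# data.txt, so duplicates (a line matching patterns in several chunks) can be
# dropped and the original line order restored before the prefix is cut off.
for f in "$workdir"/chunk.*; do
    grep -n -f "$f" "$workdir/data.txt"
done | sort -t: -k1,1n -u | cut -d: -f2- > "$workdir/result.txt"

result=$(cat "$workdir/result.txt")
printf '%s\n' "$result"
rm -rf "$workdir"
```

For real inputs the chunk size would be much larger than 2. A smaller chunk means less memory per grep run but more passes over the data file.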

Answered by John1024 on December 3, 2021

You can't: all the patterns must be loaded into grep at once, and this exhausts memory.

But if you want to compare files, why don't you simply use diff (after sorting the contents)?
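A minimal sketch of the sort-then-compare approach, on made-up sample files (a.txt and b.txt are hypothetical names). Besides diff, it also shows comm, which is a swapped-in suggestion for listing the lines two sorted files have in common:

```shell
#!/bin/sh
# Sketch only: sample data and file names are hypothetical.
workdir=$(mktemp -d)
printf '%s\n' 'cherry' 'apple' 'banana' > "$workdir/a.txt"
printf '%s\n' 'banana' 'date' 'apple' > "$workdir/b.txt"

# Sort both files first so that the original line order no longer matters.
sort "$workdir/a.txt" > "$workdir/a.sorted"
sort "$workdir/b.txt" > "$workdir/b.sorted"

# comm -12 prints only the lines present in both sorted files.
common=$(comm -12 "$workdir/a.sorted" "$workdir/b.sorted")

# diff shows the lines that differ; it exits with status 1 when files differ,
# hence the "|| true" to keep the script going.
differences=$(diff "$workdir/a.sorted" "$workdir/b.sorted" || true)

printf 'common:\n%s\ndiff:\n%s\n' "$common" "$differences"
rm -rf "$workdir"
```

Unlike grep -f, both tools compare whole lines, so this only fits when the entries of one file appear as complete lines in the other.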

For one pattern per line (such as a list of MD5s):

# IFS= and -r keep each line intact (no trimming, no backslash processing)
while IFS= read -r md5; do
    grep -w -- "$md5" file02.txt
done < file01.txt > file03.txt

This is of course much slower, because file02.txt is scanned once for every pattern, and especially so when file02.txt is too big to fit into the cache. It does, however, work for any size of the pattern file file01.txt.

Answered by Yfa Kolh on December 3, 2021
