
How to search for a sub-string in the files which contain another sub-string

Unix & Linux Asked by 25b3nk on December 10, 2021

I wanted to search for a string, but only in the files that contain another string. In other words, I first had to get the list of files containing the first string and then search those files for the second string.

The following command helped:

grep -ril './' -e "first_string" | xargs grep -i "second_string"

The first half of the command lists the files containing first_string.

r – Goes recursively into directories

i – Matches the string case-insensitively

l – Lists only the names of files with matches

The second half takes these file paths and runs the second grep to find the lines containing second_string.

Here, we need xargs to pass these file paths as arguments to the second grep command.
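To see the two stages separately, here is a small sketch that runs the command above against a throwaway directory. The directory and the three file names are made up for illustration, and it assumes a grep that supports -r (GNU grep, for example).

```shell
# Build a throwaway directory with three sample files (names are made up).
demo=$(mktemp -d)
printf 'first_string\nsecond_string here\n' > "$demo/both.txt"
printf 'first_string only\n' > "$demo/first_only.txt"
printf 'second_string only\n' > "$demo/second_only.txt"

# Stage 1 on its own: names of the files that contain first_string.
stage1=$(cd "$demo" && grep -ril './' -e "first_string" | sort)
printf 'stage 1:\n%s\n' "$stage1"

# Both stages: lines matching second_string, but only in the stage 1 files.
both=$(cd "$demo" && grep -ril './' -e "first_string" | xargs grep -i "second_string")
printf 'both stages:\n%s\n' "$both"

rm -rf "$demo"
```

second_only.txt contains second_string but never reaches the second grep, because the first stage does not list it.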

2 Answers

You'd be better off with:

grep -rilZ 'first_string' . | xargs -r0 grep -Hi 'second_string'

assuming GNU utilities (which you seem to have, as you're already using the -r GNU extension).

That is:

  • use -Z and xargs -0 to reliably pass the list of paths (which on Unix-like systems can contain any byte value except 0, while xargs without -0 expects a very specific format).
  • use -r for xargs to avoid running the second grep if the first one doesn't find any file (omitting it here is no big deal, it would just cause the second grep to grep its empty stdin).
  • options should be placed before non-option arguments (your command puts -e "first_string" after './').
  • we use the -H option for the second grep to make sure the file name is always printed (even if only one file path ends up being passed to it) so we know where the matches are. For grep implementations that don't support -H, an alternative is to add /dev/null to the list of files for grep to look in. Then, grep being passed more than one filename will always print the filename.
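As a rough illustration of the first point, the sketch below compares the plain hand-off with the NUL-delimited one on a file whose name contains a space. The directory and the file name "my notes.txt" are made up, and GNU grep and xargs are assumed.

```shell
# One sample file whose name contains a space (made-up name).
demo=$(mktemp -d)
printf 'first_string\nsecond_string\n' > "$demo/my notes.txt"

# Plain xargs splits "./my notes.txt" at the space, so grep is handed two
# paths that do not exist and only reports errors (silenced here).
plain=$(cd "$demo" && grep -ril 'first_string' . | xargs grep -Hi 'second_string' 2>/dev/null) || true

# With -Z and -0 the path is passed NUL-terminated and arrives intact.
safe=$(cd "$demo" && grep -rilZ 'first_string' . | xargs -r0 grep -Hi 'second_string')

printf 'plain: [%s]\nsafe:  [%s]\n' "$plain" "$safe"
rm -rf "$demo"
```

The plain variant should come back empty, while the -Z / -0 variant reports the match prefixed with the file name, thanks to -H, even though only one file was passed.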

Answered by Stéphane Chazelas on December 10, 2021

find . | perl -ne 'open($fh, $_); $s1=0; $s2=0; while($line = <$fh>) { $s1=1 if($line=~/string 1/); $s2=1 if($line=~/string 2/); } ; print $_ if($s1==1 and $s2 ==1); close $fh;' | sort | uniq

(It's a bit long to read, but it all goes on one line.)

Edit: Some explanation:

  • find . | sends a list of every path under the directory you want to search through (.), files and directories alike, to the next command (perl)
  • perl -ne 'COMMANDS' loops through all the lines it receives on STDIN (so all the paths) and runs COMMANDS on each of them. Each time, the current path ends up in $_
  • open($fh, $_); COMMANDS; close $fh; opens a file, binds it to the filehandle $fh, runs COMMANDS and closes it again.
  • $s1=0; $s2=0; these variables are reset to 0 at the start of each file (each one is set to 1 once its string is found in the current file)
  • while($line = <$fh>) { COMMANDS } ; runs COMMANDS on every line in the file.
  • $s1=1 if($line=~/string 1/); $s2=1 if($line=~/string 2/); if string 1 is found in the current line $s1 becomes 1, and likewise $s2 for string 2
  • print $_ if($s1==1 and $s2 ==1); prints the filename if both strings were found in the file.
  • | sort | uniq sorts the filenames and filters out duplicates (this should actually not be necessary)
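For comparison, roughly the same result (only the names of the files that contain both strings) can be sketched with grep alone by adding -l to the second grep as well. This assumes GNU grep and xargs. The sample directory and file names are made up.

```shell
# Sample files (made-up names): only both.txt contains both strings.
demo=$(mktemp -d)
printf 'string 1\nstring 2\n' > "$demo/both.txt"
printf 'string 1\n' > "$demo/one.txt"
printf 'string 2\n' > "$demo/two.txt"

# -l on both greps prints file names only; -Z and -0 keep unusual paths intact.
result=$(cd "$demo" && grep -rlZ 'string 1' . | xargs -r0 grep -l 'string 2')
printf '%s\n' "$result"
rm -rf "$demo"
```

Only both.txt should be listed, since one.txt fails the second grep and two.txt never gets past the first.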

Answered by Garo on December 10, 2021
