TransWikia.com

Comm/Diff (or alternative) that respects glob characters: is this possible?

Unix & Linux Asked on October 31, 2021

Restricting the "globbing" grammar to suffix globbing with * characters only, suppose I have:

// foo.txt
foo.*
biz

// bar.txt
bar
foo.bar

And I would like to write:

diff <(sort -u foo.txt) <(sort -u bar.txt)

# alternatively
comm -3 <(sort -u foo.txt) <(sort -u bar.txt)

Such that it returns:

# diff
> bar
< biz

# comm
bar 
biz

Is this possible natively?

One Answer

If you want to report all the lines of bar.txt that match none of the patterns in foo.txt, in zsh, that could be:

unique_lines=(${(fu)"$(<bar.txt)"})
unique_patterns=(${(fu)"$(<foo.txt)"})
pattern="(${(j[|])unique_patterns})"

print -rC1 -- ${unique_lines:#$~pattern}

Or all in one go:

print -rC1 -- ${${(fu)"$(<bar.txt)"}:#(${(j[|])~${(fu)"$(<foo.txt)"}})}
  • $(<file) is the ksh-like operator that expands to the contents of file, stripped of trailing newline characters.
  • ${(flags)param} uses parameter expansion flags to affect the expansion of param.
  • The f flag splits on linefeeds (here, it expands to the list of non-empty lines).
  • The u (unique) flag removes duplicates, so ${(fu)"$(<foo.txt)"} expands to the unique non-empty lines of foo.txt.
  • ${array:#pattern} expands to the elements of $array that don't match the pattern. Here the pattern is built from ${(j[|])unique_patterns}, where the elements of $unique_patterns are joined with |, so we end up with a (line1|line2|...) pattern.
  • The ~ in $~pattern causes wildcards to be treated as such upon variable expansion.

Note that the wildcard syntax is that of zsh, which is affected by a few shell options such as extendedglob, kshglob, nocasematch...

In bash, you could do something similar with:

shopt -s extglob
pattern="@($(sort -u foo.txt | paste -sd '|' -))"
sort -u bar.txt |
  while IFS= read -r line; do
    [[ $line = $pattern ]] || printf '%s\n' "$line"
  done

This time, the syntax is that of bash extglob wildcards, similar to that of ksh88 ones.
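
To see what the combined pattern looks like, here is a sketch that reruns the bash approach on the question's sample data. The scratch directory and the helper name filter_unmatched are assumptions made for illustration, not part of the original answer.

```shell
#!/usr/bin/env bash
shopt -s extglob

# filter_unmatched PATTERN_FILE LINE_FILE (hypothetical helper name):
# print the unique lines of LINE_FILE that match none of the patterns.
filter_unmatched() {
  local pattern line
  # Join the unique patterns with "|" inside an extglob @(...) group.
  pattern="@($(sort -u "$1" | paste -sd '|' -))"
  printf 'pattern: %s\n' "$pattern" >&2   # shown on stderr for inspection
  sort -u "$2" |
    while IFS= read -r line; do
      [[ $line = $pattern ]] || printf '%s\n' "$line"
    done
}

# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'foo.*' 'biz' > "$tmpdir/foo.txt"
printf '%s\n' 'bar' 'foo.bar' > "$tmpdir/bar.txt"

filter_unmatched "$tmpdir/foo.txt" "$tmpdir/bar.txt"
rm -rf "$tmpdir"
```

With the sample files, this reports the pattern @(biz|foo.*) on stderr and prints bar on stdout, since foo.bar is matched by foo.*.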

Note that, because the lines go through sort -u, they are output in sorted order rather than in the order they appear in bar.txt.
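
If the original order of bar.txt matters, one option is to skip sort and test each line against each pattern in turn. The following is a minimal sketch in POSIX sh, not part of the original answer; the function names unmatched_lines and unused_patterns are hypothetical. The second function covers the other half of the question, namely patterns such as biz that match no line.

```shell
# unmatched_lines PATTERN_FILE LINE_FILE (hypothetical name):
# print the lines of LINE_FILE, in their original order and without
# duplicates, that match none of the glob patterns in PATTERN_FILE.
unmatched_lines() {
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$line" ] || continue            # skip empty lines
    matched=no
    while IFS= read -r pat || [ -n "$pat" ]; do
      [ -n "$pat" ] || continue           # skip empty patterns
      # An unquoted expansion used as a case pattern is treated as a glob.
      case $line in
        $pat) matched=yes; break ;;
      esac
    done < "$1"
    [ "$matched" = yes ] || printf '%s\n' "$line"
  done < "$2" | awk '!seen[$0]++'         # drop repeats, keep first occurrence
}

# unused_patterns PATTERN_FILE LINE_FILE (hypothetical name):
# print the patterns of PATTERN_FILE that match no line of LINE_FILE.
unused_patterns() {
  while IFS= read -r pat || [ -n "$pat" ]; do
    [ -n "$pat" ] || continue
    used=no
    while IFS= read -r line || [ -n "$line" ]; do
      [ -n "$line" ] || continue
      case $line in
        $pat) used=yes; break ;;
      esac
    done < "$2"
    [ "$used" = yes ] || printf '%s\n' "$pat"
  done < "$1" | awk '!seen[$0]++'
}
```

For the question's files, unmatched_lines foo.txt bar.txt prints bar and unused_patterns foo.txt bar.txt prints biz. The work grows with the number of lines times the number of patterns, so the joined-pattern approach is likely preferable for large inputs.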

Answered by Stéphane Chazelas on October 31, 2021
