TransWikia.com

Compare strings with different encodings

Unix & Linux Asked by DK999 on February 25, 2021

I’m trying to compare a string from a file that is encoded in UTF-8

file /dev/eeprom: UTF-8 Unicode text, with very long lines

with a serial number that is hardcoded into the script.
When printed to the console, the string looks fine, but there seems to be a problem with the file’s format (and iconv isn’t available).
The script itself is reported by file as an ASCII text executable.

#!/bin/sh
eeprom_id=$(cat /dev/eeprom | grep -e ID: | awk '{split($0,a,":"); print a[2]}')
echo "EEPROM_ID: $eeprom_id"

if [ "$eeprom_id" = "C000139-102" ]
then
    echo "String identical"
else
    echo "WRONG"
fi

Output:

./script.sh
EEPROM_ID: C000139-102
WRONG

Any ideas how to compare those strings properly?

One Answer

It should be possible to do this entirely in awk:

awk -F':' -v ref_id="C000139-102" '$1=="ID" {if ($2==ref_id) print "Identical"; else print "WRONG"}' /dev/eeprom
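To try the one-liner without the device, it can be wrapped in a function and run against a sample file. This is a minimal sketch: the function name compare_id and the sample contents are hypothetical, since the question does not show the real EEPROM layout beyond the ID: line.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk one-liner above.
# $1 = file to read, $2 = expected ID
compare_id() {
    # Split on ':' and compare field 2 of the line whose field 1 is "ID"
    awk -F':' -v ref_id="$2" \
        '$1=="ID" {if ($2==ref_id) print "Identical"; else print "WRONG"}' "$1"
}

sample=$(mktemp)
# Hypothetical contents; not the real EEPROM layout
printf 'SN:0042\nID:C000139-102\n' > "$sample"
compare_id "$sample" "C000139-102"
rm -f "$sample"
```

For this sample it prints Identical; passing a different reference ID such as C000139-103 prints WRONG.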

To read the ID into a shell variable, as in your example script:

eeprom_id=$(awk -F':' '$1=="ID" {print $2}' /dev/eeprom)
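Combined with the question's script, a minimal sketch could look like the following. The device path and reference ID come from the question; the function names read_eeprom_id and check_eeprom_id are hypothetical. It also quotes both operands and uses the POSIX = operator, because == inside [ ] is a bash/ksh extension that not every /bin/sh accepts, and an unquoted empty variable breaks the test expression.

```shell
#!/bin/sh
# Hypothetical helper: print field 2 of the first line whose field 1 is "ID"
read_eeprom_id() {
    awk -F':' '$1=="ID" {print $2; exit}' "$1"
}

# Hypothetical helper: compare the ID read from file $1 with the expected value
check_eeprom_id() {
    eeprom_id=$(read_eeprom_id "$1")
    # Brackets make stray leading/trailing whitespace visible
    printf 'EEPROM_ID: [%s]\n' "$eeprom_id"
    # Quote both operands and use POSIX "=" rather than "=="
    if [ "$eeprom_id" = "C000139-102" ]; then
        echo "String identical"
    else
        echo "WRONG"
    fi
}

# Only touch the real device if it is readable
if [ -r /dev/eeprom ]; then
    check_eeprom_id /dev/eeprom
fi
```

Printing the value between brackets is only a debugging aid: a leading space after the colon, for example, shows up as [ C000139-102] even though the plain echo looks identical.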

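If, as @user414777 suspects, you are dealing with a UTF-16-encoded file, every ASCII character is stored together with a NUL byte, and those bytes have to be removed before the comparison can succeed. In that case you may have to use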

tr -d '\000' < /dev/eeprom | awk -F':' -v ref_id="C000139-102" '$1=="ID" {if ($2==ref_id) print "Identical"; else print "WRONG"}'

or try

awk -F':' -v ref_id="C000139-102" '{gsub(/\x00/,""); if ($1=="ID") {if ($2==ref_id) print "Identical"; else print "WRONG"}}' /dev/eeprom
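To see why the NUL bytes matter, one can fake a UTF-16LE line from ASCII text: in UTF-16LE each ASCII character is stored as that byte followed by 0x00. This is a minimal sketch; the helper to_utf16le_line and the temporary file are hypothetical and only handle ASCII input. It uses the tr variant, since NUL handling and the \x00 regex escape differ between awk implementations.

```shell
#!/bin/sh
# Hypothetical helper: write each character of $1 followed by a NUL byte,
# then an encoded newline ("\n\0"). Only correct for ASCII input.
to_utf16le_line() {
    s=$1
    while [ -n "$s" ]; do
        c=${s%"${s#?}"}   # first character
        s=${s#?}          # rest of the string
        printf '%s\000' "$c"
    done
    printf '\n\000'
}

sample=$(mktemp)
to_utf16le_line 'ID:C000139-102' > "$sample"

# Without stripping, field 1 contains NUL bytes (or is cut short at the
# first one, depending on the awk), so it never equals "ID": no output
awk -F':' '$1=="ID" {print "matched"}' "$sample"

# After tr -d '\000' the bytes are plain ASCII again
tr -d '\000' < "$sample" | awk -F':' -v ref_id="C000139-102" \
    '$1=="ID" {if ($2==ref_id) print "Identical"; else print "WRONG"}'
rm -f "$sample"
```

The first awk prints nothing, while the pipeline through tr prints Identical.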

Again, to read the ID into a shell variable:

eeprom_id=$(tr -d '\000' < /dev/eeprom | awk -F':' '$1=="ID" {print $2}')

or

eeprom_id=$(awk -F':' '{gsub(/\x00/,""); if ($1=="ID") print $2}' /dev/eeprom)

Correct answer by AdminBee on February 25, 2021
