TransWikia.com

PowerShell script to verify Linux-generated md5sum file

Code Review Asked on November 28, 2021

I’m moving some files from a Linux machine to a Windows machine over a low-speed, possibly buggy experimental communications channel that I want to test. One of the tests is to transfer large numbers of large and small files and verify their cryptographic hashes at the receiving end. On the Linux side, we’re using md5sum to generate file hashes like so:

md5sum * > files.md5

Then the files are transmitted from the Linux machine to the Windows 10 machine. What I’d like to do next is to verify the hashes on the plain-vanilla Windows machine (no Cygwin installed). So, to mimic the default operation of md5sum -c files.md5, which goes through the file line by line and verifies each MD5 checksum, I’ve written this PowerShell script. I’m a lot more at home in bash than in PowerShell, so I thought I might benefit from a review.

param (
    [Parameter(Mandatory=$true)][string]$infile
)
$basedir = Split-Path -Parent $infile
$badcount = 0
foreach ($line in [System.IO.File]::ReadLines("$infile")) {
    $sum, $file = $line.split(' ')
    $fullfile = "$basedir$file"
    $filehash = Get-FileHash -Algorithm MD5 $fullfile
    if ($sum -eq $filehash.Hash) {
        Write-Host $file ": OK"
    } else {
        Write-Host $file ": FAILED"
        $badcount++
    }
}
if ($badcount -gt "0") {
    Write-Host "WARNING:" $badcount "computed checksums did NOT match"
}

One Answer

here are a few changes i would make. [grin] the ideas ...

  • use Get-Content instead of ReadLines()
    the speed difference is not large unless you are dealing with a very large number of files. go with the standard cmdlets unless there is a meaningful benefit from doing otherwise.
  • test to see if the file exists
  • build a [PSCustomObject] to hold the resulting items that you want
  • keep those PSCOs in a collection
  • view your hash failure items after the full test ends

what it does ...

  • sets the constants
  • builds a test file to work with
    remove the entire #region/#endregion block when you are ready to use your own data.
  • reads in the hash list file
  • iterates thru the resulting array
  • splits out the file name and hash value
  • builds the full file name to check
  • tests to see if that file exists
  • if YES, gets the file hash and saves it
  • if NO, sets the file hash $Var to '__N/A__'
  • builds a PSCO with the properties that seem useful
  • sends that to the $Result collection
  • gets the hash failures from the collection and displays them
    if all you want is the count, wrap that all in @() and add .Count to the end.
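
the count idea in that last step can be sketched like this. the $Result items here are made-up stand-ins for what the main loop produces, and the warning text is borrowed from the question's script. treat it as a rough sketch, not the one true way.

```powershell
# hypothetical stand-in for the $Result collection built by the main loop
$Result = @(
    [PSCustomObject]@{ FileName = 'a.log'; CopyOK = $True }
    [PSCustomObject]@{ FileName = 'b.log'; CopyOK = $False }
    [PSCustomObject]@{ FileName = 'c.log'; CopyOK = $False }
    )

# @() forces an array, so .Count is dependable even with zero or one failure
$BadCount = @($Result.Where({$_.CopyOK -eq $False})).Count

if ($BadCount -gt 0)
    {
    Write-Warning ('{0} computed checksums did NOT match' -f $BadCount)
    }
```

the .Where() method needs PowerShell 4 or later. on older versions, piping to Where-Object inside the @() should do the same job.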

the code ...

$SourceDir = $env:TEMP
$HashFileName = 'FileHashList.txt'
$FullHashFileName = Join-Path -Path $SourceDir -ChildPath $HashFileName

#region >>> make a hash list to compare with
#    remove this entire "#region/#endregion" block when ready to work with your real data
$HashList = Get-ChildItem -LiteralPath $SourceDir -Filter '*.log' -File |
    ForEach-Object {
        '{0} {1}' -f $_.Name, (Get-FileHash -LiteralPath $_.FullName -Algorithm 'MD5').Hash
        }
# munge the 1st two hash values
$HashList[0] = $HashList[0] -replace '.{5}$', '--BAD'
$HashList[1] = $HashList[1] -replace '.{5}$', '--BAD'

$HashList |
    Set-Content -LiteralPath $FullHashFileName
#endregion >>> make a hash list to compare with

$Result = foreach ($Line in (Get-Content -LiteralPath $FullHashFileName))
    {
    $TestFileName, $Hash = $Line.Split(' ')
    $FullTestFileName = Join-Path -Path $SourceDir -ChildPath $TestFileName
    if (Test-Path -LiteralPath $FullTestFileName)
        {
        $THash = (Get-FileHash -LiteralPath $FullTestFileName -Algorithm 'MD5').Hash
        }
    else
        {
        $THash = '__N/A__'
        }
    [PSCustomObject]@{
        FileName = $TestFileName
        CopyOK = $THash -eq $Hash
        OriHash = $Hash
        CopyHash = $THash
        }
    }

$Result.Where({$_.CopyOK -eq $False})

output [with the 1st two hash values deliberately munged] ...

FileName                  CopyOK OriHash                          CopyHash                        
--------                  ------ -------                          --------                        
Genre-List_2020-07-07.log  False 7C0C605EA7561B7020CBDAE24D1--BAD 7C0C605EA7561B7020CBDAE24D140E40
Genre-List_2020-07-14.log  False 20F234ACE66B860821CF8F8BD5E--BAD 20F234ACE66B860821CF8F8BD5EC144D
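
one caveat when switching to a real md5sum file: the demo list is written as "name hash", while md5sum writes the hash first, then a space, then either a second space [text mode] or an asterisk [binary mode], then the file name. a minimal sketch of splitting such a line follows. it assumes plain file names with no md5sum backslash-escaping, and ConvertFrom-Md5Line is a hypothetical helper name, not a built-in cmdlet.

```powershell
# hypothetical helper, not a built-in cmdlet
function ConvertFrom-Md5Line
    {
    param ([string]$Line)

    # 32 hex digits, one space, then a space or '*', then the file name
    if ($Line -match '^(?<Hash>[0-9a-fA-F]{32}) [ *](?<Name>.+)$')
        {
        [PSCustomObject]@{
            Hash = $Matches['Hash']
            FileName = $Matches['Name']
            }
        }
    # lines that do not match produce no output at all
    }

# made-up sample line; the hash is the MD5 of an empty file
$Parsed = ConvertFrom-Md5Line -Line 'd41d8cd98f00b204e9800998ecf8427e  my file.txt'
$Parsed.Hash
$Parsed.FileName
```

md5sum writes lowercase hex and Get-FileHash returns uppercase, but -eq ignores case for strings, so the two can be compared directly. matching on the whole line also keeps file names with spaces in one piece, which a plain .Split(' ') would not.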

Answered by Lee_Dailey on November 28, 2021
