TransWikia.com

Is there a way to delete all RAW files if a corresponding JPG file exists?

Photography Asked by user1074239 on December 21, 2020

I’m starting the process of importing hundreds of thousands of photos into a new DAM system that my company purchased. There are 46,000 RAW files (mostly .cr2).

We don’t need the RAW files anymore. But we don’t want to delete them if there isn’t a corresponding .jpg file.

Is there some way (application, script, etc.) to identify all of the RAW files that have a corresponding .jpg and then delete the RAW files?

That would save probably hundreds of hours of work and free up massive amounts of storage space.

2 Answers

As far as I can tell, there are no good answers in the question linked above:

  • The accepted answer only checks raw files in the same directory.
  • Those that search for the raw file elsewhere may delete the wrong files, because they wrongly assume that image file names are unique(*).

To be really safe:

  • If the file time stamps have been preserved, you can compare them, allowing a fuzz margin of at least 2 seconds, since the CR2 and the JPEG get slightly different stamps and the FAT filesystem on camera cards only keeps time to 2 seconds of accuracy.
  • Otherwise you would have to check the EXIF data of the like-named JPEG and RAW files and see if they match.
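
The EXIF comparison from the second bullet could be sketched roughly as follows. This assumes the exiftool command-line tool is installed and that both files carry a DateTimeOriginal tag; the function name same_capture_time is made up for illustration.

```shell
# Sketch only: assumes exiftool is installed and on the PATH.
# same_capture_time is a hypothetical helper name, not an existing tool.

# Succeeds (exit status 0) when both files report the same, non-empty
# EXIF DateTimeOriginal value.
same_capture_time() {
    local jpgStamp rawStamp
    # -s3 prints only the tag value, without the tag name
    jpgStamp=$(exiftool -s3 -DateTimeOriginal "$1")
    rawStamp=$(exiftool -s3 -DateTimeOriginal "$2")
    [ -n "$jpgStamp" ] && [ "$jpgStamp" = "$rawStamp" ]
}
```

It would be called as same_capture_time IMG_0001.JPG IMG_0001.CR2 (hypothetical file names), and the raw file treated as redundant only when the call succeeds.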

I do have a similar script based on file time stamps (it reconciles a CR2 with its JPEG counterpart elsewhere in the directory tree).

Here is a quickly whipped-up script using file timestamps. It seems to work on my files; use at your own risk:

#! /bin/bash

# Change these to your liking or set them from parameters
jpegDir=/path/to/jpegDir    # Top directory for JPEG
rawDir=/path/to/rawDir      # Top directory for raw
timeFuzzSeconds=10          # Max time difference between JPEG and raw

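shopt -s extglob    # Enables the @(a|b) pattern used below
shopt -s globstar   # Enables ** to recurse into subdirectories
shopt -s nullglob   # Run the loop zero times when no JPEG matches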

# For all JPEG, find if a like-named raw file exist with a similar timestamp, and delete it
for jpg in "$jpegDir"/**/*.@(jpg|JPG|jpeg|JPEG)
do
    jpgBase=${jpg##*/}
    rawRootName=${jpgBase%.*}
    jpgTime=$(stat -c "%Y" "$jpg")
    rawMinTime=$((jpgTime-timeFuzzSeconds))
    rawMaxTime=$((jpgTime+timeFuzzSeconds))
#    printf "Searching %s between %s and %s\n" "$rawRootName" "$(date -d @$rawMinTime '+%F %T')" "$(date -d @$rawMaxTime '+%F %T')"

    # Replace "-print" at the end by "-delete" when you are confident that it works as expected
    # "-print" will only show you the files without touching them
    find "$rawDir" -newermt "@$rawMinTime" ! -newermt "@$rawMaxTime" -regextype egrep -regex ".*/$rawRootName\.(CR2|NEF|DNG)" -print
done
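
To see what the loop computes for each JPEG, the name and time-window steps can be isolated as below. This is only an illustrative sketch: the function names rawRootNameOf and inWindow and the sample path are made up here, and inWindow uses a symmetric, inclusive comparison, which is close to but not exactly what the two -newermt tests do.

```shell
# Illustrative helpers; the names are hypothetical, not part of the script above.

# Strip the directory part and the last extension: /a/b/IMG_1234.JPG -> IMG_1234
rawRootNameOf() {
    local base
    base=${1##*/}               # drop everything up to the last "/"
    printf '%s\n' "${base%.*}"  # drop the last "." and what follows
}

# Succeed when two epoch timestamps differ by at most $3 seconds
inWindow() {
    local diff
    diff=$(( $1 - $2 ))
    [ "${diff#-}" -le "$3" ]    # ${diff#-} removes a leading minus sign
}

rawRootNameOf /photos/2019/IMG_1234.JPG          # prints IMG_1234
inWindow 1600000000 1600000007 10 && echo match  # prints match
```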

(*) The counter used for names in cameras rolls over at 9999, so the assumption doesn't hold for a large collection from a single camera, and even less so if there are several cameras.
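
One way to gauge how often this happens in a given collection is to list the raw file names that occur more than once in the tree. A minimal sketch, assuming GNU find (for -printf) and a made-up function name duplicateNames:

```shell
# duplicateNames is a hypothetical helper name; -printf requires GNU find.

# Print every raw file name that occurs more than once below the directory $1
duplicateNames() {
    find "$1" -type f \( -iname '*.cr2' -o -iname '*.nef' -o -iname '*.dng' \) -printf '%f\n' |
        sort | uniq -d
}
```

Running duplicateNames /path/to/rawDir and getting any output at all means that matching on the name alone is not safe for that collection.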

Answered by xenoid on December 21, 2020

Some clarification needed

First, three questions, so that I can give a more complete answer:

  1. Which operating system (Windows, OS X, Linux) will you run these scripts on?
  2. Do the JPG files use the same names as the RAW files, with just a different suffix? If not, is there another way to link a RAW file to a JPEG file?
  3. Is there a fixed folder structure/hierarchy?

Example Bash Script

You can create a simple script that iterates over all RAW files, checks whether a JPEG variant exists, and deletes the RAW file if so. Depending on the answers to the questions above, I can provide such a script.

If those files are within the same folder and easily matched, it's a very quick and easy script that will execute in minutes. If those files are organized in folders, it will require some more extensive find commands, and the script will take a bit more time to execute.

For example, in bash this will work if

  • all files are in one directory (put the script in that directory)
  • the .CR2 and .JPEG extensions are the only differences between the two file sets

for f in *.CR2 ; do [ -e "${f%.CR2}.JPEG" ] && rm "$f"; done

Use with caution, because the rm command at the end really does delete the files!
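
Given that warning, a dry run that only lists the candidates is a cheap first step. The sketch below makes the same assumptions as the one-liner (one directory, names differing only in the extension). The function name listRedundantRaw is hypothetical, and checking the .JPG, .jpeg and .jpg spellings as well is an addition that is not in the one-liner.

```shell
# listRedundantRaw is a made-up name; nothing is deleted here.

# Print each .CR2 file in directory $1 that has a same-named JPEG beside it
listRedundantRaw() {
    local f ext
    for f in "$1"/*.CR2; do
        [ -e "$f" ] || continue     # skip the literal pattern when nothing matches
        for ext in JPEG JPG jpeg jpg; do
            if [ -e "${f%.CR2}.$ext" ]; then
                printf '%s\n' "$f"
                break
            fi
        done
    done
}
```

Calling listRedundantRaw . in the photo directory prints the files that the one-liner would remove, so the list can be reviewed before anything is deleted.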

Answered by hcpl on December 21, 2020
