TransWikia.com

Use Python to find duplicate values in a feature class and populate a field

Geographic Information Systems Asked by Jeremy Jones on May 30, 2021

I found this Python script in another post here and am trying to adjust it for my needs. I'm a novice Python user, so I'm struggling with how to modify it. I have a feature class stored in a feature dataset, and I want to search one of its fields for duplicate values and populate a new field with Y for duplicates or N otherwise. The script below looks like it will work once I find a way to drill down into my file geodatabase.

    from arcpy import *

    inShapefile = pointsShapefile

    checkField = "xyCombine"
    updateField = "dplicate"

    # List of values found once
    occursOnce = []
    # List of values found twice
    occursTwice = []

    cursor = da.SearchCursor(inShapefile, [checkField])
    for row in cursor:
        # Check value is not null
        if row[0]:
            # If not already found to occur twice, proceed
            if not row[0] in occursTwice:
                # If hasn't occurred once yet
                if not row[0] in occursOnce:
                    # Add to occurs once list
                    occursOnce.append(row[0])
                # If value has already been found once
                else:
                    # Add to occurs twice list (duplicates)
                    occursTwice.append(row[0])
    del cursor

    cursor = da.UpdateCursor(inShapefile, [checkField, updateField])
    for row in cursor:
        # Check value is not null
        if row[0]:
            # Check if value in occursTwice list (i.e. is duplicate)
            if row[0] in occursTwice:
                row[1] = "Y"
            else:
                row[1] = "N"
            cursor.updateRow(row)
    del cursor

3 Answers

Something like this should work:

import arcpy

inShapefile = pointsShapefile
checkField = "xyCombine"
updateField = "dplicate"

with arcpy.da.SearchCursor(inShapefile, [checkField]) as rows:
    values = [r[0] for r in rows]

d = {}
for item in set(values):
    if values.count(item) > 1:
        d[item] = 'Y'
    else:
        d[item] = 'N'

with arcpy.da.UpdateCursor(inShapefile, [checkField, updateField]) as rows:
    for row in rows:
        if row[0] in d:
            row[1] = d[row[0]]
            rows.updateRow(row)
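
One caveat with the loop above: `values.count(item)` rescans the whole list for every distinct value, which can get slow on large feature classes. Here is a minimal sketch of a single-pass alternative using `collections.Counter`. `duplicate_flags` is a hypothetical helper name, and skipping nulls mirrors the null check in the question's script rather than anything in this answer.

```python
from collections import Counter


def duplicate_flags(values):
    """Map each non-null value to 'Y' if it occurs more than once, else 'N'.

    `duplicate_flags` is an illustrative helper, not part of arcpy.
    """
    # Count every non-null value in a single pass over the list.
    counts = Counter(v for v in values if v is not None)
    return {value: ("Y" if n > 1 else "N") for value, n in counts.items()}


# Plain list standing in for the values read by the SearchCursor.
sample = ["10_20", "10_20", "30_40", None, None]
flags = duplicate_flags(sample)
print(flags)
```

The resulting dictionary could stand in for `d` in the update loop; rows whose value is null are not in it, so the `if row[0] in d` check leaves them untouched.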

And as @mr.adam suggested, the dictionary is not needed. Here is the cleaner version:

import arcpy

def findDupes(inShapefile, checkField, updateField):
    with arcpy.da.SearchCursor(inShapefile, [checkField]) as rows:
        values = [r[0] for r in rows]

    with arcpy.da.UpdateCursor(inShapefile, [checkField, updateField]) as rows:
        for row in rows:
            if values.count(row[0]) > 1:
                row[1] = 'Y'
            else:
                row[1] = 'N'
            rows.updateRow(row)

if __name__ == '__main__':
    fc = r'C:\TEMP\crm_test.gdb\test'
    fld = 'Project_Manager'
    up = 'duplicates'

    findDupes(fc, fld, up)
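
On the question's point about drilling down into a file geodatabase: arcpy accepts a catalog path in which the feature dataset is simply one more path component between the `.gdb` folder and the feature class name. Here is a small sketch of building such a path. The helper `feature_class_path` and the geodatabase, dataset, and feature class names are all made up for illustration.

```python
import os


def feature_class_path(gdb, feature_class, dataset=None):
    """Build a catalog path to a feature class, optionally inside a feature dataset."""
    # The feature dataset, when given, sits between the .gdb and the feature class.
    parts = [gdb, dataset, feature_class] if dataset else [gdb, feature_class]
    return os.path.join(*parts)


# Hypothetical names; replace them with your own geodatabase contents.
fc = feature_class_path(r"C:\data\example.gdb", "points", dataset="Survey")
print(fc)
```

The resulting string can be passed as the first argument to `findDupes`.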

Correct answer by crmackey on May 30, 2021

If you have an Advanced or Info license, another option in ArcGIS is the Find Identical tool. It produces a table of row IDs with matching values; use the ONLY_DUPLICATES option. Then join the table to the feature class (feature class ObjectID to the table's IN_FID), using the KEEP_COMMON keyword for the join type (this is similar to a definition query, in that your feature class will only display matching records). Then perform a field calculation on the layer. Finally, remove the join so the rest of the features are available.

I don't know how this compares with the da cursor for efficiency. Just another option.
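
A rough sketch of the Find Identical route in arcpy follows. It assumes an Advanced license and that the tool's output table lists the matching ObjectIDs in an `IN_FID` field. Instead of the join and field calculation, it reads those IDs into a set and writes Y/N with an update cursor, which is a swapped-in shortcut rather than the exact steps described here. `flag_with_find_identical` and all of its parameters are hypothetical names.

```python
def flag_with_find_identical(fc, check_field, update_field, out_table):
    """Flag duplicates in `check_field` using the Find Identical tool.

    All parameter names are illustrative; `out_table` is where the tool
    writes its result table.
    """
    import arcpy  # imported here so the module loads without ArcGIS

    # Report only records whose check_field value occurs more than once.
    arcpy.management.FindIdentical(
        fc, out_table, [check_field], output_record_option="ONLY_DUPLICATES"
    )

    # IN_FID holds the ObjectID of each duplicate record in the input.
    with arcpy.da.SearchCursor(out_table, ["IN_FID"]) as rows:
        duplicate_ids = {row[0] for row in rows}

    # Write Y/N to every feature based on membership in that set.
    with arcpy.da.UpdateCursor(fc, ["OID@", update_field]) as rows:
        for row in rows:
            row[1] = "Y" if row[0] in duplicate_ids else "N"
            rows.updateRow(row)

    return len(duplicate_ids)
```

Because nothing is joined, there is no join to remove afterwards, and features without duplicates get an explicit N.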

Answered by recurvata on May 30, 2021

I'm providing a more recent solution for finding duplicates and adding the count to a new field. It's straight from ESRI's help document: How to identify duplicate or unique values in ArcGIS Pro.

import arcpy

'''
This script will count the number of occurrences of a value in a field ("field_in") and write them to a
new field ("field_out")
'''

arcpy.env.workspace = r"C:\Users\DuplicateTesting.gdb" #path to GDB goes here
infeature = "backup_02232021" #name of feature class goes here
field_in = "location_string_output" #column you're looking for the duplicates in
field_out = "COUNT_"+field_in
arcpy.AddField_management(infeature, field_out,"SHORT")

lista= []
cursor1=arcpy.SearchCursor(infeature)
for row in cursor1:
    i=row.getValue(field_in)
    lista.append(i)
del cursor1, row

cursor2=arcpy.UpdateCursor(infeature)
for row in cursor2:
    i=row.getValue(field_in)
    occ=lista.count(i)
    row.setValue(field_out, occ)
    cursor2.updateRow(row)
del cursor2, row
print("----done----")

Answered by Pfalbaum on May 30, 2021
