AnswerBun.com

Efficiently copying values from one ndarray to another on unequal sized arrays

I have two arrays of different sizes, and I am trying to overwrite some values in the first array with values from the second array wherever the "keys" match. My actual problem may have many, many rows, and I have already determined that this step is currently bottlenecking my program.

edit: I failed to recognize that there may be duplicate values in a1, which should stay duplicated. I added one such example to the np.array examples.

example:

import numpy as np

# first two columns are 'keys', overwrite the 3rd column in a1 with the 3rd column from a2
# some values may be missing from a2. Those should keep the value in a1

a1 = np.array([[ 0.0,  2.0,  10.0 ],
               [ 0.0,  2.0,  10.0 ],
               [ 0.0,  3.0,  10.0 ],
               [ 1.0,  3.0,  10.0 ],
               [ 1.0, 13.0,  10.0 ],
               [ 2.0,  2.0,  10.0 ],
               [ 2.0,  5.0,  10.0 ]])

a2 = np.array([[ 0.0,  2.0,  0.0   ],
               [ 0.0,  3.0,  0.713 ],
               [ 1.0,  3.0,  0.713 ],
               [ 1.0, 13.0,  1.0   ],
               [ 2.0,  2.0,  0.0   ]])

# wanted result:
np.array([[ 0.0,  2.0,  0.0   ],
          [ 0.0,  2.0,  0.0   ],
          [ 0.0,  3.0,  0.713 ],
          [ 1.0,  3.0,  0.713 ],
          [ 1.0, 13.0,  1.0   ],
          [ 2.0,  2.0,  0.0   ],
          [ 2.0,  5.0,  10.0  ]])

The brute-force approach is to take each row in a2 and loop through every row in a1, replacing values on matches. Is there a way to do this that runs more efficiently, perhaps by vectorizing at least one of the loops? My actual case involves many rows in both arrays, and this takes a looooong time.
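For reference, the brute-force approach described above might look like the following minimal sketch. The function name overwrite_brute_force is hypothetical, and the arrays are the ones from the example. It is slow (one key comparison per pair of rows), so it is mainly useful as a correctness baseline for faster approaches.

```python
import numpy as np

a1 = np.array([[0.0,  2.0, 10.0],
               [0.0,  2.0, 10.0],
               [0.0,  3.0, 10.0],
               [1.0,  3.0, 10.0],
               [1.0, 13.0, 10.0],
               [2.0,  2.0, 10.0],
               [2.0,  5.0, 10.0]])

a2 = np.array([[0.0,  2.0, 0.0],
               [0.0,  3.0, 0.713],
               [1.0,  3.0, 0.713],
               [1.0, 13.0, 1.0],
               [2.0,  2.0, 0.0]])


def overwrite_brute_force(a1, a2):
    """Hypothetical baseline: compare every row of a2 with every row of a1."""
    out = a1.copy()  # leave the caller's array untouched
    for key0, key1, value in a2:
        for row in out:  # each row is a view into out
            if row[0] == key0 and row[1] == key1:
                row[2] = value
    return out


result = overwrite_brute_force(a1, a2)
print(result)
```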

Stack Overflow Asked by DoubleDouble on July 22, 2020

4 Answers

If column three is getting updated and you want to use pandas:

import numpy as np
import pandas as pd

a1 = np.array([[ 0.0,  2.0,  10.0 ],
               [ 0.0,  2.0,  10.0 ],
               [ 0.0,  3.0,  10.0 ],
               [ 1.0,  3.0,  10.0 ],
               [ 1.0, 13.0,  10.0 ],
               [ 2.0,  2.0,  10.0 ],
               [ 2.0,  5.0,  10.0 ]])

a2 = np.array([[ 0.0,  2.0,  0.0   ],
               [ 0.0,  3.0,  0.713 ],
               [ 1.0,  3.0,  0.713 ],
               [ 1.0, 13.0,  1.0   ],
               [ 2.0,  2.0,  0.0   ]])


d1 = pd.DataFrame(a1)

d2 = pd.DataFrame(a2)

d3 = d2.set_index([0,1])[[2]].combine_first(d1.set_index([0,1])[[2]]).reset_index().to_numpy()
d3

Output:

array([[ 0.   ,  2.   ,  0.   ],
       [ 0.   ,  2.   ,  0.   ],
       [ 0.   ,  3.   ,  0.713],
       [ 1.   ,  3.   ,  0.713],
       [ 1.   , 13.   ,  1.   ],
       [ 2.   ,  2.   ,  0.   ],
       [ 2.   ,  5.   , 10.   ]])

Answered by Scott Boston on July 22, 2020

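Concatenate a2 and a1 (a2 first), then keep only the first row for each unique pair of values in the first 2 columns, so the a2 value wins wherever a key appears in both.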

a_all = np.r_[a2, a1]
a_all = a_all[np.unique(a_all[:, :2], axis=0, return_index=True)[1]]
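
One caveat, sketched here on the example arrays from the question: np.unique returns one row per distinct key, sorted by key, so repeated keys in a1 are collapsed into a single row rather than kept duplicated. The names first_idx and merged are only illustrative.

```python
import numpy as np

a1 = np.array([[0.0,  2.0, 10.0],
               [0.0,  2.0, 10.0],
               [0.0,  3.0, 10.0],
               [1.0,  3.0, 10.0],
               [1.0, 13.0, 10.0],
               [2.0,  2.0, 10.0],
               [2.0,  5.0, 10.0]])

a2 = np.array([[0.0,  2.0, 0.0],
               [0.0,  3.0, 0.713],
               [1.0,  3.0, 0.713],
               [1.0, 13.0, 1.0],
               [2.0,  2.0, 0.0]])

# Stack a2 on top of a1 so that a2's rows are seen first.
a_all = np.r_[a2, a1]

# return_index gives the position of the first occurrence of each unique key.
_, first_idx = np.unique(a_all[:, :2], axis=0, return_index=True)
merged = a_all[first_idx]

print(a1.shape, merged.shape)  # merged has one row per distinct key
print(merged)
```

If a1 can contain repeated keys that must be preserved, this approach is therefore only suitable when the collapsed result is acceptable.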

Answered by V. Ayrat on July 22, 2020

The solution has 2 parts. First, you need to identify which keys in a1 aren't in a2, and then you need to figure out which row of a2 each row of a1 is associated with.

Here's my solution:

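equiv = np.all(np.equal(a1[:, None, :2], a2[None, :, :2]), axis=-1)  # shape (len(a1), len(a2))
mask = np.any(equiv, axis=-1)    # True for rows of a1 whose key appears in a2
ind = np.argmax(equiv, axis=-1)  # for each row of a1, index of the first matching row of a2

a1[mask, 2] = a2[ind[mask], 2]

I start by broadcasting both arrays to conforming dimensions and computing the equivalence matrix, where equiv[i, j] is True when row i of a1 and row j of a2 agree on both key columns.

Then, reducing along the a2 axis gives a boolean mask of the rows of a1 that have a match in a2, and np.argmax along the same axis gives the index of the first matching row of a2 for each row of a1. For rows without a match, argmax returns 0, which is why the mask is needed.

Finally, every masked value in the last column of a1 is overwritten with the value from its associated row of a2. Note that the intermediate comparison array has len(a1) * len(a2) * 2 elements, so memory use grows with the product of the two row counts.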
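
As a self-contained check, here is a minimal sketch of the broadcast-and-mask idea wrapped in a function and run on the arrays from the question. The function name overwrite_by_key and the n_keys parameter are hypothetical, and the sketch assumes the value column sits directly after the key columns.

```python
import numpy as np


def overwrite_by_key(a1, a2, n_keys=2):
    """Hypothetical helper: copy a2's value column into a1 where the keys match."""
    # equiv[i, j] is True when row i of a1 and row j of a2 share all key columns.
    equiv = (a1[:, None, :n_keys] == a2[None, :, :n_keys]).all(axis=-1)
    mask = equiv.any(axis=1)    # rows of a1 that have a match in a2
    ind = equiv.argmax(axis=1)  # first matching row of a2 (0 when there is none)
    out = a1.copy()
    out[mask, n_keys] = a2[ind[mask], n_keys]
    return out


a1 = np.array([[0.0,  2.0, 10.0],
               [0.0,  2.0, 10.0],
               [0.0,  3.0, 10.0],
               [1.0,  3.0, 10.0],
               [1.0, 13.0, 10.0],
               [2.0,  2.0, 10.0],
               [2.0,  5.0, 10.0]])

a2 = np.array([[0.0,  2.0, 0.0],
               [0.0,  3.0, 0.713],
               [1.0,  3.0, 0.713],
               [1.0, 13.0, 1.0],
               [2.0,  2.0, 0.0]])

result = overwrite_by_key(a1, a2)
print(result)
```

Duplicate keys in a1 are handled naturally here, because each row of a1 looks up its own match independently.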

Answered by asimoneau on July 22, 2020

Would you consider other packages like Pandas?

import pandas as pd

d2 = pd.DataFrame(a2).set_index([0,1])
d1 = pd.DataFrame(a1).set_index([0,1])

d1.update(d2)
d1.reset_index().values

Output:

array([[ 0.   ,  2.   ,  0.   ],
       [ 0.   ,  2.   ,  0.   ],
       [ 0.   ,  3.   ,  0.713],
       [ 1.   ,  3.   ,  0.713],
       [ 1.   , 13.   ,  1.   ],
       [ 2.   ,  2.   ,  0.   ],
       [ 2.   ,  5.   , 10.   ]])

Answered by Quang Hoang on July 22, 2020
