Compare 2 columns in 2 different tables with non uniques values

Question

I want to compare and check whether the values in columnA of table A exist in column A of Table B. But the columnA consist of duplicate values in both the table.

ORA_Dentist · Answer

You can simply get the desired result using LEFT OUTER JOIN as following:

SELECT DISTINCT
    *
FROM
    (
        SELECT
            A.COLUMN_A,
            CASE
                WHEN B.COLUMN_A IS NOT NULL THEN 'YES'
                ELSE 'NO'
            END AS "EXISTS IN TABLE_B"
        FROM
            TABLE_A A
            LEFT JOIN TABLE_B B ON ( A.COLUMN_A = B.COLUMN_A )
    );

Cheers!!

Vérace · Answer

From this answer which I reworked a little and came up with (depending on the OP's exact requirements which aren't entirely clear) the following answer (fiddle available here).

The data is the same (with the addition of a few records). First query run was this:

SELECT 
  UNIQUE columnA AS col_a, 
  COUNT(columnA) AS count_a,
  'A' AS tab
FROM A
GROUP BY columnA
ORDER BY columns;

Result:

COL_A   COUNT_A TAB
    a         6 A   -- extra records (5) in my fiddle, + 1 = 6  
    b         2 A
    c         3 A
    e         1 A

Do the same for table B.

Result:

COL_A   COUNT_A TAB
    b         1   B
    c         2   B
    d         4   B
    e         1   B

Perform a UNION ALL on these two queries. It's more efficient than UNION, i.e. no duplicate elimination required. We know that the records are all different anyway because of the tab "artificial" field but AFAIK, UNION has to perform a check anyway.

Result of UNION ALL - see fiddle for full query:

COL_A   COUNT_A TAB
a   6   A
b   1   B
b   2   A
... results snipped for brevity

Then, using COMMON TABLE EXPRESSIONs (aka CTEs - or the WITH clause), we combine the above result (cte1) with another query cte2 derived from cte1 as follows:

WITH cte1 AS (
SELECT UNIQUE columnA AS col_a, COUNT(columnA) as count_a, 'A' AS tab
FROM A
GROUP BY columnA
-- ORDER BY columnA
UNION ALL
SELECT UNIQUE columnA AS col_a, COUNT(columnA) as count_a, 'B' AS tab
FROM B
GROUP BY columnA
ORDER BY col_a
),
cte2 AS
(
  SELECT col_a, COUNT(col_a) AS cnt
  FROM cte1
  GROUP BY col_a
  HAVING COUNT(col_a) = 1   -- only records with columnA unique to either table 
  ORDER BY col_a
)
SELECT * FROM cte2;

Result:

COL_A   CNT
    a     1  <<-- The CNT of 1 does not refer to the number of records with that
    d     1  <<-- value in the table - it's merely a reflection of the       
             <<-- HAVING COUNT(col_a) = 1: only records with columnA unique to
             <<-- either table and is shown for clarity

So, this gives us a and d which are indeed the values not shared by tableA and tableB but NOT which table they are in.

We then join this back to cte1 as follows:

... Rest of query snipped for brevity - see above (or fiddle)

HAVING COUNT(col_a) = 1   --<<< only records with columnA unique to either table 
  ORDER BY col_a
)
SELECT t1.tab, t1.col_a, t1.count_a, t2.cnt -- cnt not required - equal to 1 anyway.
FROM cte1 t1
JOIN cte2 t2 ON t1.col_a = t2.col_a
ORDER BY tab, col_a;

Which gives the final result:

TAB COL_A   COUNT_A CNT
  A     a         6   1
  B     d         4   1

which tells us that columnA contains the value a 6 times and none in tableB and that value d is in tableB 4 times and not in tableA. Again the CNT field is just to show how the result was derived. If you want all the records and their counts which are shared by both tables, put HAVING... = 2 - see end of fiddle.

The result for this would be:

TAB COL_A   COUNT_A
A   b   2
B   b   1
A   c   3
B   c   2
A   e   1
B   e   1

In future, I would strongly recommend that you use a fiddle to give sample data and then indicate the result that you with to obtain and the logic you used to obtain that result. Since your question isn't very clear, my answer may or may not be "overkill" - if this doesn't respond to your requirements, please let me know. My profile contains some articles on how to ask questions on this forum, you might want to take a look? p.s. welcome to the forum! :-)

stefan · Answer

Maybe this will help:  use analytics to count (the duplicate) values in each table, and use the results of these operations in a full outer join.  Example (see dbfiddle):

Tables

create table A ( columnA )
as
select 'a' from dual union all
select 'b' from dual union all
select 'b' from dual union all
select 'c' from dual union all
select 'c' from dual union all
select 'c' from dual union all 
select 'e' from dual -- one 'e' in each table
;

-- one 'a', 'b', 'c' (respectively) missing here
-- 4 x 'd' (does not exist in the other table)
create table B ( columnA)
as
select 'b' from dual union all
select 'c' from dual union all
select 'c' from dual union all
select 'd' from dual union all
select 'd' from dual union all
select 'd' from dual union all
select 'd' from dual union all
select 'e' from dual -- one 'e' in each table
;

Write to 2 queries, one for each table, that look something like ...

-- count the instances of each value
select unique 
  'table A ->' as tbl, columnA
, count(*) over ( partition by columnA ) as instances_a
from A

-- result
TBL         COLUMNA  INSTANCES_A  
table A ->  a        1            
table A ->  c        3            
table A ->  b        2

If this works, then use the initial queries as "inline views", and write a full outer join, which will allow you to compare the value counts, eg

select 
  *
from (
  select unique 
    'table A ->' as tbl, columnA
  , count(*) over ( partition by columnA ) as instances_a
  from A
) A_
full join
(
  select unique 
    'table B ->' as tbl, columnA
  , count(*) over ( partition by columnA ) as instances_b
  from B
) B_
on A_.columnA = B_.columnA 
order by A_.columnA, B_.columnA
;

-- result
TBL         COLUMNA  INSTANCES_A  TBL         COLUMNA  INSTANCES_B  
table A ->  a        1            NULL        NULL     NULL         
table A ->  b        2            table B ->  b        1            
table A ->  c        3            table B ->  c        2            
NULL        NULL     NULL         table B ->  d        4

You can then modify this query, in order to find out whether values are missing from one of the tables etc.  Eg

select 
  A_.tbl, A_.columnA, count_a
, B_.tbl, B_.columnA, count_b
, case
    when count_a = count_b then 'yes'
    else 'no'
  end as "column contents equal?"
from (
  select unique 
    'table A ->' as tbl, columnA
  , count(*) over ( partition by columnA ) as count_a
  from A
) A_
full join
(
  select unique 
    'table B ->' as tbl, columnA
  , count(*) over ( partition by columnA ) as count_b
  from B
) B_
on A_.columnA = B_.columnA 
order by A_.columnA, B_.columnA
;

TBL         COLUMNA  COUNT_A  TBL         COLUMNA  COUNT_B  column contents equal?  
table A ->  a        1        NULL        NULL     NULL     no                       
table A ->  b        2        table B ->  b        1        no                       
table A ->  c        3        table B ->  c        2        no                       
table A ->  e        1        table B ->  e        1        yes                      
NULL        NULL     NULL     table B ->  d        4        no

(Tested w/ Oracle 12c and 18c)

Compare 2 columns in 2 different tables with non uniques values

3 Answers

Add your own answers!

Ask a Question