TransWikia.com

How do we find frequency of one column based off two other columns in SQL?

Stack Overflow Asked by degreecharge on November 29, 2021

I’m relatively new to working with SQL and wasn’t able to find any past threads to solve my question. I have three columns in a table, columns being name, customer, and location. I’d like to add an additional column determining which location is most frequent, based off name and customer (first two columns).

enter image description here

I have included a photo of an example where name-Jane customer-BEC in my created column would be "Texas" as that has 2 occurrences as opposed to one for California. Would there be anyway to implement this?

4 Answers

If you want 'Texas' on all four rows:

select t.Name,  t.Customer, t.Location,
      (select t2.location 
       from table1 t2
       where t2.name = t.name
       group by name, location
       order by count(*) desc
       fetch first 1 row only
      ) as most_frequent_location
from table1 t ;

You can also do this with analytic functions:

select t.Name,  t.Customer, t.Location,
       max(location) keep (dense_rank first order by location_count desc) over (partition by name) most_frequent_location
from (select t.*,
             count(*) over (partition by name, customer, location) as location_count
      from table1 t 
     ) t;

Here is a db<>fiddle.

Both of these version put 'Texas' in all four rows. However, each can be tweaks with minimal effort to put 'California' in the row for ARC.

Answered by Gordon Linoff on November 29, 2021

This is more a question about understanding the concepts of a relational database. If you want that information, you would not put that in an additional column. It is calculated data over multiple columns - why would you store that in the table itself ? It is complex to code and it would also be very expensive for the database (imagine all the rows you have to calculate that value for if someone inserted a million rows) Instead you can do one of the following

  • Calculate it at runtime, as shown in the other answers
  • if you want to make it more persisent, you could embed that query above in a view
  • if you want to physically store the info, you could use a materialized view

Plenty of documentation on those 3 options in the official oracle documentation

Answered by Koen Lostrie on November 29, 2021

Your first step is to construct a query that determines the most frequent location, which is as simple as:

select Name, Customer, Location, count(*)
from table1 
group by Name, Customer, Location

This isn't immediately useful, but the logic can be used in row_number(), which gives you a unique id for each row returned. In the query below, I'm ordering by count(*) in descending order so that the most frequent occurrence has the value 1.

Note that row_number() returns '1' to only one row.

So, now we have

select Name, Customer, Location, row_number() over (partition by Name, Customer order by count(*) desc) freq_name_cust
from table1 tb_
group by Name, Customer, Location

The final step puts it all together:

select tab.*, tb_.Location most_freq_location
from table1 tab
inner join 
(select Name, Customer, Location, row_number() over (partition by Name, Customer order by count(*) desc) freq_name_cust
 from table1 
 group by Name, Customer, Location) tb_
 on tb_.Name = tab.Name
  and tb_.Customer = tab.Customer
   and freq_name_cust = 1

You can see how it all works in this Fiddle where I deliberately inserted rows with the same frequency for California and Texas for one of the customers for illustration purposes.

Answered by Lars Skaug on November 29, 2021

In Oracle, you can use aggregate function stats_mode() to compute the most occuring value in a group.

Unfortunately it is not implemented as a window function. So one option uses an aggregate subquery, and then a join with the original table:

select t.*, s.top_location
from mytable t
inner join (
    select name, customer, stats_mode(location) top_location
    from mytable
    group by name, customer
) s where s.name = t.name and s.customer = t.customer

You could also use a correlated subquery:

select 
    t.*,
    (
        select stats_mode(t1.location) 
        from mytable t1 
        where t1.name = t.name and t1.customer = t.customer
    ) top_location
from mytable t

Answered by GMB on November 29, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP