TransWikia.com

GROUPING SETS query with many-to-many relationships

Database Administrators Asked on December 4, 2021

I’m writing a query that is mostly based on a GROUPING SET. A very simplified version of the query is the following:

SELECT GROUPING(a,b) as grouping, a, b  , COUNT(*)
FROM table  
GROUP BY GROUPING SETS ((a,b), ())
ORDER BY count DESC

The real query has more groupings and a bit more stuff around it to translate e.g. one-to-many relations to something understandable. But the core is a bunch of aggregations performed at the same time on that table.

The result of that kind of query are several aggregations at once. And for multi-column grouping sets like in this query, it creates all combinations of those two aggregations.

The problem I have now is how to integrate aggregations that are not based on columns in this table, but on junction tables for many-to-many relationships with this table. For simple aggregations that are only based on a single column, I can do this with a separate query with a simple GROUP BY and UNION ALL that onto the GROUPING SETS query. It’s a bit annoying, but that works.

But the problem are the more complex aggregations based on two or more columns. That’s extremely easy to do with GROUPING SETS, but my workaround for many-to-many relationships doesn’t really work then. I don’t know how I’d write that kind of query if one of the columns of the grouping set is in the main table, and the other one in a many-to-many junction table. I also can’t just join the junction table in this query, as this duplicates rows which means the aggregations are wrong.

Is there a reasonable way of doing this kind of GROUPING SET query with many-to-many relationships, and especially with combinations of columns in the main table and many-to-many relations?

Here is a minimal example of the kind of query I want to do. Keep in mind that the query I post here is obviously wrong because the join duplicates rows which get then counted wrong in the aggregations. The query I want to perform is essentially last one here, just with correct aggregations.

create table if not exists demo (
    id  serial primary key,
    a   text,
    b   text
);

create table if not exists stuff (
    id  serial primary key,
    c   text
);

create table if not exists demo_stuff (
    demo_id int references demo(id),
    stuff_id int references stuff(id)
);

insert into demo values(1,'x','y') on conflict do nothing;
insert into demo values(2,'x','z') on conflict do nothing;
insert into demo values(3,'x','z') on conflict do nothing;

insert into stuff values(1,'foo') on conflict do nothing;
insert into stuff values(2,'bar') on conflict do nothing;

truncate demo_stuff;
insert into demo_stuff values(1,1);
insert into demo_stuff values(1,2);

-- this query only on the demo table works fine
select grouping(a,b),a,b,count(*) 
from demo 
group by grouping sets ((a,b),());

-- this query joins the demo_stuff table to be able to 
-- aggregate on many-to-many relationships for the demo table
select grouping(a,stuff_id),a,stuff_id,count(*) 
from demo 
left join demo_stuff on id = demo_id 
group by grouping sets ((a,stuff_id),());

-- this query shows how the join leads to wrong aggregate 
-- results, as it obviously duplicates rows
-- 'y' appears only once in the table, but is counted twice here
select grouping(b),b,count(*) 
from demo 
left join demo_stuff on id = demo_id 
group by grouping sets ((b),());

One Answer

Your problem query:

-- this query shows how the join leads to wrong aggregate 
-- results, as it obviously duplicates rows
-- 'y' appears only once in the table, but is counted twice here
select grouping(b),b,count(*)
from demo 
left join demo_stuff on id = demo_id 
group by grouping sets ((b),());

Result:

grouping | b    | count
-------: | :--- | ----:
       0 | z    |     2
       0 | y    |     2
       1 | null |     4

Your comment is slightly inaccurate. The "table" is the derived table after joining demo and demo_stuff, and 'y' appears twice there. Either you want count(DISTINCT demo.id) in the result, or count(*). Those counts are different by nature and should go into separate result columns:


SELECT grouping(b), b
     , count(DISTINCT d.id) AS ct_distinct_demos
     , count(*)             AS ct_rows  -- your org count
FROM   demo d
LEFT   JOIN demo_stuff s ON d.id = s.demo_id 
GROUP  BY GROUPING SETS ((b), ())
ORDER  BY 1;

Result:

grouping | b    | ct_distinct_demos | ct_rows
-------: | :--- | ----------------: | ------:
       0 | y    |                 1 |       2
       0 | z    |                 2 |       2
       1 | null |                 3 |       4

db<>fiddle here

You'll have to define precisely what to count for the correct solution. Might result in count(d.b) or count(*) or count (DISTINCT d.b) ...

Answered by Erwin Brandstetter on December 4, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP