Database Administrators Asked by ATMdev on January 2, 2022
In SQL 2017, I have a ‘food’ table as follows:
ID item Ingredients
---- ----- -----------
1 item1 flour,sugar,cocoa,butter
2 item2 flour,sugar,butter,water
3 item3 sugar,cocoa,water
4 item4 sugar,cocoa,butter
5 item5 flour,water
Although the "Ingredients" are a CSV in a single field, I have no problem moving them to a related table:
ID Ingredient
--- ----------
1 flour
1 sugar
1 cocoa
1 butter
2 flour
etc
or just select them with a CROSS APPLY string_split
select ID, item, ingredient
from food
CROSS APPLY string_split(food.ingredients,',') as ing
Now here is what I am trying to attempt:
For any given row ex:(where id = 1)
I want to return the top n rows (other than id = 1
) that have the most common ingredients with id = 1
.
So the top 2 with the most common ingredients with id = 1
should be:
ID item Ingredients
---- ----- -----------
2 item2 flour,sugar,butter,water ( 3 in common)
3 item3 sugar,cocoa,water ( 2 in common)
4 item4 sugar,cocoa,butter ( 3 in common)
5 item5 flour,water ( 1 in common)
The result should return rows with id = 2
and id = 4
as both have 3 ingredients in common with row id = 1
This is my first question on StackExchange, and I hope it is clear.
I can write this as a C# or Python batch program to create a table of ‘closest ingredients’, but I’m hoping I can do this in SQL.
Thanks !
Please store the ingredients in a table not a list. So let's call that ItemIngredient
with two columns ItemId
and Ingredient
and leave it at that. The primary key will be (ItemId, Ingredient)
. Because searches by Ingredient
will probably happen a lot, let's put a non-unique index on Ingredient
(which is necessary for below to work the most efficiently).
SELECT
ItemId
,MatchItemId
,IngredientMatchCount
,MatchRank
FROM
(
SELECT
Base.ItemId
,Match.ItemId AS MatchItemId
,COUNT(*) AS IngredientMatchCount
,RANK() OVER (PARTITION BY Base.ItemId ORDER BY COUNT(*) DESC, Match.ItemId) AS MatchRank
FROM
ItemIngredient Base
INNER JOIN
ItemIngredient Match
ON Match.Ingredient = Base.Ingredient
AND Match.ItemId <> Base.ItemId
-- Put your WHERE clause here if you want to only fine matches for certain Ids
GROUP BY
Base.ItemId
,Match.ItemId
) matched
WHERE
MatchRank <= 1 --Or whatever
ORDER BY
ItemId
You can experiment with RANK
vs DENSE_RANK
at some point to see what provides the best results for you. If you need details about the Item
you can join back in the outer query.
Answered by bbaird on January 2, 2022
0 Asked on September 18, 2020 by vojtch
1 Asked on September 13, 2020 by ajeetkumar
2 Asked on September 11, 2020 by phil-w
1 Asked on September 11, 2020 by dmitriy-grankin
1 Asked on September 10, 2020 by juclart-ngouedi
1 Asked on September 8, 2020 by l-lijith
0 Asked on September 5, 2020 by thomas-r
3 Asked on September 2, 2020 by jay-gupta
1 Asked on September 2, 2020 by irakli-ugulava
1 Asked on September 2, 2020 by vojtch-dohnal
0 Asked on August 26, 2020 by mhweb
0 Asked on August 24, 2020
0 Asked on August 23, 2020 by ivor
1 Asked on August 21, 2020 by franco
3 Asked on August 18, 2020 by deepak-garg
10 Asked on August 14, 2020 by geoaxis
1 Asked on August 14, 2020 by steveg
1 Asked on August 13, 2020 by gio
availability groups sql server sql server 2012 tempdb tempdb version store
Get help from others!
Recent Questions
Recent Answers
© 2023 AnswerBun.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP