TransWikia.com

Complexity of Set Difference

Theoretical Computer Science Asked by tamalet on October 30, 2021

Given $k$ sets $S_1$, $S_2$, $dots$, $S_k$ in the universe $U = {1, 2, dots, n}$, is there a way to preprocess the $k$ sets such that there is an output-sensitive query algorithm that computes $S_i backslash S_j$ for any $i$ and $j$?

Has this problem been studied before in the literature? If an output-sensitive algorithm (after preprocessing the sets) is not possible, what is the best complexity we can attain?

I found that there is a related problem which focuses on set intersection rather than set difference. To the best of knowledge, there is no output-sensitive algorithm for the case of set intersection.

2 Answers

Yes, let's require even less and say you're just interested in figuring out if the difference (similarly, intersection) is empty or not. It is trivial to have a quadratic-sized data structure with constant time query (by pre-processing everything) and also a linear-sized structure with linear query time (by just storing the sets trivially), and it's natural to conjecture that it's impossible to get the best of both worlds (or even get "very far" from the trivial schemes). People have been looking for lower bounds for this type of problem, the area is known as "static data structures". However, it is recently known that getting sufficiently strong lower bounds for the static data structure required for this type of problem can lead to major (a.k.a. "scary") consequences (lower bounds on matrix rigidity) that, in sufficient strength, are currently considered beyond reach. See this enter link description here

You can slightly make the problem more difficult as follows (so as to make proving lower bounds "easier"): You get all the sets first, and have to pre-process. Then, you get a new set $T$, and can update the data structure. Finally, you get a query $i$ and have to take the difference of $S_i$ and $T$ (or just report if the intersection is empty). For this very important variation (called "dynamic data structure" or specifically "multiphase problem" defined here) proving lower bounds may be within reach (and can lead to lower bounds for all kinds of other data structure problems), but still the state of the art isn't great.

Answered by Mahdi Cheraghchi on October 30, 2021

(Sorry but not enough reputation otherwise this would be a comment.)

Note that set difference is equivalent to set intersection with the complement i.e. $S_ibackslash S_j = S_i cap overline{S_j}$. Thus you could double the number of sets to $S_1, S_2, ...., S_k, overline{S_1}, ..., overline{S_k}$ and apply the set intersection pre-processing to the appropriate pairs.

Answered by Xin Yuan Li on October 30, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP