Practical Data-oblivious Compaction?

Theoretical Computer Science Asked on October 30, 2021

Compaction is a particularly weak form of sorting. The problem can be phrased as follows:

Given an array $A$ of $N$ cells, with at most $R$ of the cells distinguished (say by a bit), produce an array $D$ of size $O(R)$ containing all of the marked cells.

Compaction is said to be:

  • Tight: If the output array $D$ is of size exactly $R$
  • Order-preserving: If the relative ordering of marked elements of $A$ is the same as the relative ordering of elements of $D$ (i.e. if the compaction is "stable" in the sense of sorting)
  • Data-oblivious: If the algorithm which computes the compaction has access patterns independent of the data

One can construct tight, order-preserving, data-oblivious compaction from a sorting network by using a non-standard comparison function which:

  • Sets all marked elements as greater than unmarked elements
  • Within comparisons between marked elements (resp. unmarked elements), breaks ties by the element’s initial ordering in the array.

But such a function is easy to specify (it compares (marked bit, initial index) pairs), and can be computed in $O(\log N)$ time.
This means that one can construct tight, order-preserving, data-oblivious compaction from the AKS sorting network ($O(N\log N)$ comparisons, giving an $O(N(\log N)^2)$-time algorithm) or from a more practical sorting network like Batcher's odd-even mergesort ($O(N(\log N)^2)$ comparisons, giving an $O(N(\log N)^3)$-time algorithm).
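Here is a minimal Python sketch of this construction (the names `batcher_pairs` and `oblivious_compact` are mine, the power-of-two restriction on $N$ is for simplicity, and a production implementation would also make the conditional swap branchless):

```python
def batcher_pairs(n):
    """Yield the compare-exchange pairs of Batcher's odd-even mergesort
    for n a power of two.  The pair sequence depends only on n, never on
    the data, which is what makes the network data-oblivious."""
    p = 1
    while p < n:
        k = p
        while k >= 1:
            for j in range(k % p, n - k, 2 * k):
                for i in range(min(k, n - j - k)):
                    if (i + j) // (2 * p) == (i + j + k) // (2 * p):
                        yield (i + j, i + j + k)
            k //= 2
        p *= 2

def oblivious_compact(values, marked):
    """Tight, order-preserving compaction of the marked values, built
    from a sorting network with the non-standard comparison above."""
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "this sketch assumes n is a power of two"
    # Key: marked elements compare greater than unmarked ones, and ties
    # are broken by the element's initial position, making the result stable.
    tagged = [((1 if marked[i] else 0, i), values[i]) for i in range(n)]
    for a, b in batcher_pairs(n):
        # Every compare-exchange of the fixed network is executed, so the
        # access pattern is data-independent.  (The plain `if` still leaks
        # through timing; a real oblivious implementation would swap in
        # constant time.)
        if tagged[a][0] > tagged[b][0]:
            tagged[a], tagged[b] = tagged[b], tagged[a]
    r = sum(1 for m in marked if m)  # tight: output exactly the R marked cells
    return [v for _, v in tagged[n - r:]]
```

For example, `oblivious_compact(list("abcdefgh"), [0, 1, 1, 0, 0, 1, 0, 0])` returns `['b', 'c', 'f']`, preserving the marked elements' original order.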

Anyone who has worked with sorting networks before has heard that there are the asymptotically optimal constructions (namely the AKS sorting network and ZigZag Sort, both using $O(N\log N)$ comparisons), and the practically efficient ones (Batcher's, mentioned above, is one example, but there are many which use $O(N(\log N)^2)$ comparisons).

I’m essentially curious whether there is something similar for compaction. I have seen papers such as this which give oblivious tight compaction in $O(N)$ time <1>.
That paper’s improvement is lowering the implied constant from $2^{228}$ to roughly $9000$. I imagine that for all "reasonably-sized" arrays it is still more efficient to pair Batcher’s sorting network with the aforementioned non-standard comparison function.
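To make that intuition concrete, here is a crude back-of-envelope comparison. The cost models are my own assumptions, not benchmarks: Batcher's network is charged roughly $N(\log_2 N)^2/4$ compare-exchanges, and the linear-time algorithm roughly $9000N$ operations.

```python
import math

def batcher_cost(n):
    # ~ n * (log2 n)^2 / 4 compare-exchanges (assumed cost model)
    return n * math.log2(n) ** 2 / 4

def linear_cost(n, constant=9000.0):
    # ~ 9000 operations per element (assumed cost model)
    return constant * n

for e in (20, 40, 80, 189, 190):
    n = 2.0 ** e
    print(f"N = 2^{e}: Batcher cheaper? {batcher_cost(n) < linear_cost(n)}")
```

Under these (admittedly rough) models the sorting network wins until $(\log_2 N)^2/4 > 9000$, i.e. until $N \approx 2^{190}$, which is consistent with the guess above.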

So: has research been done into practically efficient compaction algorithms? While I eventually want something which is tight, order-preserving, and data-oblivious, the most important keyword by far is "data-oblivious". I'm willing to tolerate a negligible (meaning $N^{-\omega(1)}$) probability of failure, as long as the algorithm has good constants.
Moreover, the data to be compacted is "random" in a strong sense: each element of the array is marked or unmarked according to an i.i.d. $\mathsf{Bern}(1/2)$ draw.
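One consequence of this (a standard Hoeffding bound, not something from the papers above): $R$ concentrates sharply around $N/2$, since $\Pr[\,|R - N/2| \ge t\,] \le 2e^{-2t^2/N}$. Taking $t = \sqrt{N}\log N$ makes this $2e^{-2\log^2 N} = N^{-\omega(1)}$, so an algorithm is free to assume $R = N/2 \pm \sqrt{N}\log N$, up to a negligible failure probability.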

<1> The linked algorithm has a negligible probability of failure, which is fine for my purposes.

One Answer

It looks like there are some known lower bounds (see page two of Sorting Short Keys in Circuits of Size $o(n\log n)$). The relevant section is copied below:

More specifically, Lin, Shi, and Xie [LSX19] showed that any stable compaction circuit that treats the payload as indivisible must have at least $\Omega(n\log n)$ selector gates — here the indivisibility assumption requires that the circuit only move the payload around through selector gates but not perform boolean computation on the encoding of the payload. The indivisibility restriction in Lin et al.’s lower bound [LSX19] can be removed if one assumes the Li-Li network coding conjecture [LL04] — this is implied by Afshani et al. [AFKL19]’s recent work$^2$.

Given that this is the case, it seems like the best thing to do is to use sorting networks. There seem to be a few new ones which may be competitive (see for example this), but I can't say anything definitive.

Answered by Mark on October 30, 2021
