Optimizing relational algebra operations using discrimination-based joins and lazy products

Research output: Working paperResearch

We show how to implement in-memory execution of the core re-
lational algebra operations of projection, selection and cross-product
eciently, using discrimination-based joins and lazy products.
We introduce the notion of (partitioning) discriminator, which par-
titions a list of values according to a specied equivalence relation on
keys the values are associated with. We show how discriminators can
be dened generically, purely functionally, and eciently (worst-case
linear time) on top of the array-based basic multiset discrimination
algorithm of Cai and Paige (1995). Discriminators provide the basis
for discrimination-based joins, a new technique for computing joins
that requires neither hashing nor sorting. Discriminators also provide
ecient implementations for eliminating duplicates, set union and set
We represent a cross-product lazily as a formal pair of the argument
sets (relations). This allows the selection operation to recognize on the

y whenever it is applied to a cross-product, in which case it can choose
an ecient discrimination-based equijoin implementation.
The techniques subsume most of the optimization techniques based
on relational algebra equalities, without need for a query preprocessing
phase. They require no indexes and behave purely functionally.
Full source code in Haskell extended with Generalized Algebraic
Data Types (GADTS) is included. GADTs are used to represent sets
(relations), projections, predicates and equivalence denotations in a
type safe manner. It should be emphasized that the code is only intended for and applicable to operations on
in-memory data; that is, in a random-access memory cost model. Most emphatically, for performance reasons it is, as given, inapplicable to data stored on disk or on the network.
Original languageEnglish
Place of PublicationKøbenhavn
PublisherMuseum Tusculanum
Publication statusPublished - 2009

ID: 12873804