Relational algebra

Content sourced from Wikipedia, licensed under CC BY-SA 3.0.

Relational algebra is a formal framework for modeling data in relational databases and for expressing queries over it. It treats data as relations (tables) and defines a small set of operators, each of which takes one or two input relations and produces a new relation as its result. Because every operation outputs a relation, the output of one operator can serve as the input of another, so simple operators can be composed into complex queries.

Unary operators use a single relation. The two main ones are:
- Selection: keeps only the rows that satisfy a given condition (for example, “age > 18” or “status = true”).
- Projection: keeps only specific columns (for example, just the name and email columns).
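
The two unary operators can be sketched in a few lines of Python. This is only an illustration, not a standard API: the helper `row`, the functions `select` and `project`, and the sample `people` relation are all assumptions made for this example. A relation is modeled as a set of rows, and each row as a frozen set of (attribute, value) pairs.

```python
def row(**attrs):
    """Build a hashable row from attribute=value pairs (hypothetical helper)."""
    return frozenset(attrs.items())

def select(relation, predicate):
    """Selection: keep the rows for which the predicate holds."""
    return {r for r in relation if predicate(dict(r))}

def project(relation, attributes):
    """Projection: keep only the named attributes.

    The result is a set, so rows that become identical collapse into one.
    """
    return {frozenset((a, v) for a, v in r if a in attributes) for r in relation}

# Made-up sample data.
people = {
    row(name="Ann", email="ann@example.org", age=34),
    row(name="Bob", email="bob@example.org", age=17),
    row(name="Cy", email="cy@example.org", age=52),
}

# Operators compose because each one returns a relation.
adults = select(people, lambda r: r["age"] > 18)
contacts = project(adults, {"name", "email"})
```

Here `contacts` holds the name and email of Ann and Cy only, since Bob fails the "age > 18" condition.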

Binary operators use two relations. The most common are:
- Union: combines the rows from two relations that have the same columns.
- Difference: keeps only the rows that are in the first relation but not in the second.
- Cartesian product: pairs every row of one relation with every row of the other (usually followed by a selection to keep only meaningful combinations).
- Join: a practical way to combine related rows from two relations by matching on shared attributes. The natural join is a common form that matches rows on every attribute name the two relations have in common.
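
The binary operators listed above can be sketched in the same style. As before, this is a minimal illustration under assumptions: the function names and the sample relations are invented for the example, and rows are frozen sets of (attribute, value) pairs.

```python
def row(**attrs):
    """Build a hashable row from attribute=value pairs (hypothetical helper)."""
    return frozenset(attrs.items())

def union(r1, r2):
    """Union: rows that appear in either relation."""
    return r1 | r2

def difference(r1, r2):
    """Difference: rows of r1 that do not appear in r2."""
    return r1 - r2

def product(r1, r2):
    """Cartesian product; assumes the attribute names are disjoint."""
    return {a | b for a in r1 for b in r2}

def natural_join(r1, r2):
    """Natural join: pair rows that agree on every shared attribute name."""
    result = set()
    for a in r1:
        for b in r2:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                result.add(a | b)
    return result

# Made-up sample data.
employees = {
    row(name="Ann", dept_id=1),
    row(name="Bob", dept_id=2),
    row(name="Cy", dept_id=3),
}
departments = {
    row(dept_id=1, dept_name="Sales"),
    row(dept_id=2, dept_name="Research"),
}

joined = natural_join(employees, departments)   # Cy has no matching department
all_pairs = product({row(color="red"), row(color="blue")},
                    {row(size="S"), row(size="L")})
```

In this sketch `joined` contains two rows (Ann with Sales, Bob with Research), and `all_pairs` contains all four color and size combinations.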

To use these operators correctly, relations must have compatible schemas. For union and difference, the two relations must have the same set of attributes. For the Cartesian product, the two relations must have disjoint attribute names. In the algebra, projection returns a set of rows with only the chosen columns, so rows that become identical are merged into one. SQL, by contrast, keeps duplicate rows unless you request distinct results (SELECT DISTINCT).
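
A small sketch may make these two points concrete. The function `checked_union` and the sample data are assumptions made for illustration; the list-versus-set comparison only mimics the difference between SQL's default behavior and the set semantics of the algebra.

```python
def row(**attrs):
    """Build a hashable row from attribute=value pairs (hypothetical helper)."""
    return frozenset(attrs.items())

def attribute_names(r):
    """Return the attribute names of a single row."""
    return frozenset(a for a, _ in r)

def checked_union(r1, r2):
    """Union that rejects inputs whose rows do not share one attribute set."""
    schemas = {attribute_names(r) for r in r1 | r2}
    if len(schemas) > 1:
        raise ValueError("union requires identical attribute sets")
    return r1 | r2

# Projection under bag semantics (a list) versus set semantics (a set).
orders = [
    {"customer": "Ann", "item": "pen"},
    {"customer": "Ann", "item": "ink"},
    {"customer": "Bob", "item": "pen"},
]
as_bag = [o["customer"] for o in orders]   # duplicates kept, like SELECT customer
as_set = set(as_bag)                       # duplicates removed, like SELECT DISTINCT customer
```

Calling `checked_union({row(name="Ann")}, {row(id=1)})` raises `ValueError` because the two inputs have different attributes, while `as_bag` keeps "Ann" twice and `as_set` keeps it once.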

Renaming is another useful operation. It changes the names of attributes, which helps when you want to join two relations that share an attribute name but represent different things.
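
The following sketch shows why renaming matters for joins. The relations `products` and `suppliers`, the attribute `supplier_name`, and the helper functions are all invented for this example. Both relations have a `name` attribute, but it means a product name in one and a supplier name in the other, so a natural join would wrongly try to match on it.

```python
def row(**attrs):
    """Build a hashable row from attribute=value pairs (hypothetical helper)."""
    return frozenset(attrs.items())

def rename(relation, mapping):
    """Rename attributes according to mapping {old_name: new_name}."""
    return {frozenset((mapping.get(a, a), v) for a, v in r) for r in relation}

def natural_join(r1, r2):
    """Natural join: pair rows that agree on every shared attribute name."""
    result = set()
    for a in r1:
        for b in r2:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                result.add(a | b)
    return result

# Made-up sample data: "name" means different things in each relation.
products = {row(name="Pen", supplier_id=1)}
suppliers = {row(supplier_id=1, name="Acme")}

# Without renaming, the join also compares "Pen" with "Acme" and finds nothing.
wrong = natural_join(products, suppliers)

# After renaming, only supplier_id is shared, so the rows match.
right = natural_join(products, rename(suppliers, {"name": "supplier_name"}))
```

In this sketch `wrong` is empty, while `right` contains the single combined row for the pen and its supplier.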

Relational algebra also covers extended operations that are common in real databases:
- Outer joins (left, right, and full) add unmatched rows and fill missing attributes with a null value.
- Aggregation (like sum, count, average, min, max) groups rows and computes a result for each group, similar to a GROUP BY in SQL.
- These extensions let you perform calculations and group data directly in the algebra, which the basic operators alone do not support.
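
As a rough illustration of the two extensions, the sketch below implements a left outer join and a grouped aggregation. It is a simplification under assumptions: Python's `None` stands in for the null value, the output attribute name `result` is arbitrary, and all function names and data are invented for the example.

```python
from collections import defaultdict

def row(**attrs):
    """Build a hashable row from attribute=value pairs (hypothetical helper)."""
    return frozenset(attrs.items())

def left_outer_join(r1, r2):
    """Natural join that also keeps unmatched rows of r1, padded with None."""
    right_only = {a for r in r2 for a, _ in r} - {a for r in r1 for a, _ in r}
    result = set()
    for a in r1:
        da = dict(a)
        # Rows of r2 that agree with a on every shared attribute.
        matches = [b for b in r2 if all(da[k] == v for k, v in b if k in da)]
        if matches:
            result.update(a | b for b in matches)
        else:
            result.add(a | frozenset((k, None) for k in right_only))
    return result

def group_aggregate(relation, group_attr, value_attr, func):
    """Group rows by group_attr and apply func to each group's values."""
    groups = defaultdict(list)
    for r in relation:
        d = dict(r)
        groups[d[group_attr]].append(d[value_attr])
    return {row(**{group_attr: k, "result": func(vs)}) for k, vs in groups.items()}

# Made-up sample data.
employees = {row(name="Ann", dept_id=1), row(name="Cy", dept_id=3)}
departments = {row(dept_id=1, dept_name="Sales")}
sales = {
    row(id=1, region="North", amount=10),
    row(id=2, region="North", amount=5),
    row(id=3, region="South", amount=7),
}

staffing = left_outer_join(employees, departments)
totals = group_aggregate(sales, "region", "amount", sum)
```

Here Cy has no matching department, so the corresponding row in `staffing` carries `None` for `dept_name`, and `totals` holds one row per region with the summed amount.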

Some concepts are not expressible with the basic relational algebra alone:
- Transitive closure (computing reachability, for example) cannot be expressed with the core operators, because it needs an unbounded number of joins. SQL covers such tasks with recursive queries (recursive common table expressions).
- Arithmetic and other calculations on columns (like total price = unit_price × quantity) are usually added by extending the algebra or by the query language’s features.
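
Both limitations can be illustrated outside the algebra. The sketch below computes a transitive closure by repeating a join-like step until nothing new appears (a fixpoint), and adds a computed column to each row. The function names, the `edges` relation, and the choice to store prices as integer cents are assumptions made for this example.

```python
def transitive_closure(edges):
    """Repeat a self-join step until no new pairs appear (a fixpoint)."""
    closure = set(edges)
    while True:
        # One step: combine (a, b) and (b, d) into (a, d).
        new_pairs = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs

def extend(rows, new_attr, compute):
    """Add a computed attribute to every row (an extended projection)."""
    return [{**r, new_attr: compute(r)} for r in rows]

# Made-up sample data: a chain a -> b -> c -> d.
edges = {("a", "b"), ("b", "c"), ("c", "d")}
reachable = transitive_closure(edges)

# Prices are in integer cents to avoid floating-point rounding.
order_lines = [{"item": "pen", "unit_price": 250, "quantity": 4}]
with_total = extend(order_lines, "total_price",
                    lambda r: r["unit_price"] * r["quantity"])
```

The number of loop iterations depends on the data, which is why no fixed expression of the core operators can compute the closure for every input.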

In practice, database systems use the ideas of relational algebra to optimize queries. A query is turned into a plan of relational operations, and the system rearranges and rewrites the plan to minimize work (pushing selections down, reducing large intermediate results, reusing common subexpressions, etc.).
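
Pushing a selection down can be sketched as follows. This is an illustration of the idea only, not how any particular database system implements it; the relations, attribute names, and helpers are invented for the example. Both plans return the same rows, but the second builds a much smaller intermediate result.

```python
def row(**attrs):
    """Build a hashable row from attribute=value pairs (hypothetical helper)."""
    return frozenset(attrs.items())

def select(relation, predicate):
    """Selection: keep the rows for which the predicate holds."""
    return {r for r in relation if predicate(dict(r))}

def product(r1, r2):
    """Cartesian product; assumes the attribute names are disjoint."""
    return {a | b for a in r1 for b in r2}

# Made-up sample data: 6 orders, 3 customers, one of them in "DE".
orders = {row(order_id=i, customer_id=i % 3) for i in range(6)}
customers = {row(cust_id=c, country="DE" if c == 0 else "FR") for c in range(3)}

# Plan 1: build the full product, then filter.
big = product(orders, customers)
naive = select(big, lambda r: r["customer_id"] == r["cust_id"]
                              and r["country"] == "DE")

# Plan 2: filter customers first (selection pushed down), then combine.
de_customers = select(customers, lambda r: r["country"] == "DE")
small = product(orders, de_customers)
pushed = select(small, lambda r: r["customer_id"] == r["cust_id"])
```

In this sketch the first plan materializes 18 intermediate rows and the second only 6, yet `naive` and `pushed` are equal.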

Relational algebra inspired early query languages and still underpins the way modern systems think about queries. SQL, while not exactly the same as the pure algebra, follows its spirit and relies on similar ideas to combine, filter, and summarize data.

This page was last edited on 3 February 2026, at 14:00 (CET).