Set Theory and SQL Concepts

I got caught out the other day with a sensible question of representing a sql join in a set-theoretic fashion. I fumbled it, mixing two joins. I needed to go back and review! But, as I went away and thought about it, I also remembered that there are some serious assumptions in running this sort of conceptual argument that need to be explicitly stated or chaos ensues and cats and dogs will be living together. There are also some things that are not true about the sql representation in a set-theoretic manner, but rather fall into relational algebra. That said, there are advantages to be gained by understanding the assumptions and then using it as a simple conceptual tool to understand what you are trying to achieve.

Set theory

So, lets start with the set theory first, and then follow it up with the SQL, and then show the relationship. Let us identify two non-empty sets, A and B. For this part, we will define A = {1,2,3} and B = {2,3,4}.

Set Definition

There are a number of binary operations (like arithmetical +, -, etc.) that we can perform on the sets.

Firstly, we can get subset out of the way. If all the members of A are also found in B, then A is a subset of B. denoted A ⊆ B. Trivially, A ⊆ A, so a useful term is proper subset which is A ⊂ B. So, the subsets from A and B include {2,3}, {2}, {3}, but not {1,4}, etc.

Next we have the following, along with venn diagrams to illustrate the operation:

  • Union, A ∪ B, is the set of all elements that are in A, B or both. A ∪ B = {1,2,3,4}.

Set Union

  • Intersection, A ∩ B, is the set of all elements that are in both A AND B. A ∩ B = {2,3}

Set int

  • Complement, sometimes also absolute complement, AC, is the set of all elements that are not in A but do exist in U, which is the universal set which is a superset of all the sets defined. So, AC = {4}. Conversely, BC = {1}. This is also called set difference and notated as B\A = {4}. This happens to be the same as the relative complement between two sets A and B because of the trivial two set construction that we are using here. If U were the positive integers, and A is as defined above, then AC is a very large set indeed, in fact an infinite set.

Set Comp

  • Relative complement, AC ∩ B, is the set of elements that are not in A nor in A ∩ B. So, AC ∩ B = {4}.

Set Relative Comp

  • Symmetric difference, (A ∪ B) \ (A ∩ B) or also A ∆ B, is the set of elements that is in either one of the sets, but not both. So, (A ∪ B) \ (A ∩ B) = {1,4}.

Set Symmetiric Difference

  • Cartesian product, A × B, is the set whose elements are all the ordered pairs (a,b). A × B = {(1,2), (1,3), (1,4), (2,2), (2,3), (2,4), (3,2), (3,3), (3,4)}. This will result in a set of n * m elements, n and m being the number of elements in A and B respectively.

  • Power set of a set A is the set whose members are all possible subsets of A (proper subsets and the set itself. v. supra). This will also include {}.

Ok, so there is the set theory, and we remember everything that we need to about membership and the relationship between the two sets. SQL is an expressive language for relational algebra, but can be coupled with set-theoretic models to aid conception of the operations being conducted. However, there are important caveats. One is that this set-theoretic model only works if there is a 1-1 correspondence between the sets A and B. For our purposes, think of A and B as two tables. If there are duplicate rows in either table, then this use of set-theory will break down horribly, because the cardinality of the sets changes.

SQL operations

Let us now just think of the sets above as tables in a schema, in which there are records. We can think of the operations that we perform as returning a set of new records, and these will correspond to the same results as some of our binary set-theoretic operations. But to aid this understanding, I will now define two new tables (Sets), A and B.

A

A.id A.name
-- ----
1 Tom
2 Sarah
3 Jamie

B

B.id B.job
-- ----
2 Analyst
3 Developer
4 Manager

The goal here is to join the records in the two tables in meaningful ways.

  • Full Outer Join
    SELECT * FROM A FULL OUTER JOIN B ON A.id = B.id

id name id job
-- ---- -- ----
1 Tom null null
2 Sarah 2 Analyst
3 Jamie 3 Developer
null null 4 Manager

This is like the union. Where there is no match, the specification defines that the missing side should contain a null.

Set Union

  • Inner Join
    SELECT * FROM A INNER JOIN B ON A.id = B.id

id name id job
-- ---- -- ----
2 Sarah 2 Analyst
3 Jamie 3 Developer

This is like the intersection.

Set int

  • Records unique to both A and B
    SELECT * FROM A FULL OUTER JOIN B ON A.id = B.id WHERE A.id IS null OR B.id IS null

id name id job
-- ---- -- ----
1 Tom null null
null null 4 Manager

This is the symmetric difference.

Set Sym Diff

  • Records unique to A
    SELECT * FROM A LEFT OUTER JOIN B ON A.id = B.id WHERE B.id IS null

id name id job
-- ---- -- ----
1 Tom null null

This is the relative complement, but the opposite of what we defined above, in this case, BC ∩ A.

Set Relative Comp b

  • Records unique to A
    SELECT * FROM A LEFT OUTER JOIN B ON A.id = B.id

id name id job
-- ---- -- ----
1 Tom null null
2 Sarah 2 Analyst
3 Jamie 3 Developer

This is the set from A, but the specification will show null values for any non match from B.

Set A

  • Cross join or Theta Join
    SELECT * FROM A CROSS JOIN B

This will join everything, resulting in a n X m result set. This is the cartesian product.

Note that you don’t have to result to CROSS JOIN to get the same (hopefully, unintended) result set.

  • Power sets
    SELECT * FROM A WHERE A.id = 1

This will produce subsets of A with appropriate WHERE clauses.

About Shawn Mehan

Shawn Mehan
This entry was posted in Uncategorized and tagged , , . Bookmark the permalink.

One Response to Set Theory and SQL Concepts

  1. Pingback: Self-Taught Developers: Are You Missing Your Foundation? | Ameya Karve's Weblog

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>