SQL Join Guide

I’m working on a project at my internship right now that has helped me learn a lot about SQL, mainly through making some stupid mistakes. This article focuses on some issues I ran into with joining many tables together with several one-to-many relationships.

SQL JOIN

The JOIN command combine tables together like intersection and union work on sets in math. Consider sets A and B. This is how a JOIN query would work on them:

  • JOIN (INNER JOIN): A ∩ B
  • FULL OUTER JOIN: A ∪ B
  • LEFT JOIN (LEFT INNER JOIN): A - B
  • RIGHT JOIN (RIGHT INNER JOIN): B - A

The syntax of these join statements is fairly straight forward:

-- normal inner join
select col1, col2, ...
from TABLELEFT
inner join TABLERIGHT
on TABLELEFT.KEY = TABLERIGHT.KEY

-- left and right joins just replace "inner" with "left/right"

-- outer join
select col1, col2, ...
from TABLELEFT
full outer join TABLERIGHT
on TABLELEFT.KEY = TABLERIGHT.KEY

Let’s look at an example on the following tables:

-- USERS

ID         |NAME                          |ROLE
----------------------------------------------------------
1          |Joe                           |NULL
2          |Austin                        |regex god
4          |John                          |ruby god
5          |Cameron                       |vim god


-- TASKS
ID         |NAME                          |SHORTNAME |USERNAME
--------------------------------------------------------------
1          |Create finance reports        |fin1      |2
2          |Write documentation           |doc1      |5
3          |Solve P vs. NP                |math      |2


-- TASKUPDATE
------------------------------------------------------
1          |3          |Nope! They are not equal!
2          |2          |Docs posted to helpful.wiki

In this query, I’ll use a sub-query to select the unique users among the TASKS table and grab their names from the USERS table. A sub-query is simply a nested SQL statement that returns values before going on to an outer statement.

select name
    from users, (
        select distinct username from tasks
                ) t
    where users.id = t.username

The returned table would be this:

NAME
---------------
Austin
Cameron

It’s important that you you structure your subqueries properly and join your rows together as you move up the chain. Joining on a particular value is similar to using a conditional statement (WHERE x = y AND ...).

A badly ordered query to aggregate data can result in an invalid table. For example:

-- not mapped right! since subquery's first row is null
select tasks.id as "TID", tasks.normalized_name as "SHORT NAME", tasks.name,
    taskusers.u as "TASK LEAD", taskupdate.notes
    from tasks
    join (
        select users.name as u, tasks.id as tid
        from users, tasks
        where users.id = tasks.taskuser
        group by users.name, tasks.id
        ) taskusers
    on taskusers.tid = tasks.id
left join taskupdate
on taskupdate.id = tasks.id;

You might expect this to return a nice table with task id’s, task names, user names, and the user’s task notes.

This returns the following table, which includes a null row that’s in the wrong spot:

NAME                          |TASK LEAD                     |NOTES
--------------------------------------------------------------------------------------------
Create finance reports        |Austin                        |Nope! They are not equal!
Write documentation           |Cameron                       |Docs posted to helpful.wiki
Solve P vs. NP                |Austin                        |NULL

This occurs since the left outer join returns a null row and when the queries are joined together, the sequencing is thrown off. This is the subquery that’s not being joined properly.

select tasks.name as "TASK NAME", notes
from tasks
left join taskupdate
on tasks.id = taskupdate.task_id;

This is its result:

TASK NAME                          |NOTES
-------------------------------------------------------------
Create finance reports        |NULL
Write documentation           |Docs posted to helpful.wiki
Solve P vs. NP                |Nope! They are not equal!

How we solve this is by joining the table that could have null values (TASKUPDATE) with the remaining values first, then selecting the names from the tables which definitely do have values.

select tasks.name as "TASK NAME", taskusers.u as "TASK LEAD", taskupdate.notes
    from tasks
    left join taskupdate
    on tasks.id = taskupdate.task_id
    join (
        select users.name as u, tasks.id as tid
        from users, tasks
        where users.id = tasks.username
        ) taskusers
on taskusers.tid = tasks.id;

2019

Debugging weird c++ behavior

2 minute read

Sometimes you’ll have a program that seems to work but will produce a ton of Valgrind errors. You’ll get some fairly strange errors about invalid reads or co...

Scoping and this in c++

2 minute read

As a course producer for USC’s CSCI 104 students will often ask me questions about stuff I don’t know or didn’t think about. It makes me feel a little dumb f...

SQL Join Guide

3 minute read

I’m working on a project at my internship right now that has helped me learn a lot about SQL, mainly through making some stupid mistakes. This article focuse...

Redis Basics

4 minute read

The name redis comes from RE mote DI ctionary S torage, which is a good description of what redis is: a fast, key-value data structure. Redis is an in-memory...

Using the name variable in Python

1 minute read

When writing modules in python, it’s nice to test individual components. You can always add some messy test cases at the bottom of your code but, of course, ...

Getting Help in Vim

less than 1 minute read

The :help command in Vim is possibly the most powerful one of all. I agree with Romain Lafourcade of this handy website about that.

Customizing your Bash prompt

2 minute read

Your terminal can be customized in several different ways including how your prompt appears. You can color code shell variables such as your username and dir...

Basic Bash Backup

1 minute read

I spent most of early Saturday morning finishing up my backup scripts and debugging Bash. One could say I was, Bashing my head against the wall. No need to ...

Back to Top ↑