Posted on

SQL: Using IN / NOT IN Operators

Here is an explanation I recently shared with someone about how to use NOT IN to select a subset of data from a one-to-many relationship.   It is a simple example that covers the basic use of NOT IN (and IN) operators in SQL.  These operators can be fairly powerful for performing quick queries on complex data relationships where a programmatic solution may otherwise be required.   This is not always the best solution but it can be a useful shortcut when used appropriately.

Syntax in various SQL dialects can vary, so I am using a generic SQL dialect based on my PostgreSQL knowledge.

The Problem

You have 2 tables you are linking together.   I forget the names so I’ll call them Nurses and RolesOfNurses.    For the purpose of this explanation let’s assume you are linking them by an ID field for the Nurse.

Your Nurse data may look like this:

ID: 1   Name: Karen
ID: 2  Name: Sharon

And your RolesOfNurses data:

Nurse ID: 1 RoleID: 10
Nurse ID: 1 RoleID: 555 
NurseID: 2 RoleID: 10

Normally with a one-to-many (1:M) relationship this RolesOfNurses table is just a middle-man so there would be another table called “Roles” that may have the RoleID and an common name to make human readable output (like reports) possible. So role #10 may be “nurse” and the specialized role #555 might be “Can Give Meds”.

In your select statement you want to find “All nurses that do not have the Can Give Meds” role. In database-like talk you want “All nurses that have role #10 but not role #555”. So your instinct is to join the two tables and filter by where role == 10 and role != 555. But that is wrong. The reason is that joining tables creates one unique row for every combination of data in the joined tables.

Breaking It Down : Select Nurses and Roles

Take it from the beginning.  Let’s just say “I want a list of all the roles nurses have”.  That is a simple join:

select * from nurses where nurse.id = rolesofnurses.nurseid;

You get this:

ID: 1, Name: Karen, NurseID: 1, RoleID: 10
ID: 1, Name: Karen, NurseID: 1, RoleID: 555
ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Next: Filter The Data On Role # 10 Only

This data set is the basis of your advanced query, but before we get to the “dual and” let’s start with a single where clause to filter our data.  I’ve added the parenthesis only to make the where clause groupings easier to read.  The first where clause links the two tables together to give you the above data set.  The 2nd where clause filters that data to just show nurses.

select * from nurses where (nurse.id = rolesofnurses.nurseid) and (roleofnurses.roleid = 10);

You get this:

ID: 1, Name: Karen, NurseID: 1, RoleID: 10
ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Then : Add The Second Role Filter

Now add your 2nd where clause so you can have “all nurses that don’t have the Can Give Meds role” filter.

select * from nurses where (nurse.id = rolesofnurses.nurseid) and (roleofnurses.roleid = 10) and (roleofnurses.roleid != 555);

You get this:

ID: 1, Name: Karen, NurseID: 1, RoleID: 10
ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Notice that this is NOT the desired result!

Why? Look at the data.  All of your roles match BOTH of those filters.   The roles are all 10 AND they are NOT 555.  The first instinct is wrong. Personally I blame the fact that SQL, as powerful as it is, does not process English-based query syntax the same way that our brains do. Many people will make the same mistake. Sadly, the solution is a bit more complex to explain, but I’ll try to make it simple.

Fixing The Query: Step 1 – Build A List Of IDs

I’m going to start with the 2nd part of your filter, all the IDs of nurses that have role #555.  Note I said “IDs of nurses” here, that is important.  I don’t want the full set of nurse data like we have been doing up until now.  I simply want the list of ID numbers of nurses that have role #555.

select nurseid from rolesofnurses where roledid = 555;
NurseID: 1

This is the ONLY NurseID in our data set that has role # 555.

The Correct Query : Using NOT IN

Now let’s go and get our data we want by telling SQL “give me all the nurses that have role # 10 but are NOT in the list of nurses with role # 555”:

select * from nurses where (nurse.id = rolesofnurses.nurseid) and (roleofnurses.roleid = 10) and (nurse.id NOT IN (select nurseid from rolesofnurses where roledid = 555) );

And we get the desired result:

ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Explaining NOT IN

The trick here is the <blah> NOT IN <SQL QUERY> syntax.   Most (not all) SQL dialects support this, and it is an important advanced SQL construct. It allows you to filter data based on the results of another select statement. The trick with the NOT IN operator is to make the <SQL QUERY> part return a list that consists of a single elements you want to filter on. In our case the <SQL QUERY> returns the entire list of nurses that have role #555. We use NOT IN to tell it to only accept the record if the Nurse ID is NOT IN the list of nurse IDs that have role #555 or in English “pick all nurses that do not have role #555”. In our example it was a single record, but in a larger data set it could return an entire list of records.

Notes About IN/NOT IN Usage

For a few hundred records this construct is OK.  At some point you need to start looking into temp tables and other methods as using the <blah> NOT IN <SQL QUERY> construct can consume a lot of memory on the server. You need to be intelligent about what kind of resources your server has and how large the potential data set is for the second part of the IN query because most DB engines will load the entire list of ID #s into RAM.

Another note, you can create complex constructs with IN the same way as NOT IN, for example “give me a list of all nurses that have role #10 AND role #555”.

select * from nurses where (nurse.id = rolesofnurses.nurseid) and (roleofnurses.roleid = 10) and (nurse.id IN (select nurseid from rolesofnurses where roledid = 555) );

BTW, I’ve used uppercase for emphasis only.   SQL is not case sensitive.