In Oracle “Not IN” is not your best friend with big data

Let us assume that you have a Brand table that you’d like to query and get information such as count, de-normalization etc…

Method number 1 that is slow consists of the following query:

SELECT
CUST_ID,
COUNT(DISTINCT BRAND) AS BRAND_COUNT,
CASE(WHEN BRAND = 'MERCEDES' THEN 'Y' ELSE 'N' END) AS HAS_MERCEDES, CASE(WHEN BRAND = 'BMW' THEN 'Y' ELSE 'N' END) AS HAS_BMW
FROM
CUST_BRAND
WHERE
BRAND NOT IN ('VOLKSWAGEN' , 'OPEL' , 'FIAT' )
GROUP BY CUST_ID;

Method number 2 that is faster consists of the following query:

SELECT
CUST_ID,
COUNT(DISTINCT BRAND) AS BRAND_COUNT,
CASE(WHEN BRAND = 'MERCEDES' THEN 'Y' ELSE 'N' END) AS HAS_MERCEDES, CASE(WHEN BRAND = 'BMW' THEN 'Y' ELSE 'N' END) AS HAS_BMW
FROM
CUST_BRAND ,
( SELECT 'VOLKSWAGEN' AS BRAND FROM DUAL UNION ALL
SELECT 'OPEL' AS BRAND FROM DUAL UNION ALL
SELECT 'FIAT' AS BRAND FROM DUAL ) EXCLUDE
WHERE
CUST_BRAND.BRAND = EXCLUDE.BRAND(+)
AND
EXCLUDE.BRAND IS NULL
GROUP BY CUST_ID;

Method # 2 invokes an outer join to a dummy table. By doing so, if you have a big table you will see a substantial amount of saved time.

Side Note:
Query #2 is recommended if your column is null-able. You can use “NOT IN” if you add “Brand is not null” condition. I recommend to write both queries, look at the explain plans, execute and then decide which one is more efficient.