In Oracle “Not IN” is not your best friend with big data

Let us assume that you have a Brand table that you’d like to query and get information such as count, de-normalization etc…

Method number 1 that is slow consists of the following query:

SELECT
CUST_ID,
COUNT(DISTINCT BRAND) AS BRAND_COUNT,
CASE(WHEN BRAND = 'MERCEDES' THEN 'Y' ELSE 'N' END) AS HAS_MERCEDES, CASE(WHEN BRAND = 'BMW' THEN 'Y' ELSE 'N' END) AS HAS_BMW
FROM
CUST_BRAND
WHERE
BRAND NOT IN ('VOLKSWAGEN' , 'OPEL' , 'FIAT' )
GROUP BY CUST_ID;

Method number 2 that is faster consists of the following query:

SELECT
CUST_ID,
COUNT(DISTINCT BRAND) AS BRAND_COUNT,
CASE(WHEN BRAND = 'MERCEDES' THEN 'Y' ELSE 'N' END) AS HAS_MERCEDES, CASE(WHEN BRAND = 'BMW' THEN 'Y' ELSE 'N' END) AS HAS_BMW
FROM
CUST_BRAND ,
( SELECT 'VOLKSWAGEN' AS BRAND FROM DUAL UNION ALL
SELECT 'OPEL' AS BRAND FROM DUAL UNION ALL
SELECT 'FIAT' AS BRAND FROM DUAL ) EXCLUDE
WHERE
CUST_BRAND.BRAND = EXCLUDE.BRAND(+)
AND
EXCLUDE.BRAND IS NULL
GROUP BY CUST_ID;

Method # 2 invokes an outer join to a dummy table. By doing so, if you have a big table you will see a substantial amount of saved time.

Side Note:
Query #2 is recommended if your column is null-able. You can use “NOT IN” if you add “Brand is not null” condition. I recommend to write both queries, look at the explain plans, execute and then decide which one is more efficient.

One thought on “In Oracle “Not IN” is not your best friend with big data

  1. I like the Decode function in these situations.

    SELECT
    CUST_ID,
    COUNT(DISTINCT BRAND) AS BRAND_COUNT,
    CASE(WHEN BRAND = ‘MERCEDES’ THEN ‘Y’ ELSE ‘N’ END) AS HAS_MERCEDES, CASE(WHEN BRAND = ‘BMW’ THEN ‘Y’ ELSE ‘N’ END) AS HAS_BMW
    FROM
    CUST_BRAND
    WHERE
    DECODE(BRAND,’VOLKSWAGEN’,’OUT’, ‘OPEL’, ‘OUT’, ‘FIAT’, ‘OUT’, ‘IN’) = ‘IN’
    GROUP BY CUST_ID;

Leave a Reply

Your email address will not be published. Required fields are marked *