In Oracle “Not IN” is not your best friend with big data

Let us assume that you have a Brand table that you’d like to query and get information such as count, de-normalization etc…

Method number 1 that is slow consists of the following query:

SELECT
CUST_ID,
COUNT(DISTINCT BRAND) AS BRAND_COUNT,
CASE(WHEN BRAND = 'MERCEDES' THEN 'Y' ELSE 'N' END) AS HAS_MERCEDES, CASE(WHEN BRAND = 'BMW' THEN 'Y' ELSE 'N' END) AS HAS_BMW
FROM
CUST_BRAND
WHERE
BRAND NOT IN ('VOLKSWAGEN' , 'OPEL' , 'FIAT' )
GROUP BY CUST_ID;

Method number 2 that is faster consists of the following query:

SELECT
CUST_ID,
COUNT(DISTINCT BRAND) AS BRAND_COUNT,
CASE(WHEN BRAND = 'MERCEDES' THEN 'Y' ELSE 'N' END) AS HAS_MERCEDES, CASE(WHEN BRAND = 'BMW' THEN 'Y' ELSE 'N' END) AS HAS_BMW
FROM
CUST_BRAND ,
( SELECT 'VOLKSWAGEN' AS BRAND FROM DUAL UNION ALL
SELECT 'OPEL' AS BRAND FROM DUAL UNION ALL
SELECT 'FIAT' AS BRAND FROM DUAL ) EXCLUDE
WHERE
CUST_BRAND.BRAND = EXCLUDE.BRAND(+)
AND
EXCLUDE.BRAND IS NULL
GROUP BY CUST_ID;

Method # 2 invokes an outer join to a dummy table. By doing so, if you have a big table you will see a substantial amount of saved time.

Side Note:
Query #2 is recommended if your column is null-able. You can use “NOT IN” if you add “Brand is not null” condition. I recommend to write both queries, look at the explain plans, execute and then decide which one is more efficient.

Oracle Pivot function

I was amazed at how handy the Pivot function can be. It empowers the user to provide aggregate and denormalized view of a data set which could be a query or a table. In this article I am going to provide a brief example of the pivot function.

First of all let us set some data: 


CREATE TABLE SAMPLE_TRANSACTIONS
(
TRANSACTION_ID NUMBER(8),
COMPANY_NAME VARCHAR2(30),
STATE VARCHAR2(2),
TRANSACTION_AMOUNT NUMBER(10,2)
);
INSERT INTO SAMPLE_TRANSACTIONS VALUES
( 1 , 'PALERMO' , 'TX' , 300.22
);
INSERT INTO SAMPLE_TRANSACTIONS VALUES
( 2 , 'PALERMO' , 'CT' , 600.22
);
INSERT INTO SAMPLE_TRANSACTIONS VALUES
( 3 , 'PALERMO' , 'PA' , 900.669
);
INSERT INTO SAMPLE_TRANSACTIONS VALUES
( 4 , 'BOLOGNA' , 'TX' , 500.22
);
INSERT INTO SAMPLE_TRANSACTIONS VALUES
( 5 , 'BOLOGNA' , 'CT' , 1100.22
);
INSERT INTO SAMPLE_TRANSACTIONS VALUES
( 6 , 'BOLOGNA' , 'PA' , 100.669
);
INSERT INTO SAMPLE_TRANSACTIONS VALUES
( 7 , 'PARMA' , 'NY' , 1300.22
);
INSERT INTO SAMPLE_TRANSACTIONS VALUES
( 8 , 'PARMA' , 'NJ' , 1600.22
);
INSERT INTO SAMPLE_TRANSACTIONS VALUES
( 9 , 'PARMA' , 'CA' , 1900.669
);

Now let us view the data:

SELECT * FROM SAMPLE_TRANSACTIONS;

 

TRANSACTION_IDCOMPANY_NAMESTATETRANSACTION_AMOUNT
1PALERMOTX300.22
2PALERMOCT600.22
3PALERMOPA900.67
4BOLOGNATX500.22
5BOLOGNACT1100.22
6BOLOGNAPA100.67
7PARMANY1300.22
8PARMANJ1600.22
9PARMACA1900.67

 

Now let us create a report that provides the sum of the transactions by company at a state level:
Let us first execute this query:


SELECT *
FROM
( SELECT COMPANY_NAME , STATE , TRANSACTION_AMOUNT
FROM SAMPLE_TRANSACTIONS
) PIVOT
( SUM(TRANSACTION_AMOUNT)
FOR
COMPANY_NAME IN
('BOLOGNA' AS BOLOGNA , 'PARMA' AS PARMA, 'PALERMO' AS PALERMO) )
ORDER BY STATE;

Results:

STATEBOLOGNAPARMAPALERMO
CA{null}1900.67{null}
CT1100.22{null}600.22
NJ{null}1600.22{null}
NY{null}1300.22{null}
PA100.67{null}900.67
TX500.22{null}300.22

 

Now let us create a report that provides the sum of the transactions by state at a company level:

Let us first execute this query:


SELECT *
FROM
( SELECT COMPANY_NAME , STATE , TRANSACTION_AMOUNT FROM SAMPLE_TRANSACTIONS
) PIVOT
( SUM(TRANSACTION_AMOUNT)
FOR STATE IN ('CT' CT , 'NY' NY, 'TX' TX) )
ORDER BY COMPANY_NAME;

Results

COMPANY_NAMECTNYTX
BOLOGNA1100.22{null}500.22
PALERMO600.22{null}300.22
PARMA{null}1300.22{null}

 

As you can see above there are a lot of possibilities on how to leverage the pivot function. For further reading I recommend this link Oracle Pivot.