Denormalize Transactional Data using Informatica

Let’s de-normalize transactional data.
Here is the input:

CUSTOMER_ID | PRODUCT_NAME | PRODUCT_PRICE | TRANSACTION_DATE
1 | COOKING BOOKS | 12.5 | 06/07/2013 10:57:10
1 | MEDICAL BOOKS | 120.5 | 06/07/2013 10:57:10
1 | TRAVEL BOOKS | 10.5 | 06/07/2013 10:57:10
2 | FOOD PROCESSOR | 12.5 | 06/07/2013 10:57:10
2 | INTELLIGENT OVEN | 120.5 | 06/07/2013 10:57:10
2 | SAFE | 10.5 | 06/07/2013 10:57:10
3 | CLOTHING | 12555.25 | 06/07/2013 10:57:10
3 | CAR VACUUM | 15520.45 | 06/07/2013 10:57:10
3 | HOTEL PASS | 1550.55 | 06/07/2013 10:57:10
3 | MAGAZINE | 1110.56 | 06/07/2013 10:57:10
4 | ZOO PASS | 110.4 | 06/07/2013 10:57:10

Here is the output:
The output table is quite wide, so you can download it as a CSV.

Here is the mapping overview:
SQ ==> EXP ==> AGGR ==> TGT

[Mapping overview diagram]

Step 1:
In SQ_ORDERS sort by CUSTOMER_ID
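One way to do this is with a SQL override in the Source Qualifier; a minimal sketch, assuming the source table is named ORDERS (an assumption based on the SQ_ORDERS name):

SELECT CUSTOMER_ID , PRODUCT_NAME , PRODUCT_PRICE , TRANSACTION_DATE
FROM ORDERS
ORDER BY CUSTOMER_ID;

Alternatively, set the Source Qualifier's Number of Sorted Ports property to 1.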

Step 2:
In EXP_SetSequenceNumbers create three variable ports as follows:

  1. currCustomer_v ==> TO_CHAR(CUSTOMER_ID)
  2. Seq_v ==> IIF(currCustomer_v != prevCustomer_v , 1 , Seq_v + 1 ). Each time a new CUSTOMER_ID arrives, the sequence resets to 1; otherwise it increments by 1.
  3. prevCustomer_v ==> TO_CHAR(CUSTOMER_ID)

Also expose the sequence through an output port (e.g., Seq ==> Seq_v), since variable ports cannot be connected to the Aggregator.

Make sure that prevCustomer_v is below currCustomer_v and Seq_v, as in the following pic. Ports are evaluated top to bottom, so prevCustomer_v still holds the previous row's CUSTOMER_ID when Seq_v is computed:

[Screenshot: EXP_SetSequenceNumbers ports]

Step 3:
In AGG_DenormalizeOrders make sure that the following is set:

  1. Sorted Input is checked.
  2. Group By is checked for CUSTOMER_ID.
  3. Output ports are created to house ProductName, ProductPrice and TransactionDate for each sequence position.
  4. Each output port picks the row whose sequence matches its position. For instance, ProductName3 ==> FIRST( PRODUCT_NAME , Seq = 3 ), etc.; see the sample ports right after this list.
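For illustration, the first two positions could be defined like this (port names are mine; Seq is the output port added in Step 2):

ProductName1 ==> FIRST( PRODUCT_NAME , Seq = 1 )
ProductPrice1 ==> FIRST( PRODUCT_PRICE , Seq = 1 )
TransactionDate1 ==> FIRST( TRANSACTION_DATE , Seq = 1 )
ProductName2 ==> FIRST( PRODUCT_NAME , Seq = 2 )
ProductPrice2 ==> FIRST( PRODUCT_PRICE , Seq = 2 )
TransactionDate2 ==> FIRST( TRANSACTION_DATE , Seq = 2 )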

For a better understanding, please see the pic below:

[Screenshot: AGG_DenormalizeOrders ports]

Step 4:
Connect the Aggregator to the target.
Done!
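For readers who think in SQL, here is a rough Oracle equivalent of the whole mapping, again assuming a source table named ORDERS (hypothetical). ROW_NUMBER() plays the role of Seq_v, and MAX(CASE ...) plays the role of FIRST(..., Seq = n):

SELECT CUSTOMER_ID ,
-- one output column per sequence position, as in the Aggregator
MAX(CASE WHEN RN = 1 THEN PRODUCT_NAME END) AS PRODUCT_NAME1 ,
MAX(CASE WHEN RN = 1 THEN PRODUCT_PRICE END) AS PRODUCT_PRICE1 ,
MAX(CASE WHEN RN = 2 THEN PRODUCT_NAME END) AS PRODUCT_NAME2 ,
MAX(CASE WHEN RN = 2 THEN PRODUCT_PRICE END) AS PRODUCT_PRICE2
FROM
-- number each customer's rows, as in the Expression
( SELECT CUSTOMER_ID , PRODUCT_NAME , PRODUCT_PRICE ,
ROW_NUMBER() OVER (PARTITION BY CUSTOMER_ID ORDER BY TRANSACTION_DATE) AS RN
FROM ORDERS )
GROUP BY CUSTOMER_ID;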

How to denormalize data in Informatica

Most of you know that Informatica does not provide a built-in transformation for denormalization. However, denormalizing a table is easy and can be done in Informatica with two transformations: an Expression and an Aggregator.

Step 1:
Let us look at the data:

CUSTOMER_BRAND

CUSTOMER_ID | BRAND
1 | BMW
1 | CHEVY
1 | FORD
2 | BMW
2 | FORD
3 | CHEVY

Step 2:
Let us look at the desired output:

CUSTOMER_ID | BMW | FORD | CHEVY
1 | Y | Y | Y
3 | N | N | Y
2 | Y | Y | N

Step 3:
Let us look at the general architecture of the mapping:

The mapping consists of a source table, a Source Qualifier, an Expression, an Aggregator, and a Target.

Step 4:
Let us look at the SQ:

Make sure the rows are sorted on CUSTOMER_ID ascending. This is needed so we can send sorted input to the Aggregator.

Step 5:
Let us look at the Expression transformation:

For each row, the Expression sets the flag of the row's brand to “Y” and the other brand flags to “N”.
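A minimal sketch of the flag ports (port names are mine):

BMW_o ==> IIF(BRAND = 'BMW' , 'Y' , 'N')
FORD_o ==> IIF(BRAND = 'FORD' , 'Y' , 'N')
CHEVY_o ==> IIF(BRAND = 'CHEVY' , 'Y' , 'N')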


Step 6:

Let us look at the Aggregator transformation:

Group by CUSTOMER_ID and take the MAX() of each brand flag. Since “Y” > “N”, a customer who has the brand on any row gets “Y”; otherwise the output is “N”.
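For illustration, the Aggregator output ports could look like this (using the flag port names sketched in Step 5):

BMW ==> MAX(BMW_o)
FORD ==> MAX(FORD_o)
CHEVY ==> MAX(CHEVY_o)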


Step 7:
The final step is easy: just connect the Aggregator to the Target.

If you are interested in a more complicated example, please check: Denormalize Transactional Data using Informatica

In Oracle “Not IN” is not your best friend with big data

Let us assume that you have a brand table that you’d like to query for information such as brand counts, denormalized flags, etc.

Method number 1, which is slow, consists of the following query:

SELECT
CUST_ID,
COUNT(DISTINCT BRAND) AS BRAND_COUNT,
MAX(CASE WHEN BRAND = 'MERCEDES' THEN 'Y' ELSE 'N' END) AS HAS_MERCEDES,
MAX(CASE WHEN BRAND = 'BMW' THEN 'Y' ELSE 'N' END) AS HAS_BMW
FROM
CUST_BRAND
WHERE
BRAND NOT IN ('VOLKSWAGEN' , 'OPEL' , 'FIAT' )
GROUP BY CUST_ID;

Method number 2, which is faster, consists of the following query:

SELECT
CUST_ID,
COUNT(DISTINCT CUST_BRAND.BRAND) AS BRAND_COUNT,
MAX(CASE WHEN CUST_BRAND.BRAND = 'MERCEDES' THEN 'Y' ELSE 'N' END) AS HAS_MERCEDES,
MAX(CASE WHEN CUST_BRAND.BRAND = 'BMW' THEN 'Y' ELSE 'N' END) AS HAS_BMW
FROM
CUST_BRAND ,
( SELECT 'VOLKSWAGEN' AS BRAND FROM DUAL UNION ALL
SELECT 'OPEL' AS BRAND FROM DUAL UNION ALL
SELECT 'FIAT' AS BRAND FROM DUAL ) EXCLUDE
WHERE
CUST_BRAND.BRAND = EXCLUDE.BRAND(+)
AND
EXCLUDE.BRAND IS NULL
GROUP BY CUST_ID;

Method #2 replaces “NOT IN” with an anti-join: an outer join to an inline exclusion list that keeps only the rows with no match. On a big table this can save a substantial amount of time.

Side Note:
Query #2 is recommended if your column is nullable. You can still use “NOT IN” if you add a “BRAND IS NOT NULL” condition. I recommend writing both queries, comparing the explain plans, executing both, and then deciding which one is more efficient.
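If you prefer ANSI join syntax, here is a sketch of the same anti-join, equivalent to the (+) notation above:

SELECT
CUST_ID,
COUNT(DISTINCT CB.BRAND) AS BRAND_COUNT,
MAX(CASE WHEN CB.BRAND = 'MERCEDES' THEN 'Y' ELSE 'N' END) AS HAS_MERCEDES,
MAX(CASE WHEN CB.BRAND = 'BMW' THEN 'Y' ELSE 'N' END) AS HAS_BMW
FROM CUST_BRAND CB
LEFT OUTER JOIN
( SELECT 'VOLKSWAGEN' AS BRAND FROM DUAL UNION ALL
SELECT 'OPEL' AS BRAND FROM DUAL UNION ALL
SELECT 'FIAT' AS BRAND FROM DUAL ) EXCLUDE
ON CB.BRAND = EXCLUDE.BRAND
-- keep only rows with no match in the exclusion list
WHERE EXCLUDE.BRAND IS NULL
GROUP BY CUST_ID;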

Oracle Pivot function

I was amazed at how handy the PIVOT function can be. It lets you produce an aggregated, denormalized view of a data set, which can be a query or a table. In this article I am going to provide a few brief examples of the PIVOT function.

First of all, let us set up some data:


CREATE TABLE SAMPLE_TRANSACTIONS
(
TRANSACTION_ID NUMBER(8),
COMPANY_NAME VARCHAR2(30),
STATE VARCHAR2(2),
TRANSACTION_AMOUNT NUMBER(10,2)
);

INSERT INTO SAMPLE_TRANSACTIONS VALUES ( 1 , 'PALERMO' , 'TX' , 300.22 );
INSERT INTO SAMPLE_TRANSACTIONS VALUES ( 2 , 'PALERMO' , 'CT' , 600.22 );
INSERT INTO SAMPLE_TRANSACTIONS VALUES ( 3 , 'PALERMO' , 'PA' , 900.669 );
INSERT INTO SAMPLE_TRANSACTIONS VALUES ( 4 , 'BOLOGNA' , 'TX' , 500.22 );
INSERT INTO SAMPLE_TRANSACTIONS VALUES ( 5 , 'BOLOGNA' , 'CT' , 1100.22 );
INSERT INTO SAMPLE_TRANSACTIONS VALUES ( 6 , 'BOLOGNA' , 'PA' , 100.669 );
INSERT INTO SAMPLE_TRANSACTIONS VALUES ( 7 , 'PARMA' , 'NY' , 1300.22 );
INSERT INTO SAMPLE_TRANSACTIONS VALUES ( 8 , 'PARMA' , 'NJ' , 1600.22 );
INSERT INTO SAMPLE_TRANSACTIONS VALUES ( 9 , 'PARMA' , 'CA' , 1900.669 );

Now let us view the data:

SELECT * FROM SAMPLE_TRANSACTIONS;

TRANSACTION_ID | COMPANY_NAME | STATE | TRANSACTION_AMOUNT
1 | PALERMO | TX | 300.22
2 | PALERMO | CT | 600.22
3 | PALERMO | PA | 900.67
4 | BOLOGNA | TX | 500.22
5 | BOLOGNA | CT | 1100.22
6 | BOLOGNA | PA | 100.67
7 | PARMA | NY | 1300.22
8 | PARMA | NJ | 1600.22
9 | PARMA | CA | 1900.67

Note that the .669 amounts were rounded to two decimals on insert, since the column is NUMBER(10,2).

Now let us create a report that provides the sum of the transactions by company at a state level:
Let us first execute this query:


SELECT *
FROM
( SELECT COMPANY_NAME , STATE , TRANSACTION_AMOUNT
FROM SAMPLE_TRANSACTIONS
) PIVOT
( SUM(TRANSACTION_AMOUNT)
FOR
COMPANY_NAME IN
('BOLOGNA' AS BOLOGNA , 'PARMA' AS PARMA, 'PALERMO' AS PALERMO) )
ORDER BY STATE;

Results:

STATE | BOLOGNA | PARMA | PALERMO
CA | {null} | 1900.67 | {null}
CT | 1100.22 | {null} | 600.22
NJ | {null} | 1600.22 | {null}
NY | {null} | 1300.22 | {null}
PA | 100.67 | {null} | 900.67
TX | 500.22 | {null} | 300.22

Now let us create a report that provides the sum of the transactions by state at a company level:

Let us first execute this query:


SELECT *
FROM
( SELECT COMPANY_NAME , STATE , TRANSACTION_AMOUNT FROM SAMPLE_TRANSACTIONS
) PIVOT
( SUM(TRANSACTION_AMOUNT)
FOR STATE IN ('CT' CT , 'NY' NY, 'TX' TX) )
ORDER BY COMPANY_NAME;

Results:

COMPANY_NAME | CT | NY | TX
BOLOGNA | 1100.22 | {null} | 500.22
PALERMO | 600.22 | {null} | 300.22
PARMA | {null} | 1300.22 | {null}
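As a side note, PIVOT can apply more than one aggregate per pivoted value; a minimal sketch that adds a transaction count next to each sum, producing columns such as CT_TOTAL and CT_CNT:

SELECT *
FROM
( SELECT COMPANY_NAME , STATE , TRANSACTION_AMOUNT FROM SAMPLE_TRANSACTIONS
) PIVOT
( SUM(TRANSACTION_AMOUNT) AS TOTAL , COUNT(TRANSACTION_AMOUNT) AS CNT
FOR STATE IN ('CT' CT , 'NY' NY, 'TX' TX) )
ORDER BY COMPANY_NAME;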

 

As you can see above, there are many ways to leverage the PIVOT function. For further reading I recommend this link: Oracle Pivot.

 

How to efficiently de-dupe a table in Oracle

A bad approach to de-duping a table is to delete the dupes in place. When dealing with tables that have millions of rows, the delete method takes a long time and consumes a lot of server resources. Below is a step-by-step example that shows how we can de-dupe a table using Oracle’s ROW_NUMBER() analytic function with a “PARTITION BY” clause.

Step 1:
Let us create a table and load data into it.

CREATE TABLE TEST_DEDUPE
(
CUSTOMER_ID NUMBER(10),
BRAND       VARCHAR2(20),
TRANSACTION_DATE DATE
);

INSERT INTO TEST_DEDUPE VALUES ( 1 , 'GREAT VALUE' , TO_DATE('01/20/2011' , 'MM/DD/YYYY') );
INSERT INTO TEST_DEDUPE VALUES ( 1 , 'GREAT VALUE' , TO_DATE('01/20/2011' , 'MM/DD/YYYY') );
INSERT INTO TEST_DEDUPE VALUES ( 1 , 'GREAT VALUE' , TO_DATE('01/20/2011' , 'MM/DD/YYYY') );
INSERT INTO TEST_DEDUPE VALUES ( 1 , 'GREAT VALUE' , TO_DATE('01/22/2011' , 'MM/DD/YYYY') );
INSERT INTO TEST_DEDUPE VALUES ( 1 , 'GREAT VALUE' , TO_DATE('01/22/2011' , 'MM/DD/YYYY') );
INSERT INTO TEST_DEDUPE VALUES ( 2 , 'MEMBERS MARK' , TO_DATE('01/22/2011' , 'MM/DD/YYYY') );
INSERT INTO TEST_DEDUPE VALUES ( 2 , 'MEMBERS MARK' , TO_DATE('01/22/2011' , 'MM/DD/YYYY') );
INSERT INTO TEST_DEDUPE VALUES ( 2 , 'MEMBERS MARK' , TO_DATE('01/22/2011' , 'MM/DD/YYYY') );
INSERT INTO TEST_DEDUPE VALUES ( 2 , 'MEMBERS MARK' , TO_DATE('01/15/2011' , 'MM/DD/YYYY') );
INSERT INTO TEST_DEDUPE VALUES ( 2 , 'MEMBERS MARK' , TO_DATE('01/15/2011' , 'MM/DD/YYYY') );
INSERT INTO TEST_DEDUPE VALUES ( 3 , 'MEMBERS MARK' , TO_DATE('01/15/2011' , 'MM/DD/YYYY') );
INSERT INTO TEST_DEDUPE VALUES ( 3 , 'MEMBERS MARK' , TO_DATE('01/15/2011' , 'MM/DD/YYYY') );
INSERT INTO TEST_DEDUPE VALUES ( 4 , 'MEMBERS MARK' , TO_DATE('01/15/2011' , 'MM/DD/YYYY') );

Now let us look at the data:

SELECT * FROM TEST_DEDUPE;

CUSTOMER_ID | BRAND | TRANSACTION_DATE
1 | GREAT VALUE | 20-JAN-11
1 | GREAT VALUE | 20-JAN-11
1 | GREAT VALUE | 20-JAN-11
1 | GREAT VALUE | 22-JAN-11
1 | GREAT VALUE | 22-JAN-11
2 | MEMBERS MARK | 22-JAN-11
2 | MEMBERS MARK | 22-JAN-11
2 | MEMBERS MARK | 22-JAN-11
2 | MEMBERS MARK | 15-JAN-11
2 | MEMBERS MARK | 15-JAN-11
3 | MEMBERS MARK | 15-JAN-11
3 | MEMBERS MARK | 15-JAN-11
4 | MEMBERS MARK | 15-JAN-11

We notice that CUSTOMER_IDs {1, 2, 3} have duplicates. For instance, CUSTOMER_ID 1 has three rows with BRAND = “GREAT VALUE” and TRANSACTION_DATE = “20-JAN-11”. After de-duping we should have only one such row.


Step 2:

Now let us rename the table that we need to de-dupe.

ALTER TABLE TEST_DEDUPE
RENAME TO TEST_DEDUPE_OLD;

Step 3:
Now let’s create a new table that has the same name as the original table.

CREATE TABLE TEST_DEDUPE AS
(
SELECT CUSTOMER_ID , BRAND , TRANSACTION_DATE
FROM
  ( SELECT CUSTOMER_ID , BRAND , TRANSACTION_DATE ,
           ROW_NUMBER() OVER
             ( PARTITION BY CUSTOMER_ID , BRAND , TRANSACTION_DATE
               ORDER BY TRANSACTION_DATE ) AS RN
    FROM TEST_DEDUPE_OLD
  )
WHERE RN = 1
);

Final Step:
Let us view the data without dupes:

SELECT * FROM TEST_DEDUPE;

CUSTOMER_ID | BRAND | TRANSACTION_DATE
1 | GREAT VALUE | 20-JAN-11
1 | GREAT VALUE | 22-JAN-11
2 | MEMBERS MARK | 15-JAN-11
2 | MEMBERS MARK | 22-JAN-11
3 | MEMBERS MARK | 15-JAN-11
4 | MEMBERS MARK | 15-JAN-11
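Once you have validated the new table, the renamed original can be dropped. Keep in mind that CTAS does not carry over indexes, grants, or constraints other than NOT NULL, so recreate those on the new table as needed:

DROP TABLE TEST_DEDUPE_OLD;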

 

 

How to extract the domain name from an email address

Method 1:

SELECT SUBSTR(
'EXAMPLE@YAHOO.COM' ,
INSTR('EXAMPLE@YAHOO.COM' , '@' ) + 1 )
AS EMAIL_DOMAIN
FROM DUAL;

Method 2:

SELECT
REPLACE(
REGEXP_SUBSTR( 'EXAMPLE@YAHOO.COM' , '@.*' ) , '@' )
AS EMAIL_DOMAIN
FROM DUAL;

All you need to do now is to replace ‘EXAMPLE@YAHOO.COM’ with your email address column name or value.
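For example, against a hypothetical CUSTOMERS table with an EMAIL column:

SELECT SUBSTR( EMAIL , INSTR( EMAIL , '@' ) + 1 ) AS EMAIL_DOMAIN
FROM CUSTOMERS;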