Thursday, October 25, 2012

Data Cleansing

What is Data Cleansing ? 

Data cleansing (or ‘data scrubbing’) is detecting and then correcting or removing corrupt or inaccurate records from a record set.
After cleansing, a data set will be consistent with other similar data sets in the system. The inconsistencies detected or removed may have been caused by different data dictionary definitions of similar entities in different stores, or caused by user entry errors or data which was corrupted in transmission or storage. Preprocessing the data will also guarantee that it is unambiguous, correct, and complete.
The actual process of data cleansing may involve removing typos or validating and correcting values against a known list of entities. The validation may be strict such as rejecting any address that does not have a valid ZIP code or fuzzy such as correcting records that partially match existing, known records. Data cleansing differs from data validation in that validation almost invariably means data is rejected from the system at entry and is performed at entry time, rather than on batches of data.

Define a data Dictionary how it is used in SSIS development


Define a data Dictionary how it is used in SSIS development
Data dictionary is the place where we can grap the information of the objects stored in the database .
It is used in SSIS by writing the Sql script and scheduled the script so we will never have to worry about updating the documentation manually.

Describe your Analysis experience.


Describe your  Analysis experience.
We can use SSAS for analysis purposes. We can make a cube ie Fact table+Dimension table to do better analysis for the business. I know how to make a cube but I don’t have experience of it.

In SSRS I make a report using OLAP.

Define Normalization, Fact Tables, Foreign Key, Identity column, Seeding


Define Normalization, Fact Tables, Foreign Key, Identity column, Seeding

 Normalization is a breaking down of complex tables into more tables to reduce redundancy and for performance.
Fact Tables= Fact tables is a measurable tables of all the dimension tables. So It is attached to all dimension tables. All the numeric data are stored in fact tables.
Foreign Key= It is constraint used for data integrity and relationship with the primary table
Identity column= It is a column where all values are unique like a primary key. For example Identity(1,1) means , it starts from 1 and increased by 1 in every row

Have you created any? What did they do? When is a trigger fired?


Have you created any? What did they do? When is a trigger fired?

 Trigger fired when DML and DDL operation occurs in table. They are used for data integrity of the tables and many more.
I have created a trigger to capture the update, delete and insert data of main table to our Archive table. So if we change any data in main table then these data automatically saved in Archive table

What functions have you used ? How would you use these functions : Coalesce, delete, truncate, UDF, Charindex:


What functions have you used ?  How would you use these functions : Coalesce, delete, truncate, UDF, Charindex:


 I used UDF’s and many aggregate functions.
I used these functions for;
Coalesce= It is used for two or more than two attributes to change the value .It is same as case statement in sql  . For example 
COALESCE(columnName1,ColumnName2,ColumnName3)
--WHEN (columnName1 IS NOT NULL) THEN columnName2 ,If (columnName1  IS  NULL) THEN it will check columnname2 and so on.

Why / How have you used a stored procedures


Why / How have you used a stored procedures

 We used Stored Procedures because  of its security reason and performance reason and many more. I used stored procedure by dragging a Execute SQL task in control flow  and setting True to IsQueryStoreprocedure