What is Data Cleansing ?
Data cleansing (or ‘data scrubbing’) is detecting and then correcting
or removing corrupt or inaccurate records from a record set.
After cleansing, a data set will be consistent with other similar
data sets in the system. The inconsistencies detected or removed may
have been caused by different data dictionary definitions of similar
entities in different stores, or caused by user entry errors or data
which was corrupted in transmission or storage. Preprocessing the data
will also guarantee that it is unambiguous, correct, and complete.
The actual process of data cleansing may involve removing typos or
validating and correcting values against a known list of entities. The
validation may be strict such as rejecting any address that does not
have a valid ZIP code or fuzzy such as correcting records that partially
match existing, known records. Data cleansing differs from data
validation in that validation almost invariably means data is rejected
from the system at entry and is performed at entry time, rather than on
batches of data.
Thursday, October 25, 2012
Define a data Dictionary how it is used in SSIS development
Define a data Dictionary how it is
used in SSIS development
Data dictionary is the place
where we can grap the information of the objects stored in the database .
It is used in SSIS by writing
the Sql script and scheduled the script so we will never have to worry about
updating the documentation manually.
Describe your Analysis experience.
Describe your Analysis
experience.
We can use SSAS for analysis
purposes. We can make a cube ie Fact table+Dimension table to do better
analysis for the business. I know how to make a cube but I don’t have
experience of it.
In SSRS I make a report using
OLAP.
Define Normalization, Fact Tables, Foreign Key, Identity column, Seeding
Define Normalization, Fact Tables,
Foreign Key, Identity column, Seeding
Normalization is a breaking down of complex tables into
more tables to reduce redundancy and for performance.
Fact Tables= Fact tables is a
measurable tables of all the dimension tables. So It is attached to all
dimension tables. All the numeric data are stored in fact tables.
Foreign Key= It is constraint
used for data integrity and relationship with the primary table
Identity column= It
is a column where all values are unique like a primary key. For example
Identity(1,1) means , it starts from 1 and increased by 1 in every row
Have you created any? What did they do? When is a trigger fired?
Have you created any? What did
they do? When is a trigger fired?
Trigger fired when DML and DDL operation occurs in table.
They are used for data integrity of the tables and many more.
I have created a
trigger to capture the update, delete and insert data of main table to our
Archive table. So if we change any data in main table then these data
automatically saved in Archive table
What functions have you used ? How would you use these functions : Coalesce, delete, truncate, UDF, Charindex:
What functions have you used
? How would you use these functions : Coalesce, delete, truncate, UDF,
Charindex:
I used UDF’s and many aggregate functions.
I used these functions for;
Coalesce= It is used for two or
more than two attributes to change the value .It is same as case statement in
sql . For example
COALESCE(columnName1,ColumnName2,ColumnName3)
--WHEN (columnName1 IS NOT NULL) THEN columnName2
,If (columnName1 IS NULL)
THEN it will check columnname2 and so on.
Why / How have you used a stored procedures
Why / How have you used a stored
procedures
We used Stored Procedures because of its security
reason and performance reason and many more. I used stored procedure by
dragging a Execute SQL task in control flow and setting True to
IsQueryStoreprocedure
Subscribe to:
Posts (Atom)