Monday, November 30, 2015

Data Quality




CPPS -- Cleansing, Profiling, Parsing, and Standardization
MEM -- Matching, Enrichment, Monitoring.


Data profiling is a methodical approach to identifying data issues such as inconsistency, missing data, and duplicates. A data profile is run to analyze the health of the data.

Data Quality Metrics --

Completeness, Conformity, Consistency, Duplicates, Accuracy, and Validity.
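Several of these metrics can be approximated with plain SQL before any IDQ rules are built. A minimal sketch, assuming a hypothetical CUSTOMER table with EMAIL, ZIP_CODE, and CUSTOMER_ID columns (LENGTH is LEN on SQL Server):

-- Completeness: percentage of rows where EMAIL is populated.
SELECT COUNT(EMAIL) * 100.0 / COUNT(*) AS email_completeness_pct
FROM CUSTOMER;

-- Conformity: ZIP codes that are not exactly 5 characters long.
SELECT COUNT(*) AS nonconforming_zip_count
FROM CUSTOMER
WHERE ZIP_CODE IS NOT NULL
  AND LENGTH(ZIP_CODE) <> 5;

-- Duplicates: customer identifiers that occur more than once.
SELECT CUSTOMER_ID, COUNT(*) AS occurrences
FROM CUSTOMER
GROUP BY CUSTOMER_ID
HAVING COUNT(*) > 1;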

IDQ has two types of transformations: (a) general transformations (the PowerCenter transformations) and (b) data quality transformations.



Profiling Data Overview

Use profiling to find the content, quality, and structure of data sources of an application, schema, or enterprise. The data source content includes value frequencies and data types. The data source structure includes keys and functional dependencies.

Analysts and developers can use the Analyst tool and the Developer tool to collaborate, identify data quality issues, and analyze data relationships.


Data profiling is often the first step in a project. You can run a profile to evaluate the structure of data and verify that data columns are populated with the types of information you expect. If a profile reveals problems in data, you can define steps in your project to fix those problems. For example, if a profile reveals that a column contains values of greater than expected length, you can design data quality processes to remove or fix the problem values.
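For example, a check like the following lists the over-length values so a cleansing rule can be designed for them. This is a sketch, assuming a hypothetical CUSTOMER table whose PHONE_NUMBER column is expected to be at most 15 characters:

-- Hypothetical table and column; 15 is an assumed maximum length.
SELECT PHONE_NUMBER, LENGTH(PHONE_NUMBER) AS actual_length
FROM CUSTOMER
WHERE LENGTH(PHONE_NUMBER) > 15
ORDER BY actual_length DESC;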


A profile that analyzes the data quality of selected columns is called a column profile.

Note: You can also use the Developer tool to discover primary key, foreign key, and functional dependency relationships, and to analyze join conditions on data columns.

A column profile provides the following facts about data (a SQL sketch of these checks follows the list):

• The number of unique and null values in each column, expressed as a number and a percentage.
• The patterns of data in each column, and the frequencies with which these values occur.
• Statistics about the column values, such as the maximum and minimum lengths of values and the first and last values in each column.
• For join analysis profiles, the degree of overlap between two data columns, displayed as a Venn diagram and as a percentage value. Use join analysis profiles to identify possible problems with column join conditions.
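Outside the profiling tools, the first three kinds of facts can be approximated with plain SQL. A minimal sketch, assuming a hypothetical CUSTOMER table and profiling its CITY column (the join-overlap fact is sketched after the Developer tool task list below):

-- Unique and null counts, as numbers and percentages, plus value lengths.
SELECT COUNT(DISTINCT CITY)                                             AS unique_values,
       COUNT(DISTINCT CITY) * 100.0 / COUNT(*)                          AS unique_pct,
       SUM(CASE WHEN CITY IS NULL THEN 1 ELSE 0 END)                    AS null_values,
       SUM(CASE WHEN CITY IS NULL THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS null_pct,
       MIN(LENGTH(CITY))                                                AS min_length,
       MAX(LENGTH(CITY))                                                AS max_length
FROM CUSTOMER;

-- Value frequencies for the same column.
SELECT CITY, COUNT(*) AS frequency
FROM CUSTOMER
GROUP BY CITY
ORDER BY frequency DESC;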

You can run a column profile at any stage in a project to measure data quality and to verify that changes to the data meet your project objectives. You can run a column profile on a transformation in a mapping to indicate the effect that the transformation will have on data.

You can perform the following tasks in both the Developer tool and Analyst tool:

• Perform column profiling. The process includes discovering the number of unique values, null values, and data patterns in a column.
• Perform data domain discovery. You can discover critical data characteristics within an enterprise.
• Curate profile results including data types, data domains, primary keys, and foreign keys.
• Create scorecards to monitor data quality.
• View scorecard lineage for each scorecard metric and metric group.
• Create and assign tags to data objects.
• Look up the meaning of an object name as a business term in the Business Glossary Desktop. For example, you can look up the meaning of a column name or profile name to understand its business requirement and current implementation.


You can perform the following tasks in the Developer tool:

• Discover the degree of potential joins between two data columns in a data source.
• Determine the percentage of overlapping data in pairs of columns within a data source or multiple data sources (see the join-overlap sketch after this list).
• Compare the results of column profiling.
• Generate a mapping object from a profile.
• Discover primary keys in a data source.
• Discover foreign keys in a set of one or more data sources.
• Discover functional dependency between columns in a data source.
• Run data discovery tasks on a large number of data sources across multiple connections. The data discovery tasks include column profile, inference of primary key and foreign key relationships, data domain discovery, and generating a consolidated graphical summary of the data relationships.
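The overlap that a join analysis profile reports can be approximated in SQL. A minimal sketch, assuming hypothetical ORDERS and CUSTOMER tables joined on CUSTOMER_ID:

-- Percentage of distinct ORDERS.CUSTOMER_ID values that also exist in CUSTOMER.
SELECT COUNT(DISTINCT o.CUSTOMER_ID) * 100.0 /
       (SELECT COUNT(DISTINCT CUSTOMER_ID) FROM ORDERS) AS overlap_pct
FROM ORDERS o
JOIN CUSTOMER c
  ON o.CUSTOMER_ID = c.CUSTOMER_ID;

A low percentage suggests orphan values in ORDERS.CUSTOMER_ID, which is the kind of join-condition problem the Venn diagram in the profile results highlights.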

You can perform the following tasks in the Analyst tool:

• Perform enterprise discovery on a large number of data sources across multiple connections. You can view a consolidated discovery results summary of column metadata and data domains.
• Perform discovery search to find where the data and metadata exist in the enterprise. You can search for specific assets, such as data objects, rules, and profiles. Discovery search finds assets and identifies relationships to other assets in the databases and schemas of the enterprise.




Informatica Analyst Frequently Asked Questions


Can I use one user account to access the Administrator tool, the Developer tool, and the Analyst tool?

Yes. You can give a user permission to access all three tools. You do not need to create separate user accounts for each client application.


Where is my reference data stored?

You can use the Developer tool and the Analyst tool to create and share reference data objects.
The Model repository stores the reference data object metadata. The reference data database stores reference table data values. Configure the reference data database on the Content Management Service.


Informatica Developer Frequently Asked Questions

What is the difference between a source and target in PowerCenter and a physical data object in the Developer tool?

In PowerCenter, you create a source definition to include as a mapping source. You create a target definition to include as a mapping target. In the Developer tool, you create a physical data object that you can use as a mapping source or target.

What is the difference between a mapping in the Developer tool and a mapping in PowerCenter?

A PowerCenter mapping specifies how to move data between sources and targets.
A Developer tool mapping specifies how to move data between the mapping input and output.
A PowerCenter mapping must include one or more source definitions, source qualifiers, and target definitions. A PowerCenter mapping can also include shortcuts, transformations, and mapplets.
A Developer tool mapping must include mapping input and output. A Developer tool mapping can also include transformations and mapplets.


The Developer tool has the following types of mappings:

• Mapping that moves data between sources and targets. This type of mapping differs from a PowerCenter mapping only in that it cannot use shortcuts and does not use a source qualifier.

• Logical data object mapping. A mapping in a logical data object model. A logical data object mapping can contain a logical data object as the mapping input and a data object as the mapping output. Or, it can contain one or more physical data objects as the mapping input and logical data object as the mapping output.

• Virtual table mapping. A mapping in an SQL data service. It contains a data object as the mapping input and a virtual table as the mapping output.

• Virtual stored procedure mapping. Defines a set of business logic in an SQL data service. It contains an Input Parameter transformation or physical data object as the mapping input and an Output Parameter transformation or physical data object as the mapping output.



What is the difference between a mapplet in PowerCenter and a mapplet in the Developer tool?

A mapplet in PowerCenter and in the Developer tool is a reusable object that contains a set of transformations. You can reuse the transformation logic in multiple mappings.

A PowerCenter mapplet can contain source definitions or Input transformations as the mapplet input. It must contain Output transformations as the mapplet output.

A Developer tool mapplet can contain data objects or Input transformations as the mapplet input. It can contain data objects or Output transformations as the mapplet output.

A mapplet in the Developer tool also includes the following features:

• You can validate a mapplet as a rule. You use a rule in a profile.

• A mapplet can contain other mapplets.


What is the difference between a mapplet and a rule?

You can validate a mapplet as a rule. A rule is business logic that defines conditions applied to source data when you run a profile. You can validate a mapplet as a rule when the mapplet meets the following requirements:

• It contains an Input and Output transformation.

• The mapplet does not contain active transformations.

• It does not specify cardinality between input groups.

Sunday, November 22, 2015

Database Session Topics - Newbies

Database Session Topics
What is RDBMS – Abstraction Layers
Concepts
§  SQL
§  Relationships
§  Transactions
§  ACID – Atomicity, Consistency, Isolation, Durability (see the SQL sketch at the end of this post)
§  CRUD
§  DATABASE – data file & log file
§  ODBC / JDBC
§  Database Objects
§  Table -- columns and rows
§  View
§  Transaction Log
§  Trigger
§  Index
§  Stored procedure
§  Functions
§  Cursor
§  Indexes:  Primary key, foreign key 
Components
§  Concurrency control
§  Data dictionary
§  Query language
§  Query optimizer
§  Query plan
Conclusion
§  Data Models : ER Models / Dimensional Models

§  OLTP/OLAP/ETL
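To tie several of these topics together (CRUD statements inside a transaction, and the atomicity and durability of ACID), here is a minimal SQL sketch. The ACCOUNTS table and its columns are assumptions for illustration only, and the BEGIN TRANSACTION syntax varies by database (for example, START TRANSACTION in MySQL, BEGIN in PostgreSQL):

-- Hypothetical ACCOUNTS table used only for illustration.
CREATE TABLE ACCOUNTS (
    ACCOUNT_ID INTEGER PRIMARY KEY,
    BALANCE    DECIMAL(12,2) NOT NULL
);

INSERT INTO ACCOUNTS (ACCOUNT_ID, BALANCE) VALUES (1, 500.00);      -- Create
INSERT INTO ACCOUNTS (ACCOUNT_ID, BALANCE) VALUES (2, 300.00);

BEGIN TRANSACTION;                                                  -- atomic unit of work
UPDATE ACCOUNTS SET BALANCE = BALANCE - 100 WHERE ACCOUNT_ID = 1;   -- Update (debit)
UPDATE ACCOUNTS SET BALANCE = BALANCE + 100 WHERE ACCOUNT_ID = 2;   -- Update (credit)
COMMIT;  -- both updates become durable together; ROLLBACK would undo them both

SELECT ACCOUNT_ID, BALANCE FROM ACCOUNTS;                           -- Read
DELETE FROM ACCOUNTS WHERE ACCOUNT_ID = 2;                          -- Delete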