Monday, November 24, 2008

OLAP Versus Data Mining

OLAP is a data summarization/aggregation tool that helps simplify data analysis, while data
mining allows the automated discovery of implicit patterns and interesting knowledge hidden in large amounts of data.
Key is OLAP

OLAP tools are targeted toward simplifying and supporting interactive data analysis,
whereas the goal of data mining tools is to automate as much of the process as possible,
while still allowing users to guide the process. In this sense, data mining goes one step beyond traditional on-line analytical processing.
Key is OLAP tools

OLAP functions are essentially for user-directed data summary and comparison
(by drilling, pivoting, slicing,dicing, and other operations). Data mining covers a much broader spectrum than simple OLAP operations because it performs not only data summary and comparison but also association, classification, prediction, clustering, time-series analysis, and other data analysis tasks.
Key is OLAP Operations

Data mining is not confined to the analysis of data stored in data warehouses. It may
analyze data existing at more detailed granularities than the summarized data provided
in a data warehouse. It may also analyze transactional, spatial, textual, and multimedia
data that are difficult to model with current multidimensional database technology. In
this context, data mining covers a broader spectrum than OLAP with respect to data
mining functionality and the complexity of the data handled.
Key is OLAP data analysis

Because data mining involves more automated and deeper analysis than OLAP,
data mining is expected to have broader applications. Data mining can help business
managers find and reach more suitable customers, as well as gain critical
business insights that may help drive market share and raise profits. In addition,
data mining can help managers understand customer group characteristics
and develop optimal pricing strategies accordingly, correct item bundling based
not on intuition but on actual item groups derived from customer purchase patterns,
reduce promotional spending, and at the same time increase the overall net
effectiveness of promotions.

Tuesday, November 11, 2008

Table Expressions

Four types of table expressions:
1. Derived tables
2. Common table expressions (CTEs),
3. Views
4. Inline table-valued functions (inline TVFs) also known as Parametrized views.

Beneifts.

Table expressions are virtual Tables.
The benefi ts of using table expressions uses a modular approach and not to performance.

Any Table Expression should meet the following three requirements.

1. Order is not guaranteed because the rows in a relational table are not ordered except with one restriction using Top Clause with Order By. Here the Order by is used for restricting the rows.
2. All columns must have names.
3. All column names must be unique.

To reuse the definitions , use views and inline TVFs.
Not to reuse the definitions, use dervied tables and CTEs.

APPLY operator

Multiple instances of the same CTE The APPLY operator operates on two input tables, the second of which may be a table expression; I’ll refer to them as the left and right tables. The right table is usually a derived table or an inline TVF.

If the right table expression returns an empty set, the CROSS APPLY operator does not returnthe corresponding left row. To return the empty set use the OUTER Apply.

--------------------------------------------------------------------------------

In its non-recursive form, a CTE is an alternative to derived tables, views, and inline user-defined functions (UDFs).

A derived table is a named table expression that exists for the duration of a query.
Views are also named table expressions that persist in the database until you explicitly drop them. Unlike with derived tables
1. a query can refer to a view name multiple times;
2. the view can't refer to variables.

Inline UDFs have the same characteristics as views except that they can refer to the function's input arguments.

CTEs have the best features of derived tables, views, and inline UDFs—they are named table expressions that exist only for the duration of the query, that an outer query can refer to multiple times, and that can refer to variables defined in the calling batch. . . .