Data Mining Using SAS Enterprise Miner

Data Mining Using SAS Enterprise Miner introduces the reader to a wide variety of data mining techniques in SAS® Enterprise Miner. This first-of-a-kind book explains the purpose of, and reasoning behind, every node that is a part of SAS® Enterprise Miner with regard to SEMMA design and SAS data mining analysis. Each chapter starts with a short introduction to the assortment of statistics that are generated from the various SAS® Enterprise Miner nodes, followed by detailed explanations of the configuration settings and the generated results that are located within each node. The end result of the author’s meticulous presentation is a well-crafted study guide on the various methods that one employs to randomly sample, partition, transform, and filter the data within the process flow of SAS® Enterprise Miner. The book also explains the wide assortment of modeling designs that are available, in addition to the process of assessing the various models under comparison in SAS® Enterprise Miner v4.3.

A Quick Peek of the Book

   · Table of Contents

    · Introduction

   · Sample Chapter

Extra Chapters of my Book

    · Insight Node

   · Decision Tree Interactive Training 

Contents of my Book on this Page

SAS Data Mining Links
      ·  SAS Data Mining Links

Highlights of the Book

· An overview of the sampling nodes in SAS® Enterprise Miner v4.3, which read the data, and of the various methods to both randomly sample and partition the data within the process flow.

· An overview of the explore nodes, which are usually used as a preliminary step in data mining to observe the distribution or range of values of each variable, separately or as a group, and to discover relationships or patterns in the data from the assortment of charts and graphs that the nodes generate. In addition, market basket analysis can be performed to determine the strength of the association between various items of interest.

· An overview of the modify nodes, which are designed to perform preliminary modification of the data set in preparation for data mining analysis: removing extreme values or outlandish observations, estimating missing observations in the data, collapsing and massaging the data in preparation for time series modeling, performing cluster analysis to determine the various groupings and the characteristics of each group, and determining the best groupings to form in the data based on a binary-valued variable.

· An overview of the modeling nodes that familiarizes readers with the assortment of modeling designs that can be performed within Enterprise Miner, such as traditional regression modeling, logistic regression modeling, decision tree modeling, neural network modeling, principal components modeling, and nearest neighbor modeling. The book introduces readers to an additive nonlinear modeling design called dmneural network modeling. The model is designed to predict either an interval or binary target variable, with the predicted values calculated from three separate models. It is called an additive nonlinear modeling design because the first model predicts the target variable and each subsequent model predicts the residual values from the previous model. The fitted values are calculated by adding the separate nonlinear models together. In addition, the book introduces a unique two-stage modeling design that can be performed within SAS® Enterprise Miner. Two-stage modeling fits two separate models in succession, using the probability estimates from the initial classification model to predict the continuous response variable in the subsequent predictive model. The book also introduces a very powerful modeling node in Enterprise Miner called the Ensemble node. Ensemble modeling is designed to increase the accuracy, stability, and consistency of the modeling estimates by averaging the prediction estimates either from different models or from successive fits of the same predictive model. The Ensemble node can also perform two resampling techniques called bagging and boosting. Bagging is similar to bootstrap sampling: the modeling estimates are generated iteratively by resampling the data with replacement, and the fitted values are then averaged to produce the final modeling estimates. Boosting is a classification modeling technique in which the estimated probabilities are adjusted by weight estimates, where the weight is increased for observations that the previous classification model misclassified.

· An overview of the assessment nodes, which allow you to evaluate the results generated from the numerous models within the process flow. The performance charts help determine which classification model performs best, and the assessment statistics and diagnostic charts help determine the accuracy of the predictive models. The book also introduces the Reporter node, which consolidates the numerous option settings and result listings from the various nodes in the process flow diagram into a single HTML report that can be viewed in your favorite Web browser.

· The remaining sections of the book explain the other nodes in Enterprise Miner, such as the Score node, which allows you to manage, edit, execute, and create your own custom-designed score code in order to generate entirely different estimates from a new set of values in data mining analysis. This is followed by the SAS Code node, one of the most powerful nodes within SAS® Enterprise Miner, which allows you to perform SAS programming within the process flow.
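
To make the bagging idea from the modeling nodes highlight more concrete, here is a minimal Python sketch. It is not SAS code and is not how the Ensemble node is implemented. It resamples a small made-up data set with replacement, fits a straight line to each resample, and averages the fitted values. The data, the function names, and the choice of a straight-line fit as the base model are all assumptions made for illustration.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        # Degenerate resample (all x values equal): fall back to the mean of y.
        return y_bar, 0.0
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def bagged_prediction(xs, ys, x_new, n_resamples=50, seed=0):
    """Average the fitted values of lines fit to bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    fitted = []
    for _ in range(n_resamples):
        # Draw n row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        b0, b1 = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        fitted.append(b0 + b1 * x_new)
    return sum(fitted) / len(fitted)


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2 * x + 1 for x in xs]
bagged = bagged_prediction(xs, ys, x_new=4)
```

Because the made-up data have no noise, every resample recovers the same line; with noisy data the averaging step is what stabilizes the estimate.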

Programs from Data Mining Using SAS® Enterprise Miner 

The following SAS and SAS/IML programs accompany my book, Data Mining Using SAS® Enterprise Miner.

Linear Regression Modeling

SAS/IML programming code that computes traditional regression estimates. The SAS/IML® program will calculate the predicted values, residual values, parameter estimates and associated standard errors and t-test statistics.
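
For readers without SAS/IML, the same quantities can be sketched in plain Python. This is not the SAS/IML program from the book. It is a minimal illustration restricted to one predictor plus an intercept, and the data values and function name are hypothetical.

```python
import math


def simple_regression(xs, ys):
    """Ordinary least squares with an intercept and one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    # Parameter estimates.
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Predicted values and residual values.
    predicted = [intercept + slope * x for x in xs]
    residuals = [y - p for y, p in zip(ys, predicted)]

    # Residual variance with n - 2 degrees of freedom.
    mse = sum(r ** 2 for r in residuals) / (n - 2)
    se_slope = math.sqrt(mse / sxx)
    se_intercept = math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))

    return {
        "intercept": intercept,
        "slope": slope,
        "se_intercept": se_intercept,
        "se_slope": se_slope,
        "t_intercept": intercept / se_intercept,
        "t_slope": slope / se_slope,
        "predicted": predicted,
        "residuals": residuals,
    }


# Hypothetical data, roughly y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = simple_regression(xs, ys)
```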

Logistic Regression Modeling

SAS/IML programming code that computes logistic regression estimates. The SAS/IML® program calculates the parameter estimates and the likelihood ratio goodness-of-fit statistics by fitting a model to a binary-valued response variable using the maximum likelihood method. The maximum likelihood parameter estimates are computed by an iterative process that continues until convergence, which yields the final parameter estimates.
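
The iterative maximum likelihood idea can be sketched in plain Python as follows. This is not the SAS/IML program from the book. It assumes Newton-Raphson updates, a single predictor plus an intercept, and made-up data. The likelihood ratio statistic shown compares the fitted model with an intercept-only model.

```python
import math


def fit_logistic(xs, ys, max_iter=25, tol=1e-10):
    """Newton-Raphson maximum likelihood for P(y=1) = 1/(1+exp(-(b0+b1*x)))."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        ps = [1 / (1 + math.exp(-(b0 + b1 * x))) for x in xs]
        # Gradient (score) of the log-likelihood.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum(x * (y - p) for x, y, p in zip(xs, ys, ps))
        # Information matrix X'WX with weights p(1 - p).
        ws = [p * (1 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 linear system by hand.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break  # converged
    return b0, b1


def log_likelihood(b0, b1, xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1 / (1 + math.exp(-(b0 + b1 * x)))
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


# Hypothetical data; the two classes overlap, so the estimates are finite.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)

# Likelihood ratio statistic against the intercept-only model.
p_null = sum(ys) / len(ys)
b0_null = math.log(p_null / (1 - p_null))
lr_stat = 2 * (log_likelihood(b0, b1, xs, ys)
               - log_likelihood(b0_null, 0.0, xs, ys))
```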

K-Means Clustering

SAS programming code that computes the k-means clustering estimates. Two separate clusters are created, and each observation is assigned to the cluster with the smallest squared Euclidean distance.
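
The assignment rule can be sketched in plain Python. This is not the SAS program from the book. The points, the starting centers, and the function names are hypothetical, and the loop simply alternates assignment and center updates until the labels stop changing.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_means(points, centers, max_iter=100):
    """Assign each point to its nearest center, then recompute the centers."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)),
                key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return labels, centers


# Hypothetical two-dimensional data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centers = k_means(points, centers=[(1.0, 1.0), (1.5, 2.0)])
```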

Principal Components

SAS/IML programming code that computes the principal component estimates from the 2004 major league baseball hitters. The SAS/IML® program calculates the principal component scores based on the correlation matrix, since the various hitting departments are measured in different units. A scatter plot of the first two principal components is generated so that you can observe the variability, the outliers, and the various groupings formed by the best hitters in the game of baseball.
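
To show why the correlation matrix is used when variables are measured in different units, here is a small Python sketch limited to two variables. It is not the SAS/IML program from the book, and the numbers are made up rather than taken from the 2004 data. With two standardized variables the correlation matrix is [[1, r], [r, 1]], whose eigenvalues are 1 + |r| and 1 - |r|, so the components can be written in closed form.

```python
import math


def standardize(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def two_variable_pca(x1, x2):
    """Principal component scores from the 2x2 correlation matrix."""
    z1, z2 = standardize(x1), standardize(x2)
    n = len(z1)
    r = sum(a * b for a, b in zip(z1, z2)) / (n - 1)
    sign = 1.0 if r >= 0 else -1.0
    root2 = math.sqrt(2.0)
    # Eigenvectors of [[1, r], [r, 1]] are (1, sign)/sqrt(2) and (1, -sign)/sqrt(2).
    pc1 = [(a + sign * b) / root2 for a, b in zip(z1, z2)]
    pc2 = [(a - sign * b) / root2 for a, b in zip(z1, z2)]
    eigenvalues = (1 + abs(r), 1 - abs(r))
    return r, eigenvalues, pc1, pc2


# Hypothetical hitters: home runs and batting average are on very different scales.
home_runs = [45, 30, 12, 38, 25, 8]
batting_avg = [0.290, 0.310, 0.270, 0.300, 0.280, 0.260]
r, eigenvalues, pc1, pc2 = two_variable_pca(home_runs, batting_avg)
```

Plotting `pc1` against `pc2` would give the kind of scatter plot described above; the sketch stops at the scores to stay within the standard library.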

Articles of Interest

Matignon, Randall, An Overview of SAS Enterprise Miner

The paper is designed to familiarize the reader with the working environment of SAS® Enterprise Miner v4.3. It describes the general option settings that are available when you first open SAS® Enterprise Miner v4.3.

Matignon, Randall, Data Mining Using SAS Enterprise Miner

The paper accompanies my book and gives an overview of the multitude of nodes that are available in SAS® Enterprise Miner v4.3. It describes the purpose of each node, the option settings that are available, and the results that are generated from each node.

SAS Institute, Finding the Solution to Data Mining

This is an update to the following paper on SAS® Enterprise Miner.

SAS Institute, Finding the Solution to Data Mining

The paper explains the capabilities of data mining and the various nodes that can be used in Enterprise Miner v4.3 to perform data mining.

Groth, Han, and Kamber, SAS Institute, Data Mining

This Microsoft PowerPoint presentation provides a brief overview of data mining using SAS® Enterprise Miner v4.3 and the HMEQ home equity loan SAS data set.

Bommreddy, Mahesh and Kadiyala, Chaithanya, Data Mining Using SAS Enterprise Miner

The paper is a brief overview of data mining and the various nodes that are a part of SAS® Enterprise Miner v4.3.

E-Commerce Data Mining Techniques-SAS Enterprise Miner Tutorial

This short article provides a brief description of the working environment, such as the various main menu option settings in SAS® Enterprise Miner v4.3. In addition, the paper introduces the various option settings and results that are generated from some of the modeling nodes in SAS® Enterprise Miner v4.3.

Tom Bohannon, SAS Institute, Overview of Data Mining

This course note provides a brief overview of data mining and the SEMMA process using SAS® Enterprise Miner v5.

KPMG Consulting, Best Practices Approach to the Manufacturing Industry

The paper explains various SEMMA data mining techniques that are used to address problems in the manufacturing industry using SAS® Enterprise Miner v4.3.

Modeling Credit Risks: A Practice Lesson for SAS Enterprise Miner

This course note explains the HMEQ home equity data set that was used in my book, "Data Mining Using SAS Enterprise Miner". The paper briefly explains the process of constructing a SAS® project and workspace diagram, and compares the classification performance of logistic regression, neural network, and decision tree modeling using SAS® Enterprise Miner v4.3.

Bergquist, David, Evaluation of Significant Variables Between SAS Enterprise Miner and Tetrad

The purpose of the paper is to compare the modeling results from various modeling designs in SAS® Enterprise Miner v4.3 and the Tetrad data mining software in predicting the movement of various toothbrushes.

Ripley, B.D., Statistical Data Mining

The paper is written by B.D. Ripley, who is one of the most knowledgeable people in the field of data mining. The paper explains both traditional clustering and SOM clustering, along with some of the modeling designs that are used in SAS® Enterprise Miner v4.3.

SAS Institute, Data Mining and the Case for Sampling

The paper explains the various sampling methods that are available in the Sampling node of SAS® Enterprise Miner v4.3. It discusses the use of sampling as a statistically valid practice for processing large databases, the advantages and disadvantages of sampling for data mining, and the importance of a random sample in achieving a quality sample.

Allison, Paul, Multiple Imputation of Missing Data

The paper covers the multiple imputation method, an option that is available in the Replacement node of SAS® Enterprise Miner. The Replacement node is designed to impute, or estimate, missing values. Multiple imputation uses an appropriate model to predict the missing values of a variable from all other variables with non-missing values, iteratively fitting the model numerous times and then averaging the estimates.

Bao, Xlinli, Mining Transaction/Order Data Using SAS Enterprise Miner

The paper covers the Association node in SAS® Enterprise Miner. It describes the process of analyzing items that have been purchased, using the ASSOCS SAS® data set that was used in my book.

Neville, Padraic, Decision Trees for Predictive Modeling

The paper covers the decision tree modeling that is used in the Tree node in SAS® Enterprise Miner. The article explains the various decision tree modeling methods. In addition, it briefly explains the various ensemble modeling designs, such as combining models, boosting, and bagging.

SAS Institute, The ARBORETUM Procedure

The paper documents the SAS® data mining procedure that is used to perform decision tree modeling. It is important because it explains some of the option settings that are available in the Tree node in SAS® Enterprise Miner v4.3.

SAS Institute, DMNeural Procedure

The paper documents the SAS® dmneural procedure that is used to perform dmneural network modeling. It is important because it explains some of the option settings that are available in the Princomp/Dmneural node.

Cox, James, Multidimensional Binary Search Trees Used for Associative Searching

The paper covers the RD-tree partitioning technique that is used in the Memory-Based Reasoning node in SAS® Enterprise Miner. The partitioning technique is designed to determine the number of data points to use in calculating the fitted values in nearest neighbor modeling. The number of data points to combine is determined by the smoothing constant that must be provided in the predictive or classification modeling design. The technique performs binary splits of the data, and the final partitioning results in a hypercube of data points that are used in calculating the fitted values.

Breiman, Leo, Arcing Classifiers

The paper covers the boosting resampling that is used in the Ensemble node. It provides the formula that SAS® Enterprise Miner uses in boosting resampling to calculate the weight estimates, which adjust the estimated probabilities from the classification model to generate the probability estimates of the boosting model.

Cerrito, Patricia B., Comparison of Enterprise Miner and SAS/Stat for Data Mining

 The paper compares some of the procedure output listings from various statistical procedures that are available in SAS® for data mining with the results that are generated from some of the SAS® Enterprise Miner nodes.

Gallaugher, John, Modeling Credit Risks: An Introduction to Data Mining using SAS Enterprise Miner

The paper constructs a process flow diagram for credit risk modeling. It explains the process of constructing separate classification models, such as logistic regression, neural network, and decision tree models, and then comparing the accuracy of the separate models.

Sarle, Warren, SAS Macro Programs for Jackknifing and Bootstrapping

This link provides various SAS macro programs for jackknifing and bootstrapping parameter estimates in order to compute approximate standard errors, bias-corrected estimates, and confidence intervals, assuming that simple random sampling is performed.
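
As a rough illustration of what a jackknife computes, here is a minimal Python sketch. It is not one of the SAS macro programs referenced above. The data and the function name are hypothetical, and the sample mean is used as the estimator only because its jackknife standard error can be checked against the familiar s/sqrt(n).

```python
import math
import statistics


def jackknife(data, estimator):
    """Leave-one-out jackknife: bias-corrected estimate and standard error."""
    n = len(data)
    theta_hat = estimator(data)
    # Recompute the estimator with each observation removed in turn.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    theta_bar = sum(leave_one_out) / n
    bias_corrected = n * theta_hat - (n - 1) * theta_bar
    std_error = math.sqrt(
        (n - 1) / n * sum((t - theta_bar) ** 2 for t in leave_one_out)
    )
    return bias_corrected, std_error


# Hypothetical simple random sample.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
estimate, std_error = jackknife(data, statistics.mean)
```

For the mean, the jackknife standard error equals the sample standard deviation divided by the square root of n, and the bias correction leaves the mean unchanged. Passing a different estimator, such as `statistics.median`, reuses the same function.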

Data Mining Using SAS Enterprise Miner Blog

Data Mining Using SAS Enterprise Miner Blog
This link is to my personal blog. The purpose of my blog is to give you an opportunity to reach me by posting any questions that you might have about my book, "Data Mining Using SAS Enterprise Miner". Please also feel free to ask any questions that you might have about SAS® Enterprise Miner.

Ordering Information

John Wiley & Sons, Inc.

111 River Street
Hoboken, NJ 07030-5774
http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470149019.html


© copyright www.sasenterpriseminer.com - SAS data mining analysis | data mining modeling.