Statistics

One objective of this study is to predict EC values from the output of a DEM. Two multivariate statistical methods were applied: Classification and Regression Trees (CART) and bootstrapped regression trees with RandomForest. A CART analysis was performed first to understand the structure and relative importance of the predictor variables. From this analysis a regression tree was created using recursive partitioning (Hamann 2010) (S-1), in which the data set is repeatedly divided into smaller subsets. In a regression tree, the leaves, or end points, are the predicted values, and the branches correspond to the splits, or nodes (Hamann 2010). The first node explains the most variance, and the length of each branch reflects the variance explained by its split. Further information on variance explained was obtained from the CART summary. CART has limitations, including a tendency to grow overly complex trees and to overfit the data.

S-1. Example of a regression tree with its parts labeled.

Picture
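The recursive partitioning behind a regression tree can be illustrated with a minimal sketch. This is not the software used in the study. It is a plain Python illustration under stated assumptions: the function names (sse, best_split, grow, predict), the stopping rules (max_depth, min_rows) and the toy elevation, slope and EC values are all invented for the example. Each split is chosen to give the largest reduction in the sum of squared deviations, which is the sense in which a node "explains variance".

```python
from statistics import fmean

def sse(values):
    """Sum of squared deviations from the mean."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, targets):
    """Return (gain, feature, threshold) for the split that most reduces
    the sum of squares, or None if no split is possible."""
    parent, best = sse(targets), None
    for j in range(len(rows[0])):
        levels = sorted({r[j] for r in rows})
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring values
            left = [y for r, y in zip(rows, targets) if r[j] <= t]
            right = [y for r, y in zip(rows, targets) if r[j] > t]
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best

def grow(rows, targets, depth=0, max_depth=2, min_rows=4):
    """Recursively partition the data; leaves store the mean response."""
    split = None
    if depth < max_depth and len(rows) >= min_rows:
        split = best_split(rows, targets)
    if split is None or split[0] <= 1e-12:
        return {"leaf": fmean(targets)}
    gain, j, t = split
    left = [(r, y) for r, y in zip(rows, targets) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, targets) if r[j] > t]
    return {
        "feature": j, "threshold": t, "gain": gain,
        "left": grow([r for r, _ in left], [y for _, y in left],
                     depth + 1, max_depth, min_rows),
        "right": grow([r for r, _ in right], [y for _, y in right],
                      depth + 1, max_depth, min_rows),
    }

def predict(tree, row):
    """Follow the splits from the first node down to a leaf."""
    while "leaf" not in tree:
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]

# Invented toy data, NOT from the study: (elevation, slope) -> EC.
rows = [(10, 1), (12, 5), (11, 3), (13, 2), (30, 1), (32, 4), (31, 2), (33, 5)]
ec = [8.0, 8.2, 7.9, 8.1, 2.0, 2.2, 1.9, 2.1]

tree = grow(rows, ec)
print("first node splits on feature", tree["feature"], "at", tree["threshold"])
print("predicted EC:", predict(tree, (11, 2)), predict(tree, (31, 3)))
```

In this toy data the first node splits on elevation, because that split removes almost all of the variation in EC. The max_depth and min_rows limits are one simple way to keep the tree from growing overly complex.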

The second multivariate technique applied was RandomForest. RandomForest is similar to CART but is geared toward prediction rather than inspection of the data (Hamann 2010). It generates a specified, large number of trees and reports which variables were used the most. This “importance” output indicates the importance of “individual predictor variables for classification” (Hamann 2010). The higher the importance, the more the variable was used as a primary node when generating the regression trees. Using the RandomForest analysis, predictions were made for variables such as EC and satellite imagery zones (Hamann 2010).
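The idea of bootstrapped regression trees with a variable importance score can be sketched as follows. This is a simplified plain Python illustration, not the RandomForest software used in the analysis. The names (fit_forest, predict_forest, mtry, n_trees), the toy data and the importance measure are assumptions made for the example. Here importance is taken as the average reduction in the sum of squares credited to each variable across all trees, which is one common way to score importance and not necessarily the measure reported in the study.

```python
import random
from statistics import fmean

def sse(values):
    """Sum of squared deviations from the mean."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, targets, features):
    """Best (gain, feature, threshold) among the candidate features, or None."""
    parent, best = sse(targets), None
    for j in features:
        levels = sorted({r[j] for r in rows})
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, targets) if r[j] <= t]
            right = [y for r, y in zip(rows, targets) if r[j] > t]
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best

def grow(rows, targets, rng, mtry, importance, depth=0, max_depth=3):
    """Grow one tree; each node considers a random subset of mtry variables."""
    split = None
    if depth < max_depth and len(rows) >= 4:
        candidates = rng.sample(range(len(rows[0])), mtry)
        split = best_split(rows, targets, candidates)
    if split is None or split[0] <= 1e-12:
        return fmean(targets)  # leaf: mean response of the rows that reach it
    gain, j, t = split
    importance[j] += gain  # credit the variance reduction to variable j
    left = [(r, y) for r, y in zip(rows, targets) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, targets) if r[j] > t]
    return (j, t,
            grow([r for r, _ in left], [y for _, y in left],
                 rng, mtry, importance, depth + 1, max_depth),
            grow([r for r, _ in right], [y for _, y in right],
                 rng, mtry, importance, depth + 1, max_depth))

def predict_tree(tree, row):
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

def fit_forest(rows, targets, n_trees=200, mtry=2, seed=0):
    """Fit n_trees trees, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    importance = [0.0] * len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        trees.append(grow([rows[i] for i in idx], [targets[i] for i in idx],
                          rng, mtry, importance))
    return trees, [v / n_trees for v in importance]

def predict_forest(trees, row):
    """Average the predictions of all trees."""
    return fmean([predict_tree(t, row) for t in trees])

# Invented toy data, NOT from the study: (elevation, slope, aspect) -> EC.
rows = [(10, 1, 90), (11, 4, 180), (12, 2, 270), (13, 5, 0),
        (14, 3, 90), (15, 1, 180), (30, 4, 270), (31, 2, 0),
        (32, 5, 90), (33, 3, 180), (34, 1, 270), (35, 4, 0)]
ec = [8.0, 8.2, 7.9, 8.1, 8.3, 7.8, 2.0, 2.2, 1.9, 2.1, 2.3, 1.8]

trees, importance = fit_forest(rows, ec)
print("importance per variable:", importance)
print("predicted EC:", predict_forest(trees, (12, 3, 90)),
      predict_forest(trees, (33, 3, 90)))
```

Because each tree sees a different bootstrap sample and a random subset of variables at each node, no single tree determines the result. The averaged prediction is what makes the forest suited to prediction, while the importance scores summarize which variables the trees relied on.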