# Learn Machine Learning and Splunk

Splunk is one of the most successful packages for cybersecurity analytics, and it defines eight main elements for machine learning (Figure 1):

- **Preprocessing**: This defines how the data is scaled, for example so that numerical values fall within a given range. A typical method is StandardScaler.
- **Feature Extraction**: This defines a method to extract the key features that the machine learns from. Typical methods are PCA (Principal Component Analysis) and TFIDF (term frequency-inverse document frequency).
- **Analysing data**: This involves analysing the correlations within the data. Typical methods include ACF (autocorrelation function) and PACF (partial autocorrelation function).
- **Classification**: This involves classifying data into groups. Typical methods include SVM (support vector machine) and RandomForestClassifier.
- **Group events**: This normally involves clustering. KMeans and BIRCH are typical methods.
- **Detection of outliers**: This identifies anomalies within data sets and is the basis of anomaly detection. A typical method is OneClassSVM.
- **Prediction**: This makes predictions from a set of known inputs. Predictions can be numerical (such as with linear regression, random forest regression, Lasso, and decision tree regression) or categorical (such as with logistic regression).
- **Forecasting**: This predicts future values from the history of the data. Typical methods are ARIMA (autoregressive integrated moving average) and KalmanFilter.

Figure 1: Machine Learning (Ref: https://docs.splunk.com/images/2/20/Machine-learning-quick-ref-guide.pdf)
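To make the Preprocessing and Detection of outliers elements more concrete, here is a minimal pure-Python sketch. The first function applies the standard-score transform that StandardScaler performs on a field, z = (x - mean) / standard deviation, assuming the population standard deviation (as scikit-learn's StandardScaler uses). The second function flags outliers with a plain z-score threshold; this is a simpler technique swapped in for illustration and is not OneClassSVM. The function names, the threshold of 2.0 and the login counts are all hypothetical.

```python
from statistics import mean, pstdev


def standard_scale(values):
    """Scale values to zero mean and unit (population) standard deviation,
    the per-field transform that a StandardScaler-style step applies."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant field has no spread, so every scaled value is 0.0
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def flag_outliers(values, threshold=2.0):
    """Hypothetical helper: flag points whose scaled value lies more than
    `threshold` standard deviations from the mean (a z-score rule,
    NOT OneClassSVM)."""
    return [abs(z) > threshold for z in standard_scale(values)]


# Hypothetical data: failed logins per hour for a single host
logins = [3, 4, 2, 5, 3, 4, 48, 3]
scaled = standard_scale(logins)
print([round(z, 2) for z in scaled])
print(flag_outliers(logins))  # only the hour with 48 failed logins is flagged
```

In Splunk these steps would normally be run through the Machine Learning Toolkit's `fit` command rather than hand-written code; the sketch only shows what the scaling computes. Note that a single large value also inflates the standard deviation, which is one reason model-based detectors such as OneClassSVM are preferred on real data.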

Here is a tutorial: