Splunk is one of the most successful packages for Cybersecurity analytics, and defines seven main elements for machine learning (Figure 1):
- Preprocessing: This defines how the data is scaled to produce the correct range (such as for numerical values to be scaled to a given range). A typical method is StandardScalar.
- Feature Extraction: This defines a method to extract key features that are required for the machine to learn on. Typical methods are PCA (Principle Component Analysis) and TFIDF.
- Analysing data: This involves analysing the correlations between data. Typical methods include ACF (autocorrelation factors) and PACF (partial autocorrelation factors).
- Classification: This involves classifying data into groups. Typical methods include: SVM and RandomForestClassifier.
- Group events: This normally involves clustering. Kmeans and BIRCH are typical methods.
- Detection of outliers: This defines anomalies within the data sets, and be used in anomaly detection. A typical method is OneClassSVM.
- Prediction: This makes predictions on the data given a set of known inputs, and can either be numerical predictions (such as using linear regression, random forest regression, lasso, and decision tree regression) or categorical (such as with logistic regression).
- Forecasting: This defines a method to predict future data values from the history of the data. Typical methods are ARIMA (Autoregressive integrated moving average) and KalmanFilter.
Figure 1: Machine Learning Ref: https://docs.splunk.com/images/2/20/Machine-learning-quick-ref-guide.pdf
Here is a tutorial: