Member-only story

Learn Machine Learning and Splunk

2 min readMay 31, 2020

Splunk is one of the most successful packages for Cybersecurity analytics, and defines seven main elements for machine learning (Figure 1):

Preprocessing: This defines how the data is scaled to produce the correct range (such as for numerical values to be scaled to a given range). A typical method is StandardScalar.
Feature Extraction: This defines a method to extract key features that are required for the machine to learn on. Typical methods are PCA (Principle Component Analysis) and TFIDF.
Analysing data: This involves analysing the correlations between data. Typical methods include ACF (autocorrelation factors) and PACF (partial autocorrelation factors).
Classification: This involves classifying data into groups. Typical methods include: SVM and RandomForestClassifier.
Group events: This normally involves clustering. Kmeans and BIRCH are typical methods.
Detection of outliers: This defines anomalies within the data sets, and be used in anomaly detection. A typical method is OneClassSVM.
Prediction: This makes predictions on the data given a set of known inputs, and can either be numerical predictions (such as using linear regression, random forest regression, lasso, and decision tree regression) or categorical (such as with…

Written by Prof Bill Buchanan OBE FRSE