Learn Machine Learning and Splunk
2 min read · May 31, 2020
Splunk is one of the most successful packages for cybersecurity analytics, and it defines seven main elements for machine learning (Figure 1):
- Preprocessing: This defines how the data is scaled before learning, such as rescaling numerical values to a common range. A typical method is StandardScaler, which standardizes each field to zero mean and unit variance.
- Feature Extraction: This defines a method to extract the key features that the machine needs to learn from. Typical methods are PCA (Principal Component Analysis) and TFIDF (term frequency-inverse document frequency).
- Analysing data: This involves analysing the correlations within the data. Typical methods include ACF (autocorrelation function) and PACF (partial autocorrelation function).
- Classification: This involves classifying data into groups. Typical methods include SVM (support vector machine) and RandomForestClassifier.
- Grouping events: This normally involves clustering. Typical methods are KMeans and BIRCH.
- Detection of outliers: This identifies outliers within the data sets, and can be used for anomaly detection. A typical method is OneClassSVM.
- Prediction: This makes predictions on the data given a set of known inputs, and can either be numerical predictions (such as using linear regression, random forest regression, lasso, and decision tree regression) or categorical (such as with…
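To make two of the elements above more concrete, here is a minimal sketch in plain Python of what a standard scaler and a sample autocorrelation estimate compute. It is not Splunk's implementation: the function names `standard_scale` and `autocorrelation` and the `logins` data are hypothetical, chosen only for illustration, and the scaler is assumed to use the population standard deviation.

```python
from statistics import fmean, pstdev


def standard_scale(values):
    """Rescale values to zero mean and unit variance (population std)."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant field has no spread, so map every value to 0.0
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def autocorrelation(values, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(values)
    mean = fmean(values)
    # Total squared deviation from the mean (the lag-0 term)
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0 or lag >= n:
        return 0.0
    # Products of deviations for points that are `lag` steps apart
    num = sum(
        (values[t] - mean) * (values[t + lag] - mean)
        for t in range(n - lag)
    )
    return num / denom


# Hypothetical data: number of logins seen in eight consecutive hours
logins = [2, 4, 4, 4, 5, 5, 7, 9]

scaled = standard_scale(logins)
lag1 = autocorrelation(logins, 1)

print(scaled)
print(lag1)
```

After scaling, the values have a mean of zero and a standard deviation of one, so fields measured in different units can be compared on the same footing. The lag-1 autocorrelation shows how strongly each value relates to the one before it, which is the kind of relationship an ACF plot summarises across many lags.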