I reviewed a research paper in which the authors casually quoted a standard deviation value. But the data itself did not follow a normal distribution, and so the metric was largely useless. There was no analysis of whether the data matched a normal distribution at all. In places, in research, I do detect a lack of understanding of the good old maths of data distributions, where some just reach for the latest k-means methods and care little about actually applying scientific methods to the data.
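A quick sanity check before quoting a standard deviation is to look at the skewness and excess kurtosis of the data, which are both close to zero for normally distributed values. The sketch below is only an illustration: the synthetic data sets, the seed and the helper name `skew_and_excess_kurtosis` are assumptions and have nothing to do with the paper under review, and a formal normality test would be more rigorous.

```python
import random

def skew_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) computed from population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

rng = random.Random(42)  # fixed seed, purely for repeatability (assumed)

# Hypothetical data: one normal-looking set and one clearly skewed set.
normal_data = [rng.gauss(10.0, 2.0) for _ in range(5000)]
skewed_data = [rng.expovariate(0.5) for _ in range(5000)]

normal_skew, normal_kurt = skew_and_excess_kurtosis(normal_data)
skewed_skew, skewed_kurt = skew_and_excess_kurtosis(skewed_data)

print(f"normal-like: skew={normal_skew:.2f}, excess kurtosis={normal_kurt:.2f}")
print(f"skewed:      skew={skewed_skew:.2f}, excess kurtosis={skewed_kurt:.2f}")
```

Values far from zero, as for the skewed set here, suggest that a mean and standard deviation alone will not describe the data well.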
We can put our data into a number of bins and then plot the number of values that fall into each bin. Dividing each count by the total number of values gives an estimate of the probability of occurrence within each bin range. In these plots we have random values, normal distributions, and other distributions:
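The binning step can be sketched with the standard library alone. The bin count, the range, the sample sizes and the helper name `bin_probabilities` are assumptions made for illustration, and the two data sets are synthetic:

```python
import random

def bin_probabilities(values, n_bins, low, high):
    """Count values in equal-width bins over [low, high) and normalise to probabilities."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if low <= v < high:
            # min() guards against floating-point rounding at the upper edge
            index = min(int((v - low) / width), n_bins - 1)
            counts[index] += 1
    total = sum(counts)  # values outside [low, high) are ignored
    return [c / total for c in counts]

rng = random.Random(1)  # fixed seed, purely for repeatability (assumed)
uniform_data = [rng.uniform(0.0, 10.0) for _ in range(10000)]
normal_data = [rng.gauss(5.0, 1.5) for _ in range(10000)]

uniform_probs = bin_probabilities(uniform_data, 10, 0.0, 10.0)
normal_probs = bin_probabilities(normal_data, 10, 0.0, 10.0)

# Crude text histogram: one '#' per percentage point of probability.
for i, (pu, pn) in enumerate(zip(uniform_probs, normal_probs)):
    print(f"[{i:2d},{i + 1:2d})  uniform {'#' * round(pu * 100):<15} normal {'#' * round(pn * 100)}")
```

The uniform data should give roughly flat bars, while the normal data should pile up around the middle bins and tail off towards the edges.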
If our data matches one of these, we can then describe the data values in terms of a distribution function. In the case of a normal distribution we have:
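For reference, the usual textbook form of the normal probability density function is:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}
```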
Here, mu (μ) is the average value and sigma (σ) is the standard deviation. We can use Python to produce samples from these distributions and plot the curves [here]:
# https://asecuritysite.com/comms/dist
import…
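Independently of the listing above, a minimal standard-library sketch of the same idea draws samples from a normal distribution and evaluates its density curve. The parameters `mu = 0` and `sigma = 1`, the sample size and the seed are assumptions chosen for illustration, and plotting is left out so that the sketch has no third-party dependencies:

```python
import math
from statistics import NormalDist, mean, stdev

mu, sigma = 0.0, 1.0  # illustrative parameters (assumed)
dist = NormalDist(mu, sigma)

# Draw random samples; the seed is fixed only so that runs are repeatable.
samples = dist.samples(10000, seed=1)

# Evaluate the density on a grid from mu - 3*sigma to mu + 3*sigma.
xs = [mu + sigma * (i / 10.0) for i in range(-30, 31)]
curve = [(x, dist.pdf(x)) for x in xs]

# The peak of the curve sits at x = mu with height 1 / (sigma * sqrt(2*pi)).
peak = dist.pdf(mu)
expected_peak = 1.0 / (sigma * math.sqrt(2.0 * math.pi))

sample_mean = mean(samples)
sample_stdev = stdev(samples)

print(f"sample mean  = {sample_mean:.3f} (target {mu})")
print(f"sample stdev = {sample_stdev:.3f} (target {sigma})")
print(f"pdf at mu    = {peak:.4f}")
```

The `curve` pairs can be handed to any plotting library, and the sample mean and standard deviation should land close to the chosen `mu` and `sigma`.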