

Real-world data rarely aligns perfectly with a normal distribution. When a dataset contains outliers, the shape of its distribution can be distorted, stretching (skewing) in the direction of those outliers.
Skewness measures how asymmetric a distribution is about its mean. It is easiest to see when the data is plotted as a distribution and compared against the shape of a normal distribution.
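One standard way to make "asymmetric about the mean" precise is the third standardized moment. In the sketch below, $\mu$ is the mean and $\sigma$ the standard deviation of a random variable $X$; this notation is introduced here for illustration. Cubing keeps the sign of each deviation, so large deviations on one side dominate: $\gamma_1 > 0$ for a long right tail, $\gamma_1 < 0$ for a long left tail, and $\gamma_1 = 0$ for any symmetric distribution (the converse does not hold).

```latex
\gamma_1 = \mathbb{E}\left[\left(\frac{X - \mu}{\sigma}\right)^{3}\right]
         = \frac{\mathbb{E}\left[(X - \mu)^{3}\right]}{\sigma^{3}}
```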
A symmetric, non-skewed distribution, such as the normal distribution, has the same shape on both sides of the mean. Skewed data, however, stretches the distribution out on one side of the mean, creating a ‘long-tailed’ graph.
This long tail can appear on either side of the mean: a tail to the left is termed negative skew, and a tail to the right is termed positive skew. A perfectly symmetrical distribution has zero skew.
What are negative and positive skew?
Positive skew, also called right skew, exists when the extreme values lie at the high end: most values cluster together, while a small number of strong outliers are much larger than the rest. Household incomes and house prices are typical examples. These high outliers typically pull the mean above the median.
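A quick way to see the pull of a long right tail is to compare the mean with the median. The incomes below are hypothetical values chosen for illustration, not real data: a single very large value drags the mean well above the median. Treat this as a heuristic rather than a strict rule, since in unusual distributions the mean/median ordering and the skewness coefficient can disagree.

```python
from statistics import mean, median

# Hypothetical household incomes (in thousands): most values cluster together,
# while one very large value forms a long right tail.
incomes = [28, 31, 33, 35, 36, 38, 40, 42, 45, 250]

avg = mean(incomes)   # sensitive to the outlier
mid = median(incomes) # robust to the outlier

print(f"mean = {avg:.1f}, median = {mid:.1f}")  # prints: mean = 57.8, median = 37.0
```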
A dataset's skewness can be visualised easily with a boxplot: the median sits off-centre within the box, and the box and whisker on the skewed side are visibly stretched.
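The boxplot reading can be sketched without drawing anything by computing the quartiles and comparing the spread on each side of the median. The sample is hypothetical, and for simplicity the whiskers here run to the minimum and maximum; many plotting tools instead follow Tukey's convention of ending whiskers at the furthest point within 1.5 × IQR and drawing anything beyond as an individual outlier.

```python
from statistics import quantiles

# Hypothetical right-skewed sample.
data = [28, 31, 33, 35, 36, 38, 40, 42, 45, 60, 75, 250]

# Quartiles (Python 3.8+); the default method is "exclusive".
q1, q2, q3 = quantiles(data, n=4)
low, high = min(data), max(data)

# Box halves: distance from the median to each quartile.
lower_box = q2 - q1
upper_box = q3 - q2
# Simplified whiskers: distance from each quartile to the extreme value.
lower_whisker = q1 - low
upper_whisker = high - q3

print(f"box: {lower_box} below median, {upper_box} above")           # box: 5.5 below median, 17.25 above
print(f"whiskers: {lower_whisker} below box, {upper_whisker} above")  # whiskers: 5.5 below box, 193.75 above
```

Both the upper half of the box and the upper whisker are longer, which is the visual signature of positive skew.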
Negative skew, also called left skew, exists when the outliers lie at the low end of the dataset, typically pulling the mean below the median.
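Negative skew is simply positive skew reflected. A minimal sketch, assuming the moment coefficient $g_1 = m_3 / m_2^{3/2}$ (population central moments) as the skewness measure and using hypothetical exam scores: most scores are high and a few low outliers form a left tail, so the coefficient is negative, and negating every value flips the sign while keeping the magnitude.

```python
from statistics import fmean

def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    mu = fmean(values)
    m2 = fmean((x - mu) ** 2 for x in values)  # variance
    m3 = fmean((x - mu) ** 3 for x in values)  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical exam scores: a cluster of high scores plus two low outliers.
scores = [95, 92, 90, 89, 88, 87, 85, 84, 40, 25]
mirrored = [-s for s in scores]  # reflect the data about zero

print(f"{skewness(scores):.3f}")    # negative: the tail is on the left
print(f"{skewness(mirrored):.3f}")  # same magnitude, positive sign
```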
Is skewness a bad thing?
Skewness does not, by itself, make a dataset flawed. It is common in real-world data, and it can be quantified by calculating a skewness coefficient.
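As one example of such a coefficient, the sketch below computes the Fisher-Pearson coefficient $g_1 = m_3 / m_2^{3/2}$ and, optionally, the bias-adjusted $G_1 = g_1 \sqrt{n(n-1)} / (n-2)$ that is often reported for samples. The function name `sample_skewness` and its `adjusted` flag are hypothetical. If SciPy is available, `scipy.stats.skew` with `bias=True` and `bias=False` should correspond to these two variants, though that is worth confirming against its documentation.

```python
import math
from statistics import fmean

def sample_skewness(values, adjusted=False):
    """Fisher-Pearson coefficient of skewness (hypothetical helper).

    g1 = m3 / m2**1.5, using population central moments m2 and m3.
    With adjusted=True, returns G1 = g1 * sqrt(n * (n - 1)) / (n - 2).
    """
    n = len(values)
    mu = fmean(values)
    m2 = fmean((x - mu) ** 2 for x in values)
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    m3 = fmean((x - mu) ** 3 for x in values)
    g1 = m3 / m2 ** 1.5
    if adjusted:
        if n < 3:
            raise ValueError("adjusted skewness needs at least three values")
        return g1 * math.sqrt(n * (n - 1)) / (n - 2)
    return g1

symmetric = [1, 2, 3, 4, 5]      # mirror-image deviations cancel
right_tailed = [1, 2, 3, 4, 20]  # one large value on the right

print(sample_skewness(symmetric))                    # 0.0
print(sample_skewness(right_tailed) > 0)             # True
print(sample_skewness(right_tailed, adjusted=True))  # larger in magnitude for small n
```

The adjustment factor exceeds 1 for small samples and approaches 1 as $n$ grows, so the two variants differ mainly on small datasets.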
If you have any more questions about skewness, feel free to leave a comment below, or email me. Also, if you liked this post, please consider subscribing to dataDataGoose, or sharing this post with a friend!