As a data scientist, it is important to have a deep understanding of statistics. Here, I introduce basic statistical concepts and quantities.
Types of measurements and variables
Important statistical concepts include the following:
Types of measurement scales
Nomenclature for variables: dependent vs independent variables
Statistical quantities
You should definitely know about the following, frequently used statistical quantities:
Centrality measures: mean and median, mode
Measure of dispersion: standard deviation, variance, covariance, interquartile-range
Interval estimates: confidence intervals
Probability distributions
Commonly occuring probability distributions are:
Humans are visual creatures. Thus, visualization is one of the most important tools for conveying information and data scientists should be adapt at selecting appropriate visualizations.
Which plot is appropriate?
Choosing an appropriate plot for a given set of data can be hard because there are so many types of plots such as scatter plots, box plots, and histograms. Fortunately, I have created an overview of the most important plots, when they are appropriate, and how they can be used in R.
Need a holiday from data science? Then this page is for you because this category encompasses all the posts that are not directly associated with data science. Until now, these posts have mostly dealt with blogging with Hugo but let’s see what the future brings. Anyway, I don’t plan to stray too far away from the intended focus of the blog, so there should never be too many posts under this category.
Using statistical tests, it is possible to make a statement about the significance of a set of measurements by calculating a test statistic. If it is unlikely to obtain a test statistic at least as extreme as the observed value, then the result is significant. For example, at a significance level of 5%, the probability of a false positive test result would be bounded by roughly 5%.
Parametric vs non-parametric tests
There is a multitude of tests for determining statistical significance.