Basic Statistical Concepts for Data Science

Basic statistics

As a data scientist, you need a deep understanding of statistics. Here, I introduce basic statistical concepts and quantities.

Types of measurements and variables

Important statistical concepts include the following:

  • Types of measurement scales
  • Nomenclature for variables: dependent vs independent variables

Statistical quantities

You should know the following frequently used statistical quantities:

  • Measures of central tendency: mean, median, and mode
  • Measures of dispersion: standard deviation, variance, and interquartile range, plus covariance, which measures how two variables vary together
  • Interval estimates: confidence intervals

Probability distributions

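Commonly occurring probability distributions include:

  • Uniform distribution: all values within a given range are equally likely
  • Normal distribution: a symmetric, bell-shaped distribution that approximates many population characteristics (e.g. IQ scores, heights)
  • Poisson distribution: a discrete distribution over the non-negative integers, well suited to count data such as the number of events in a fixed interval
  • Exponential distribution: a continuous, right-skewed distribution over non-negative values, often used to model waiting times between events (its tail decays exponentially, so it is not considered heavy-tailed)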
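The following Python sketch writes out the density or mass function of each distribution above from its textbook formula and draws samples to check that the sample means land near the theoretical means. The parameter values are arbitrary choices for illustration. Python's standard library has no Poisson sampler, so `poisson_sample` is a hypothetical helper implementing Knuth's multiplication method; NumPy users would normally call its built-in generator instead.

```python
import math
import random
from statistics import NormalDist, fmean

def uniform_pdf(x, a, b):
    """Continuous uniform on [a, b]: constant density 1 / (b - a)."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson count with mean lam, k = 0, 1, 2, ..."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def exponential_pdf(x, rate):
    """Exponential density with the given rate (mean 1 / rate), x >= 0."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

normal = NormalDist(mu=100, sigma=15)  # IQ-like scale; provides .pdf and .cdf

def poisson_sample(lam, rng):
    """Knuth's multiplication method: simple, but slow for large lam."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

rng = random.Random(42)  # fixed seed for reproducibility
n = 10_000
samples = {
    "uniform(0, 10)": [rng.uniform(0, 10) for _ in range(n)],
    "normal(100, 15)": [rng.gauss(100, 15) for _ in range(n)],
    "poisson(3)": [poisson_sample(3, rng) for _ in range(n)],
    "exponential(rate=0.5)": [rng.expovariate(0.5) for _ in range(n)],
}

# Sample means should be close to the theoretical means 5, 100, 3 and 2.
for name, xs in samples.items():
    print(f"{name:>22}: sample mean = {fmean(xs):.2f}")
```

For the exponential distribution, P(X > x) = e^(-rate * x), so its tail shrinks exponentially fast; heavy-tailed distributions are, by definition, those whose tails decay more slowly than any exponential.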

Posts on basic statistics

You can find explanations of basic statistical concepts and their use in R in the following posts.