Categories

As a data scientist, it is important to have a deep understanding of statistics. Here, I introduce basic statistical concepts and quantities. Types of measurements and variables: important statistical concepts include types of measurement scales and the nomenclature for variables (dependent vs independent variables). Statistical quantities: you should definitely know about the following frequently used statistical quantities: centrality measures (mean, median, and mode), measures of dispersion (standard deviation, variance, covariance, interquartile range), and interval estimates (confidence intervals). Probability distributions: commonly occurring probability distributions are:

In this section of the blog, I discuss topics related to data science, AI, and academia.

Humans are visual creatures. Thus, visualization is one of the most important tools for conveying information, and data scientists should be adept at selecting appropriate visualizations. Which plot is appropriate? Choosing an appropriate plot for a given set of data can be hard because there are so many types of plots, such as scatter plots, box plots, and histograms. Fortunately, I have created an overview of the most important plots, when they are appropriate, and how they can be used in R.

Need a holiday from data science? Then this page is for you, because this category encompasses all the posts that are not directly associated with data science. Until now, these posts have mostly dealt with blogging with Hugo, but let’s see what the future brings. Anyway, I don’t plan to stray too far from the intended focus of the blog, so there should never be too many posts under this category.

Posts on software engineering

Using statistical tests, it is possible to make a statement about the significance of a set of measurements by calculating a test statistic. If, under the null hypothesis, it is unlikely to obtain a test statistic at least as extreme as the observed value (that is, if the p-value falls below the chosen significance level), then the result is significant. For example, at a significance level of 5%, the probability of a false positive test result is bounded by 5%. Parametric vs non-parametric tests: There is a multitude of tests for determining statistical significance.
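To make the idea of "a test statistic at least as extreme as the observed value" concrete, here is a minimal sketch of an exact permutation test, one example of a non-parametric test. It is written in Python with only the standard library. The function name `permutation_test`, the choice of the absolute difference in means as the test statistic, and the sample values are all assumptions made for illustration; they do not come from the posts in this category.

```python
from itertools import combinations
from statistics import mean


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test on the difference in means.

    Returns the observed test statistic and its p-value.
    Only practical for small samples, since every relabelling is enumerated.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    total = 0
    # Enumerate every way of relabelling n_a of the pooled values as "group A".
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in range(len(pooled)) if i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        stat = abs(mean(a) - mean(b))
        # Count relabellings at least as extreme as the observed statistic
        # (small tolerance guards against floating-point rounding).
        if stat >= observed - 1e-12:
            extreme += 1
        total += 1
    return observed, extreme / total


# Hypothetical measurements, made up for illustration only.
control = [4.1, 3.9, 4.3, 4.0, 4.2]
treated = [4.8, 5.1, 4.9, 5.0, 4.7]

statistic, p_value = permutation_test(control, treated)
alpha = 0.05  # significance level
print(f"statistic={statistic:.2f}, p={p_value:.4f}, significant={p_value < alpha}")
```

In this made-up data set the two groups do not overlap, so only 2 of the 252 possible relabellings are as extreme as the observed one. The p-value is therefore 2/252, roughly 0.008, which is below the 5% significance level.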

Categories

Basic Statistical Concepts for Data Science

Commentary

Data Visualization

Other

Software Engineering

Statistical Significance Tests