Statistics for Financial Engineering — Part 2
Read More about Part — 1 here
Key concepts in statistics:
- Descriptive Statistics: Descriptive statistics involves summarizing and presenting data in a meaningful way. Measures such as mean, median, mode, range, and standard deviation are commonly used to describe the central tendency and variability of a dataset.
- Inferential Statistics: Inferential statistics involves making predictions or inferences about a population based on a sample of data. This includes hypothesis testing, confidence intervals, and regression analysis.
What is central tendency?
Central tendency is a statistical measure that indicates the central or typical value around which a set of data points cluster. It provides a summary of the location of the data and helps to describe the center or average of a distribution. The three main measures of central tendency are:
- Mean (Arithmetic Mean): Calculated by summing all the values in a dataset and dividing by the number of observations. It is sensitive to extreme values (outliers) and is appropriate for interval and ratio data.
- Median: The middle value of a dataset when it is ordered. If there is an even number of observations, the median is the average of the two middle values. It is less affected by extreme values and is suitable for ordinal, interval, and ratio data.
- Mode: The value that occurs most frequently in a dataset. A dataset may have one mode (unimodal), more than one mode (multimodal), or no mode at all. It is suitable for nominal, ordinal, interval, and ratio data.
Each measure has its advantages and is appropriate in different situations. The choice of which measure to use depends on the characteristics of the data and the goals of the analysis.
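As a quick sketch of these three measures, Python's standard `statistics` module computes all of them directly. The returns below are made-up numbers chosen so that one outlier (9.0) pulls the mean well above the median:

```python
import statistics

# Hypothetical daily returns (%) for a stock; 9.0 is a deliberate outlier
returns = [0.5, 1.2, -0.3, 0.5, 0.8, 9.0]

mean_r = statistics.mean(returns)      # sensitive to the 9.0 outlier
median_r = statistics.median(returns)  # robust: middle of the sorted values
mode_r = statistics.mode(returns)      # most frequent value (0.5 appears twice)

print(mean_r, median_r, mode_r)
```

Here the mean (1.95) is far from the median (0.65), illustrating why the median is often preferred when outliers are present.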
Measure of asymmetry
The measure of asymmetry in a statistical distribution is called skewness. Skewness quantifies the degree and direction of a distribution's departure from symmetry. There are two main types of skewness:
- Positive Skewness (Right Skew): The distribution’s tail is elongated to the right, and the majority of the data points are concentrated on the left side. The mean is typically greater than the median.
- Negative Skewness (Left Skew): The distribution’s tail is elongated to the left, and the majority of the data points are concentrated on the right side. The mean is typically less than the median.
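Both points above can be checked numerically. The sketch below uses a small made-up right-skewed sample and computes population skewness as the average cubed z-score (one common definition; sample-adjusted variants also exist):

```python
import statistics

# Made-up right-skewed sample: most values small, one long tail value
data = [1, 2, 2, 3, 3, 3, 4, 10]

n = len(data)
mean = statistics.mean(data)
sd = statistics.pstdev(data)  # population standard deviation

# Population skewness: mean of the cubed z-scores
skew = sum(((x - mean) / sd) ** 3 for x in data) / n

print(skew)                            # positive for this right-skewed sample
print(mean, statistics.median(data))   # mean is pulled above the median
```

The positive skew and the ordering mean > median match the description of right skew above.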
Measures of Variability
Measures of variability, also known as measures of dispersion, quantify the extent to which individual data points in a dataset vary from the central tendency. These measures provide insights into the spread, distribution, and overall variability of the data. Common measures of variability include:
- Variance: The average of the squared differences between each data point and the mean of the dataset. It gives a measure of the overall spread but is in squared units.
- Standard Deviation: The square root of the variance. It is widely used as a measure of the average distance between each data point and the mean. The standard deviation is in the same units as the original data.
- Coefficient of Variation (CV): The ratio of the standard deviation to the mean, expressed as a percentage. It provides a relative measure of variability, making it useful for comparing variability between datasets with different units.
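These three measures can be sketched in a few lines; the prices below are hypothetical:

```python
import statistics

# Hypothetical closing prices
prices = [100.0, 102.0, 101.0, 105.0, 103.0]

var = statistics.variance(prices)        # sample variance (n - 1 denominator), squared units
sd = statistics.stdev(prices)            # square root of variance, same units as the data
cv = sd / statistics.mean(prices) * 100  # relative spread, as a percentage

print(var, sd, cv)
```

Because the CV is unitless, it lets you compare, say, the variability of a stock priced near 100 with one priced near 10.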
Measuring the relationship between variables
The relationship between variables can be measured using various statistical measures, depending on the nature of the variables. Here are some common measures used to quantify relationships:
- Correlation Coefficient:
- Pearson Correlation Coefficient (r): Measures the linear relationship between two continuous variables. It ranges from -1 to 1, where -1 indicates a perfect negative linear relationship, 1 indicates a perfect positive linear relationship, and 0 indicates no linear relationship.
- Spearman Rank Correlation Coefficient: Measures the strength and direction of the monotonic relationship between two variables, regardless of the linearity. It is suitable for ordinal or non-normally distributed data.
- Covariance: Measures the degree to which two variables change together. Positive covariance indicates a positive relationship, negative covariance indicates a negative relationship, and a covariance of zero suggests no linear relationship. However, the scale of covariance is not standardized, which makes it hard to interpret or compare across datasets.
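The link between covariance and Pearson's r is worth seeing explicitly: r is just the covariance standardized by both standard deviations. A minimal sketch with made-up paired data, where y grows roughly linearly with x:

```python
import statistics

# Hypothetical paired observations
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(x)
mx, my = statistics.mean(x), statistics.mean(y)

# Sample covariance: average product of deviations (n - 1 denominator)
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

# Pearson r: covariance divided by the product of the standard deviations
r = cov / (statistics.stdev(x) * statistics.stdev(y))

print(cov, r)
```

Spearman's rank correlation applies the same formula to the ranks of the data rather than the raw values, which is what makes it robust to non-linear but monotonic relationships.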
What is a distribution?
A distribution is a function that shows the possible values of a variable and how often they occur. Common types of distribution are:
- Normal Distribution (Gaussian Distribution): A distribution characterized by a symmetric bell-shaped curve. In a normal distribution, the mean, median, and mode are all equal, and roughly 68% of the data falls within one standard deviation of the mean and about 95% within two.
- Standard Normal Distribution: The standard normal distribution, also known as the Z distribution or the Z-score distribution, is a normal distribution with a mean of 0 and a standard deviation of 1. It is a standardized form of the normal distribution that allows values from different normal distributions to be compared.
The probability density function (PDF) of the standard normal distribution is denoted by the Greek letter phi (φ). The standard normal variable is typically represented by the letter Z.
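Standardizing a value to a Z-score, and evaluating the standard normal CDF (usually written Φ, the integral of the density φ), can be sketched with the standard library. The mean and standard deviation below are hypothetical:

```python
from math import erf, sqrt

# Hypothetical distribution: mean 100, standard deviation 15
mu, sigma = 100.0, 15.0
x = 130.0
z = (x - mu) / sigma  # Z-score: how many standard deviations x lies from the mean

# Standard normal CDF via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
def phi_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

print(z, phi_cdf(z))
```

For x = 130 this gives z = 2, and Φ(2) ≈ 0.977: about 97.7% of values from this distribution fall below 130, consistent with the "95% within two standard deviations" rule of thumb.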
Central Limit Theorem
The Central Limit Theorem (CLT) is a fundamental result in statistics that describes the distribution of sample means. It states that, for a sufficiently large sample size, the distribution of the sample mean is approximately normal, regardless of the shape of the original population distribution.
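The CLT is easy to see by simulation. The sketch below draws from a uniform distribution on [0, 1], which is clearly not bell-shaped (its mean is 0.5 and its standard deviation is 1/sqrt(12) ≈ 0.2887), and shows that the means of samples of size 30 cluster tightly around 0.5:

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is reproducible

# Population: uniform on [0, 1] -- not normal
def sample_mean(n):
    return statistics.mean(random.random() for _ in range(n))

# Simulate the sampling distribution of the mean for n = 30
means = [sample_mean(30) for _ in range(2000)]

# The sample means center on the population mean (0.5)...
print(statistics.mean(means))
# ...with spread close to sigma / sqrt(n) = 0.2887 / sqrt(30) ~= 0.053
print(statistics.stdev(means))
```

Plotting a histogram of `means` would show the familiar bell shape emerging even though the underlying population is flat.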
Standard Error
The standard error (SE) is a measure of the variability or precision of a sample statistic, such as the sample mean. It provides an estimate of how much the sample statistic is likely to vary from the true population parameter.
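For the sample mean, the standard error is the sample standard deviation divided by the square root of the sample size, which is also the spread the CLT simulation above recovers. A minimal sketch with hypothetical measurements:

```python
import math
import statistics

# Hypothetical sample of 8 measurements
sample = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7]

n = len(sample)
sd = statistics.stdev(sample)   # sample standard deviation
se = sd / math.sqrt(n)          # standard error of the mean

# An approximate 95% confidence interval for the population mean
mean = statistics.mean(sample)
ci = (mean - 1.96 * se, mean + 1.96 * se)

print(se, ci)
```

Note that the standard error shrinks as n grows, which is why larger samples give more precise estimates of the population mean.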
Stay tuned for Part — 3