Measures of Central Tendency in Statistics

Edited By Komal Miglani | Updated on Jul 02, 2025 07:54 PM IST

In statistics, a central value summarizes a data set with a single representative number, such as the mean. Such values are most informative when the data cluster around a typical value. They give the analyst a compact description of the data and a starting point for drawing insights from it. Central tendency is one of the fundamentals of statistics, with applications in domains such as data analysis, weather forecasting, and business.

This article covers measures of central tendency, an important topic within statistics. It matters not only for board exams but also for various competitive exams.

Central Value of Data (Central Tendency)

A measure of central tendency (or central value) is a single value that attempts to describe a set of data by identifying the central position within that set of data. Apart from mean (often called the average), there are other central values such as the median and the mode.

The mean, median, and mode are all valid measures of central tendency, but under different conditions, some measures of central tendency become more appropriate to use than others.

Mean

The mean is equal to the sum of all the values in the data set divided by the number of values in the data set. If we have $n$ values in a data set, i.e. $x_1, x_2, x_3, \ldots, x_n$, then its mean, usually denoted by $\bar{x}$ (pronounced " $x$ bar"), is:

$
\bar{x}=\frac{x_1+x_2+\cdots+x_n}{n}
$

Applications of Mean:
1. Calculating average income or expenditure.
2. Analyzing trends and patterns.

For example, to calculate the mean weight of 50 people, add the 50 weights together and divide by 50. Technically this is the arithmetic mean.

Mean of the Ungrouped Data

If the $n$ observations in the data are $x_1, x_2, x_3, \ldots, x_n$, then the arithmetic mean $\bar{x}$ is given by

$
\bar{x}=\frac{x_1+x_2+x_3+\cdots+x_n}{n}=\frac{1}{n} \sum\limits_{i=1}^n x_i
$
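As a minimal sketch of this formula in Python (the function name `arithmetic_mean` and the sample weights are illustrative assumptions, not values from the article):

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical weights in kilograms, used only for illustration.
weights = [62.0, 70.5, 58.0, 81.5]
mean_weight = arithmetic_mean(weights)
print(mean_weight)
```

The empty-data check is a design choice: dividing by zero would otherwise raise a less descriptive error.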

Mean of Ungrouped Frequency Distribution

If the observations in the data are $x_1, x_2, x_3, \ldots, x_n$ with respective frequencies $f_1, f_2, f_3, \ldots, f_n$, then

Sum of the values of the observations $=f_1 x_1+f_2 x_2+f_3 x_3+\cdots+f_n x_n$
and Number of observations $=f_1+f_2+f_3+\cdots+f_n$
The mean in this case is given by

$\bar{x}=\frac{f_1 x_1+f_2 x_2+f_3 x_3+\cdots+f_n x_n}{f_1+f_2+f_3+\cdots+f_n}=\frac{\sum\limits_{i=1}^n f_i x_i}{\sum\limits_{i=1}^n f_i}$
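The frequency-weighted formula can be sketched in Python as follows. The function name `frequency_mean` and the small data set are assumptions made for illustration only.

```python
def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(f_i * x_i) / sum(f_i)."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    total_frequency = sum(freqs)
    if total_frequency == 0:
        raise ValueError("total frequency must be positive")
    weighted_sum = sum(f * x for x, f in zip(values, freqs))
    return weighted_sum / total_frequency


# Hypothetical distribution: the value 1 occurs twice, 2 three times, 3 five times.
example_mean = frequency_mean([1, 2, 3], [2, 3, 5])
print(example_mean)  # (2 + 6 + 15) / 10
```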

Grouped Frequency Distribution

In a grouped frequency distribution, each class (interval) is represented by its mid-point $m_i$, which takes the place of $x_i$:

$
m_i=\frac{\text { lower boundary }+\text { upper boundary }}{2}
$

Then, $\bar{x}=\frac{\sum\limits_{i=1}^n f_i m_i}{\sum\limits_{i=1}^n f_i}$

For example,
the frequency table below shows the grades on a professor's last statistics test. The best estimate of the class mean is found as follows.

$
\begin{array}{|c|c|}
\hline \text { Grade Interval } & \text { Number of Students } \\
\hline 10-12 & 1 \\
\hline 12-14 & 2 \\
\hline 14-16 & 0 \\
\hline 16-18 & 4 \\
\hline 18-20 & 1 \\
\hline
\end{array}
$

First find the midpoints for all intervals

$
\begin{array}{|c|c|}
\hline \text { Grade Interval } & \text { Midpoint } \\
\hline 10-12 & 11 \\
\hline 12-14 & 13 \\
\hline 14-16 & 15 \\
\hline 16-18 & 17 \\
\hline 18-20 & 19 \\
\hline
\end{array}
$

Now calculate the sum of the products of each interval's frequency and midpoint, and divide it by the total frequency:

$
\begin{aligned}
& \sum\limits_{i=1}^n f_i m_i=11(1)+13(2)+15(0)+17(4)+19(1)=124 \\
& \sum\limits_{i=1}^n f_i=1+2+0+4+1=8 \\
& \bar{x}=\frac{\sum\limits_{i=1}^n f_i m_i}{\sum\limits_{i=1}^n f_i}=\frac{124}{8}=15.5
\end{aligned}
$
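The same calculation can be reproduced in a few lines of Python. This is only a sketch; the variable names are assumptions, while the intervals and counts are those of the grade table above.

```python
# Grade intervals and student counts taken from the table above.
intervals = [(10, 12), (12, 14), (14, 16), (16, 18), (18, 20)]
counts = [1, 2, 0, 4, 1]

# Each class is represented by its midpoint.
midpoints = [(lower + upper) / 2 for lower, upper in intervals]

weighted_sum = sum(f * m for f, m in zip(counts, midpoints))
grouped_mean = weighted_sum / sum(counts)

print(midpoints)
print(weighted_sum, sum(counts), grouped_mean)
```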

Median

The median is the middle value for a set of data that has been arranged in ascending or descending order.

It is a number that separates ordered data into 2 equal halves. Half the values are the same number or smaller than the median, and half the values are the same number or larger.

For example, to find the median of the following data

$\begin{array}{lllllllllll}65 & 55 & 89 & 56 & 35 & 14 & 56 & 55 & 87 & 45 & 92\end{array}$

We first rearrange that data into order (ascending)
$\begin{array}{lllllllllll}14 & 35 & 45 & 55 & 55 & 56 & 56 & 65 & 87 & 89 & 92\end{array}$
The median is the value exactly in the middle: here, the 6th of the 11 ordered values, which is 56.
When $n$ is even, take the two middle values and average them.

The median is useful in income distribution analysis.

Median of Ungrouped Data

If the number of observations is $n$,
First arrange the observations in ascending or descending order.

If n is odd :

$
\text { Median }=\left(\frac{n+1}{2}\right)^{t h} \text { observation }
$

If n is even :

$
\text { Median }=\frac{\text { Value of }\left(\frac{n}{2}\right)^{t h} \text { observation }+ \text { Value of }\left(\frac{n}{2}+1\right)^{t h} \text { observation }}{2}
$

For example,

Consider the following data: $1 ; 11.5 ; 6 ; 7.2 ; 4 ; 8 ; 9 ; 10 ; 6.8 ; 8.3 ; 2 ; 2 ; 10 ; 1$
Ordered from smallest to largest: $1 ; 1 ; 2 ; 2 ; 4 ; 6 ; 6.8 ; 7.2 ; 8 ; 8.3 ; 9 ; 10 ; 10 ; 11.5$
Since there are 14 observations, the median is the average of the $(n / 2)$th $=7$th and $(n / 2+1)$th $=8$th terms. So the median is the average of 6.8 and 7.2, which equals 7.

The median is seven. Half of the values are smaller than seven and half of the values are larger than seven.
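The odd and even rules can be combined in one short function. The sketch below is an illustration (the name `median` is an assumption) and is checked against the two data sets used in this section.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th observation, at 0-based index mid.
        return ordered[mid]
    # Even n: average of the (n / 2)-th and (n / 2 + 1)-th observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_data = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
even_data = [1, 11.5, 6, 7.2, 4, 8, 9, 10, 6.8, 8.3, 2, 2, 10, 1]

print(median(odd_data))
print(median(even_data))
```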

Median of Ungrouped Frequency Distribution

To find the median, first arrange the observations in ascending order. After this the cumulative frequencies are obtained.

Let the sum of the frequencies be denoted by $N$.
If $N$ is odd, identify the observation whose cumulative frequency is equal to or just greater than $\frac{N+1}{2}$. This observation lies in the middle of the data and is therefore the required median.

If $N$ is even, find two observations: the first whose cumulative frequency is equal to or just greater than $\frac{N}{2}$, and the second whose cumulative frequency is equal to or just greater than $\frac{N}{2}+1$. The median is the average of these two observations.
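The cumulative-frequency procedure can be sketched as follows. The helper names and the small distribution are hypothetical and serve only to illustrate the steps.

```python
from itertools import accumulate


def observation_at(values, cumulative, position):
    """Return the observation whose cumulative frequency first reaches `position`."""
    for value, cum_freq in zip(values, cumulative):
        if cum_freq >= position:
            return value
    raise ValueError("position exceeds the total frequency")


def frequency_median(values, freqs):
    """Median of an ungrouped frequency distribution."""
    pairs = sorted(zip(values, freqs))            # ascending order of observations
    xs = [x for x, _ in pairs]
    cumulative = list(accumulate(f for _, f in pairs))
    if not cumulative or cumulative[-1] == 0:
        raise ValueError("total frequency must be positive")
    n = cumulative[-1]
    if n % 2 == 1:
        return observation_at(xs, cumulative, (n + 1) // 2)
    lower = observation_at(xs, cumulative, n // 2)
    upper = observation_at(xs, cumulative, n // 2 + 1)
    return (lower + upper) / 2


# Hypothetical data: N = 10, so the 5th and 6th observations are averaged.
print(frequency_median([1, 2, 3, 4], [2, 3, 4, 1]))
```

Here the cumulative frequencies are 2, 5, 9, 10, so the 5th observation is 2 and the 6th is 3.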

Median of Continuous Frequency Distribution

For a continuous (grouped) frequency distribution with the classes arranged in ascending order, the following formula can be used:

$
\text { Median }=l+\frac{\left(\frac{N}{2}-c f\right)}{f} \times h
$

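where the median class is the class whose cumulative frequency first reaches or exceeds $\frac{N}{2}$, and
$l=$ lower limit of the median class,
$N=$ number of observations (sum of the frequencies),
$cf=$ cumulative frequency of the class preceding the median class,
$f=$ frequency of the median class,
$h=$ class size (width), assuming all classes have equal size.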
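A minimal sketch of this formula in Python is given below. The function name `grouped_median` is an assumption, and applying it to the grade table from the grouped-mean example is an illustrative choice rather than part of the original text.

```python
def grouped_median(lower_limits, freqs, width):
    """Median = l + ((N / 2 - cf) / f) * h for equal-width classes."""
    half = sum(freqs) / 2
    cum_before = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cum_before + f >= half:
            # This is the median class.
            return lower + (half - cum_before) / f * width
        cum_before += f
    raise ValueError("no median class found (are all frequencies zero?)")


# Classes 10-12, 12-14, 14-16, 16-18, 18-20 with frequencies 1, 2, 0, 4, 1.
# N / 2 = 4, so the median class is 16-18 with cf = 3 and f = 4.
grade_median = grouped_median([10, 12, 14, 16, 18], [1, 2, 0, 4, 1], 2)
print(grade_median)  # 16 + (4 - 3) / 4 * 2
```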

Mode

The mode is the most frequent value in our data set.

The mode is normally used for categorical data, where we wish to know the most common category, but it can also be found for numerical data. For example, consider the data

$
\begin{array}{llllllllllll}
65 & 55 & 89 & 56 & 35 & 14 & 56 & 55 & 87 & 45 & 92 & 55
\end{array}
$

In the above data set, 55 occurs three times, more often than any other value, so the mode is 55.

The mode is useful in market research.

Mode is that value among the observations which occurs most often, that is, the value of the observation having the maximum frequency.
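Counting frequencies is easy to sketch in Python with `collections.Counter`. The function name `modes` and the colour list are assumptions for illustration, while the numerical data set is the one shown above. The function returns a list because a data set can have more than one mode.

```python
from collections import Counter


def modes(values):
    """Return every value that attains the maximum frequency."""
    counts = Counter(values)
    if not counts:
        raise ValueError("the mode of an empty data set is undefined")
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


marks = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92, 55]
print(modes(marks))

# Hypothetical categorical data with two modes.
colours = ["red", "blue", "red", "green", "blue"]
print(modes(colours))
```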

In a grouped frequency distribution, it is not possible to determine the mode by looking at the frequencies. Here, we can only locate a class with the maximum frequency, called the modal class. The mode is a value inside the modal class, and is given by the formula:

Mode $=l+\left(\frac{f_1-f_0}{2 f_1-f_0-f_2}\right) \times h$
where
$l=$ lower limit of the modal class,
$\mathrm{h}=$ size of the class interval (assuming all class sizes to be equal),
$\mathrm{f}_1=$ frequency of the modal class,
$\mathrm{f}_0=$ frequency of the class preceding the modal class,
$\mathrm{f}_2=$ frequency of the class succeeding the modal class.
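The modal-class formula can be sketched in Python as follows. The function name `grouped_mode` is hypothetical, and treating a missing neighbouring class as having frequency 0 is an assumption of this sketch. The data reuse the grade table from the grouped-mean example purely for illustration.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode = l + ((f1 - f0) / (2 * f1 - f0 - f2)) * h for equal-width classes."""
    # Index of the modal class (the first one if several share the maximum).
    i = max(range(len(freqs)), key=lambda k: freqs[k])
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0               # assumed 0 when no preceding class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0  # assumed 0 when no succeeding class
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula undefined: neighbouring classes match the modal class")
    return lower_limits[i] + (f1 - f0) / denominator * width


# Modal class 16-18: f1 = 4, f0 = 0 (class 14-16), f2 = 1 (class 18-20), h = 2.
grade_mode = grouped_mode([10, 12, 14, 16, 18], [1, 2, 0, 4, 1], 2)
print(grade_mode)  # 16 + (4 / 7) * 2
```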

Solved Examples Based On Central Value Of Data

Example 1: The median of the items $6,10,4,3,9,11,22,18$ is

1) $9$

2) $10$

3) $9.5$

4) $11$

Solution

A measure of location, or measure of central tendency, describes the average character of the data under study by a single quantity.

Let us arrange the items in ascending order: $3,4,6,9,10,11,18,22$.
In this data, the number of items is $\mathbf{n}=8$, which is even.
Median $=\mathrm{M}=$ average of $\left(\frac{n}{2}\right)$ th and $\left(\frac{n}{2}+1\right)$ th terms.
$=$ Average of $\left(\frac{8}{2}\right)$ th and $\left(\frac{8}{2}+1\right)$ th terms
$=$ Average of $4^{\text {th }}$ and $5^{\text {th }}$ terms
$
=\frac{9+10}{2}=\frac{19}{2}=9.5
$

Hence, the answer is option 3.

Example 2: In a class of $100$ students there are $70$ boys whose average marks in a subject are $75$. If the average marks of the complete class is $72$, then what is the average of the girls?

1) $73$

2) $65$

3) $68$

4) $74$

Solution

Let $S_B$, $S_G$ and $S_T$ denote the total marks of the boys, the girls and the whole class respectively.

$\begin{aligned} & \frac{\sum\limits_{i=1}^{70} x_i}{70}=75 \\ & \Rightarrow \frac{S_B}{70}=75 \\ & S_B=5250 \\ & \text { Also } \\ & \qquad \frac{\sum\limits_{i=1}^{100} x_i}{100}=72 \\ & \Rightarrow \frac{S_T}{100}=72 \\ & S_T=7200 \\ & \Rightarrow S_G=7200-5250 \\ & \quad=1950\end{aligned}$

$
\begin{aligned}
&\text { There are } 100-70=30 \text { girls, so the mean marks for the girls are }\\
&\begin{aligned}
& =\frac{1950}{30} \\
& =65
\end{aligned}
\end{aligned}
$

Hence, the correct option is option (2).
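The arithmetic of this example can be checked with a short Python sketch (the variable names are assumptions):

```python
n_class, n_boys = 100, 70
mean_class, mean_boys = 72, 75

# Total marks of the girls = total of the class minus total of the boys.
total_girls = n_class * mean_class - n_boys * mean_boys  # 7200 - 5250
mean_girls = total_girls / (n_class - n_boys)

print(total_girls, mean_girls)
```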

Example 3: The mean of $5$ observations is $5$ and their variance is $124$. If three of the observations are $1, 2$ and $6$ ; then the mean deviation from the mean of the data is :

1) $2.4$

2) $2.8$

3) $2.5$

4) $2.6$

Solution

We first recall the following concepts.

Arithmetic Mean -

For the values $x_1, x_2, \ldots, x_n$ of the variate $x$, the arithmetic mean is given by

$
\bar{x}=\frac{x_1+x_2+x_3+\cdots+x_n}{n}
$

Mean Deviation -

If $x_1, x_2, \ldots, x_n$ are $n$ observations, then the mean deviation from the point $A$ is given by:

$
\frac{1}{n} \sum\left|x_i-A\right|
$

Variance -

In case of discrete data,

$\sigma^2=\left(\frac{\sum x_i^2}{n}\right)-\left(\frac{\sum x_i}{n}\right)^2$

Now,

$\begin{aligned} & \frac{\sum x_i}{5}=5 \Rightarrow \sum x_i=25 \\ & \frac{\sum x_i^2}{n}-\left(\frac{\sum x_i}{n}\right)^2=124 \\ & \frac{\sum x_i^2}{5}-25=124 \\ & \sum x_i^2=149 \times 5=745\end{aligned}$

Let the two remaining observations be $a$ and $b$.

$
\begin{aligned}
& a+b+1+2+6=25 \\
& a+b=16 \\
& a^2+b^2+1^2+2^2+6^2=745 \\
& a^2+b^2+1+4+36=745 \\
& a^2+b^2=704
\end{aligned}
$

$\begin{aligned} & \text { Mean deviation }=\frac{\sum\left|x_i-5\right|}{5}=\frac{|1-5|+|2-5|+|6-5|+|a-5|+|b-5|}{5}=\frac{8+|a-5|+|b-5|}{5} \\ & =\frac{8+|a-5|+|11-a|}{5} \quad(\text { since } b=16-a) \\ & =\frac{8+6}{5}=2.8\end{aligned}$

The last step assumes $5 \leq a \leq 11$, so that $|a-5|+|11-a|=(a-5)+(11-a)=6$.

Hence, the answer is the option 2.

Example 4: In a set of $2n$ distinct observations, each of the observations below the median of all the observations is increased by $5$ and each of the remaining observations is decreased by $3$. Then the mean of the new set of observations :

1) increases by $1$.

2) decreases by $1$.

3) decreases by $2$.

4) increases by $2$.

Solution

Let the observations, in ascending order, be $x_1, x_2, \ldots, x_{2n}$. Since they are distinct, the median lies between $x_n$ and $x_{n+1}$, so exactly $n$ observations are below the median.

The new observations are $x_1+5, x_2+5, \ldots, x_n+5$

and $x_{n+1}-3, x_{n+2}-3, \ldots, x_{2n}-3$

$\begin{aligned} \therefore \bar{x}_{\text {new }} & =\frac{\sum x_i+5 n-3 n}{2 n} \\ & =\frac{\sum x_i}{2 n}+1 \\ & =\bar{x}_{\text {old }}+1\end{aligned}$

Hence, the answer is the option 1.
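A quick numerical check of this result, sketched in Python with a hypothetical set of $2n=6$ distinct observations (the function name and the data are assumptions):

```python
def shift_around_median(values):
    """Add 5 to the lower half and subtract 3 from the upper half.

    Assumes an even number of distinct observations, as in the example.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    return [x + 5 for x in ordered[:half]] + [x - 3 for x in ordered[half:]]


data = [2, 4, 7, 11, 15, 20]          # hypothetical observations
new_data = shift_around_median(data)

old_mean = sum(data) / len(data)
new_mean = sum(new_data) / len(new_data)
print(new_mean - old_mean)
```

The total changes by $5n-3n=2n$, which is spread over $2n$ observations, so the mean rises by 1 whatever the data.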

Example 5: All the students of a class performed poorly in Mathematics. The teacher decided to give grace marks of $10$ to each of the students. Which of the following statistical measures will not change even after the grace marks are given?

1) variance

2) mean

3) median

4) mode

Solution

Mean, median, and mode are measures of central tendency. When 10 is added to every observation, each of them also increases by 10.

Variance measures the scattering of the data. It is a measure of dispersion, which does not change when every observation is shifted by the same amount.

So the measures of central tendency change, but the measures of dispersion do not.

Hence the variance will not change.

Hence, the answer is the option (1).
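This behaviour can be illustrated with Python's standard `statistics` module. The marks below are hypothetical, chosen only to show the effect of adding 10 to every observation.

```python
import statistics

marks = [12, 25, 25, 31, 40]             # hypothetical marks before grace
graced = [m + 10 for m in marks]         # 10 grace marks for every student

for name, func in [("mean", statistics.mean),
                   ("median", statistics.median),
                   ("mode", statistics.mode),
                   ("variance", statistics.pvariance)]:
    print(name, func(marks), func(graced))
```

The first three lines of output differ by 10 between the two columns, while the population variance is identical.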

Frequently Asked Questions (FAQs)

1. What are the measures of central tendency?

The valid measures of central tendency are mean, median and mode.

2. Define mean.

The mean is equal to the sum of all the values in the data set divided by the number of values in the data set.

3. What is a median?

The median is the middle value for a set of data that has been arranged in ascending or descending order.

4. What is mode?

The mode is the most frequent value in our data set.

5. When do you use mean?

Mean is used to find the average value of the given data.

6. What are measures of central tendency and why are they important in statistics?
Measures of central tendency are statistical values that describe the center or typical value of a dataset. They are important because they provide a single representative value that summarizes large amounts of data, making it easier to understand and compare different datasets. The three main measures of central tendency are mean, median, and mode, each offering unique insights into the data's distribution.
7. How does the mean differ from the median, and when should you use each?
The mean is the average of all values in a dataset, while the median is the middle value when the data is ordered. Use the mean when data is symmetrically distributed and there are no extreme outliers. Use the median when data is skewed or contains outliers, as it's less affected by extreme values. The median is often preferred for income data or house prices, while the mean is commonly used for test scores or heights.
8. What is the mode, and why is it useful for categorical data?
The mode is the most frequently occurring value in a dataset. It's particularly useful for categorical data (non-numerical data) because mean and median can't be calculated for such data. For example, in a survey about favorite colors, the mode would indicate the most popular color choice. The mode can also be used for numerical data and is the only measure of central tendency that can have multiple values (bimodal, trimodal, etc.).
9. How can outliers affect measures of central tendency?
Outliers, which are extreme values in a dataset, can significantly impact measures of central tendency, especially the mean. The mean is sensitive to outliers because it takes into account every value in the calculation. Outliers can pull the mean towards them, potentially misrepresenting the typical value. The median is less affected by outliers, making it a more robust measure for skewed data. The mode is not affected by outliers unless they are the most frequent value.
10. What is the difference between population and sample measures of central tendency?
Population measures of central tendency describe the entire group being studied, while sample measures are calculated from a subset of the population. Population measures are denoted by Greek letters (μ for mean, η for median), while sample measures use Roman letters (x̄ for mean, M for median). Sample measures are estimates of population measures and are subject to sampling error. In practice, we often use sample measures because it's rarely possible to measure an entire population.
11. How do you calculate the weighted mean, and when is it useful?
The weighted mean is calculated by multiplying each value by its weight, summing these products, and dividing by the sum of the weights. It's useful when some data points are more important or representative than others. For example, in calculating a course grade, different assignments might have different weights. In a customer satisfaction survey, responses from frequent customers might be given more weight than those from occasional customers.
12. What is the geometric mean, and in what situations is it appropriate to use?
The geometric mean is the nth root of the product of n numbers. It's appropriate for data that are multiplicative in nature or when calculating average rates of change. For example, it's used to calculate average growth rates in finance, average bacterial growth rates in biology, or average percentage changes. For positive values, the geometric mean never exceeds the arithmetic mean, and the two are equal only when all values are identical.
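As a small sketch in Python (the function name `geometric_mean` and the yearly growth factors are illustrative assumptions), the geometric mean of growth factors gives the constant factor that would produce the same overall growth:

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs a non-empty list of positive values")
    return math.prod(values) ** (1 / len(values))


# Hypothetical growth factors: +10%, +20%, -10% over three years.
factors = [1.10, 1.20, 0.90]
gm = geometric_mean(factors)
am = sum(factors) / len(factors)

print(gm, am)            # the geometric mean is the smaller of the two
print(gm ** 3)           # matches the overall growth 1.10 * 1.20 * 0.90
```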
13. How does skewness in data affect measures of central tendency?
Skewness in data affects the relationship between the mean, median, and mode. In a perfectly symmetrical distribution, all three measures are equal. In a positively skewed distribution (tail extends to the right), the mean is greater than the median, which is greater than the mode. In a negatively skewed distribution (tail extends to the left), the mode is greater than the median, which is greater than the mean. Understanding skewness helps in choosing the most appropriate measure of central tendency.
14. What is the concept of a trimmed mean, and why might it be used?
A trimmed mean is calculated by removing a certain percentage of the smallest and largest values before calculating the average of the remaining values. It's used to reduce the impact of outliers while still considering more data points than the median. For example, a 10% trimmed mean removes the top and bottom 10% of values. Trimmed means are often used in sports scoring (e.g., gymnastics) where extreme scores are discarded.
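A trimmed mean can be sketched in Python as follows. The function name, the rounding-down rule for the number of values cut from each end, and the scores are assumptions of this illustration, and other conventions exist.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end."""
    if not values:
        raise ValueError("the data set must not be empty")
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)   # values removed from each end (rounded down)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical scores with one very low and one very high value.
scores = [2, 90, 91, 91, 92, 92, 93, 93, 94, 100]
print(sum(scores) / len(scores))     # plain mean, pulled down by the score of 2
print(trimmed_mean(scores, 0.10))    # drops 2 and 100 before averaging
```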
15. How can the mean be misleading when describing income distributions?
The mean can be misleading for income distributions because these are typically right-skewed, with a few very high incomes pulling the mean upward. This results in a mean that's higher than what most people actually earn. For example, if nine people earn $30,000 and one person earns $1,000,000, the mean income is $127,000, which doesn't represent the typical income well. In such cases, the median is often a better measure of central tendency as it's less affected by extreme values.
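The figures in this example can be verified with Python's `statistics` module (the variable names are assumptions):

```python
import statistics

# Nine incomes of 30,000 and one income of 1,000,000, as in the example.
incomes = [30_000] * 9 + [1_000_000]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)
print(mean_income, median_income)
```

The median stays at the income that nine of the ten people actually earn, while the mean is more than four times larger.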
16. What is the relationship between the mean, median, and mode in a normal distribution?
In a perfectly normal distribution, the mean, median, and mode are all equal and located at the center of the distribution. This creates the characteristic bell-shaped curve. Understanding this relationship is crucial because many statistical techniques assume normality. In real-world data, exact normality is rare, but many datasets approximate a normal distribution, making this relationship a useful reference point.
17. How does sample size affect the reliability of measures of central tendency?
Generally, larger sample sizes lead to more reliable measures of central tendency. As sample size increases, the sample mean becomes a more precise estimate of the population mean (this is known as the law of large numbers). Similarly, the sample median and mode become more stable with larger samples. Small samples are more prone to random fluctuations and may not accurately represent the population, potentially leading to biased or unreliable measures of central tendency.
18. What is the concept of a "resistant" measure of central tendency?
A resistant measure of central tendency is one that is not easily affected by extreme values or outliers in the data. The median is considered a resistant measure because it only depends on the middle value(s) of the ordered data, regardless of how extreme the other values might be. In contrast, the mean is not resistant because it takes into account every value, including outliers. Understanding which measures are resistant helps in choosing the most appropriate statistic for different types of data distributions.
19. How can you use measures of central tendency to compare different datasets?
Measures of central tendency allow for quick comparisons between datasets by providing a single representative value for each. For example, you can compare the average test scores of different classes, or the median income across different regions. However, it's important to use the same measure for all datasets and to consider the spread of the data (using measures of dispersion) alongside central tendency. Comparing only central tendencies can sometimes be misleading if the datasets have very different distributions.
20. What is the midrange, and how does it relate to other measures of central tendency?
The midrange is the average of the maximum and minimum values in a dataset. It's calculated as (max + min) / 2. While simple to compute, the midrange is rarely used as a primary measure of central tendency because it's extremely sensitive to outliers, using only the two most extreme values. It can be useful in some contexts, such as in quality control to quickly estimate the center of a process's range, but it's generally less informative than the mean, median, or mode.
21. How do you interpret the mode when dealing with continuous data?
For continuous data, the strict definition of mode (most frequent value) often isn't useful because exact values rarely repeat. Instead, we typically group the data into intervals and find the interval with the highest frequency. This is sometimes called the modal class. Alternatively, we can use the concept of the modal interval, which is the smallest interval containing the most data points. In some cases, kernel density estimation is used to identify the peak(s) of the probability density function, which can be interpreted as the mode(s) of continuous data.
22. What is the concept of a "robust" statistic, and how does it relate to measures of central tendency?
A robust statistic is one that performs well even when the underlying assumptions about the data (like normality) are not met, or when the data contains outliers. In terms of measures of central tendency, the median is considered robust because it's not significantly affected by extreme values or skewness in the data. The mean, on the other hand, is not robust because it can be heavily influenced by outliers. Understanding robustness is crucial when choosing appropriate statistical measures, especially when dealing with real-world data that often deviates from ideal conditions.
23. How can measures of central tendency be misleading in bimodal distributions?
In bimodal distributions, which have two distinct peaks, measures of central tendency can be misleading if used without context. The mean and median might fall between the two peaks, in an area with relatively few data points, thus not representing either of the typical values well. The mode could identify both peaks but wouldn't capture the overall shape of the distribution. In such cases, it's crucial to examine the entire distribution, perhaps using visualization techniques, rather than relying solely on summary statistics.
24. What is the harmonic mean, and in what situations is it appropriate?
The harmonic mean is the reciprocal of the arithmetic mean of reciprocals. It's calculated as n divided by the sum of reciprocals of the values. The harmonic mean is appropriate for averaging rates or speeds over equal distances (or, more generally, equal amounts of work). For example, if you drive a stretch of road at 60 mph and return over the same distance at 40 mph, your average speed is the harmonic mean (48 mph), not the arithmetic mean (50 mph). If you instead drive for equal times at each speed, the arithmetic mean applies. The harmonic mean is also used in finance for calculating average price-earnings ratios and in physics for calculating average resistances in parallel circuits.
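The distinction between equal distances and equal times can be sketched in Python. The function names and the 120-mile leg length are assumptions for illustration; average speed is computed directly as total distance divided by total time.

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals of the values."""
    return len(values) / sum(1 / v for v in values)


def average_speed(legs):
    """legs is a list of (distance, speed) pairs; returns total distance / total time."""
    total_distance = sum(distance for distance, _ in legs)
    total_time = sum(distance / speed for distance, speed in legs)
    return total_distance / total_time


# Two legs of equal distance (120 miles each): 2 hours at 60 mph, 3 hours at 40 mph.
same_distance = average_speed([(120, 60), (120, 40)])

# Two legs of equal time (1 hour each): 60 miles at 60 mph, 40 miles at 40 mph.
same_time = average_speed([(60, 60), (40, 40)])

print(same_distance, harmonic_mean([60, 40]))   # both about 48
print(same_time)                                # arithmetic mean of the speeds
```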
25. How do measures of central tendency relate to the concept of "balance point" in a distribution?
The concept of a "balance point" in a distribution is closely related to measures of central tendency. The arithmetic mean can be thought of as the physical balance point of the distribution - if you were to plot the data on a number line and balance it on a fulcrum, the mean would be the balancing point. The median, in a sense, is the "middle" balance point, with equal numbers of data points on either side. Understanding this concept helps visualize what these measures represent and why they might differ in skewed distributions.
26. What is the impact of data transformation on measures of central tendency?
Data transformations, such as taking the logarithm or square root of values, can significantly affect measures of central tendency. For example, log transformation can make a right-skewed distribution more symmetric, potentially bringing the mean closer to the median. However, the mean of log-transformed data is not the same as the log of the original mean. Understanding these effects is crucial when working with transformed data, as it affects interpretation and can influence which measure of central tendency is most appropriate.
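A short Python sketch shows that the mean of log-transformed data differs from the log of the original mean. The three values are hypothetical and chosen so that the logarithms are easy to follow.

```python
import math

values = [1, 10, 100]   # hypothetical right-skewed data

mean_of_logs = sum(math.log10(v) for v in values) / len(values)
log_of_mean = math.log10(sum(values) / len(values))

print(mean_of_logs)          # average of 0, 1 and 2
print(log_of_mean)           # log10 of 37, noticeably larger
print(10 ** mean_of_logs)    # back-transforming gives the geometric mean, not 37
```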
27. How do you choose the most appropriate measure of central tendency for a given dataset?
Choosing the most appropriate measure depends on several factors:
28. What is Simpson's Paradox, and how does it relate to measures of central tendency?
Simpson's Paradox occurs when a trend appears in different groups of data but disappears or reverses when these groups are combined. This paradox can significantly affect measures of central tendency. For example, one group can have the higher mean within every subgroup and yet the lower mean once the subgroups are pooled, because the subgroup sizes differ. This paradox highlights the importance of considering subgroup analysis and not relying solely on aggregate measures. It also emphasizes the need to understand the context and structure of the data when interpreting measures of central tendency.
29. How do measures of central tendency relate to the concept of expected value in probability theory?
The expected value in probability theory is closely related to the arithmetic mean in statistics. For a discrete random variable, the expected value is calculated by multiplying each possible value by its probability and summing these products. This is conceptually similar to calculating a weighted mean. In fact, for a large number of observations from a probability distribution, the sample mean tends to converge to the expected value (this is known as the law of large numbers). Understanding this connection helps bridge the concepts of descriptive statistics and probability theory.
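For instance, the expected value of a fair six-sided die can be sketched in Python as a probability-weighted sum. The die is an illustrative assumption; `Fraction` is used so the result is exact.

```python
from fractions import Fraction

faces = range(1, 7)
probability = Fraction(1, 6)   # each face of a fair die is equally likely

# Multiply each possible value by its probability and sum the products.
expected_value = sum(face * probability for face in faces)
print(expected_value, float(expected_value))
```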
30. What is the concept of a "population parameter" versus a "sample statistic" in the context of measures of central tendency?
Population parameters are the true values that describe the entire population, while sample statistics are estimates of these parameters based on a subset of the population. For measures of central tendency:
31. How can measures of central tendency be used in outlier detection?
Measures of central tendency, particularly when used in conjunction with measures of spread, can be useful for outlier detection:
32. What is the relationship between measures of central tendency and data visualization techniques?
Measures of central tendency are often incorporated into data visualization techniques to provide quick insights into the data's center:
33. How do measures of central tendency behave in the presence of censored or truncated data?
Censored data (where values beyond a certain point are imprecisely reported) and truncated data (where values beyond a certain point are completely missing) can significantly affect measures of central tendency:
34. What is the concept of a "trimean," and how does it relate to other measures of central tendency?
The trimean is a measure of central tendency calculated as (Q1 + 2*Median + Q3) / 4, where Q1 and Q3 are the first and third quartiles. It combines aspects of the median and the quartiles, making it more robust than the mean but more sensitive to the distribution than the median alone. The trimean can be thought of as a weighted average of the median (weight 2) and the two quartiles (weight 1 each). It's particularly useful for slightly skewed distributions, providing a balance between the robustness of the median and the sensitivity of the mean.
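A sketch of the trimean in Python is shown below. It relies on `statistics.quantiles` with its default method; quartile conventions differ between textbooks and software, so the exact value can vary with the convention. The data are hypothetical.

```python
import statistics


def trimean(values):
    """(Q1 + 2 * median + Q3) / 4, using the default quartile method of `statistics`."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return (q1 + 2 * q2 + q3) / 4


# Hypothetical data with one large outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]
print(statistics.mean(data))   # pulled upward by the value 50
print(trimean(data))           # stays near the median of 6
```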
35. How do measures of central tendency relate to the concept of "typical value" in different contexts?
The concept of a "typical value" can vary depending on the context and the nature of the data:
36. How do measures of central tendency behave in multimodal distributions?
In multimodal distributions (those with multiple peaks), measures of central tendency can be challenging to interpret:
