# CAIIB Paper-1 Module A Unit 1 : Definition of Statistics, Importance & Limitations & Data Collection, Classification & Tabulation

## CAIIB Paper 1 (ABM) Module A Unit 1: Definition of Statistics, Importance & Limitations & Data Collection, Classification & Tabulation (New Syllabus)

IIBF has released the New Syllabus Exam Pattern for CAIIB Exam 2023. Following the format of the current exam, CAIIB 2023 will have now four papers. The CAIIB Paper 1 (Advanced Bank Management) includes an important topic called “Definition of Statistics, Importance & Limitations & Data Collection, Classification & Tabulation”. Every candidate who are appearing for the CAIIB Certification Examination 2023 must understand each unit included in the syllabus.

In this article, we are going to cover all the necessary details of CAIIB Paper 1 (ABM) Module A (Statistics ) Unit 1 : Definition of Statistics, Importance & Limitations & Data Collection, Classification & Tabulation, Aspirants must go through this article to better understand the topic, Definition of Statistics, Importance & Limitations & Data Collection, Classification & Tabulation and practice using our Online Mock Test Series to strengthen their knowledge of Banker Customer Relationship. Unit 1: Definition of Statistics, Importance & Limitations & Data Collection, Classification & Tabulation

### Introduction

The word ‘Statistics’ has been derived from the

• Latin word ‘statisticum’,
• Italian word ‘statistia’
• German word ‘statistik’,
• Each of which means a group of numbers or figures that represent some information of human interest.
• First used by professor Achen well in 1749 to refer to the subject-matter as a whole.
• Achen well defined statistics as the Political Science of many countries.
• In the early years statistics is to be used only by the kings to collect facts about the state, revenue of the state or the people in the state of administrative or political purpose.
• Gradually the use of statistics which means data or information has increased and widened.
• It is now used in almost in all the fields of human knowledge and skills like Business, Commerce, Economics, Social Sciences, Politics, Planning, Medicine and other sciences, physical as well as natural.
• In many practical situations in life, we come across different types of data which are needed to be understood, analysed, compared and interpreted correctly.
• For example, in a college we need to analyse the data of marks obtained, in a hospital we need to analyse the data of number of patients having different diseases, rate of mortality, Different types of data need to be analysed in Economics, Government and Private organisations, Sports and in many other fields.

Statistical analysis of data can be comprised of four distinct phases:

• Collection of data: In this first stage of investigation, numerical data is collected from different published or unpublished sources, primary or secondary.
• Classification and Tabulation of data: The raw data collected is to be represented properly for further calculations. The raw data is divided into different groups or classes and represented in a form of a table.
• Analysis of data: Classified and Tabulated data is analysed using different formulas and methods according to purpose of the study or investigation.
• Interpretation of data: At the final stage, relevant conclusions are drawn after the data is thoroughly analysed

### Importance Of Statistics

Statistics is the subject that teaches how to deal with data, so statistical knowledge helps to use proper methods for collection of data, properly represent the data, use appropriate formula and methods to analyse correctly and effectively get the results and interpret the data. Applications of Statistics is important in every sphere of field – Business and economics, Medical, Sports, Weather forecast, Stock Market, Quality Testing, Government decisions and policies, Banks, Different educational and research organisations, etc.

• In Business, the decision maker takes suitable policies and strategies based on information on production, sale, profit, purchase, finance, etc.
• By using the techniques of time series analysis, the businessman can predict the effect of a large number of variables with a fair degree of accuracy.
• By using ‘Bayesian Decision Theory’, the businessmen can select the optimal decisions to directly evaluate the payoff for each alternative course of action.
• In Economics, Statistics is used to analyse demand, cost, price, quantity, different laws of demand like elasticity of demand and consumer’s maximum satisfaction which is determined on the basis of data pertaining to income and expenditure.

#### Medical

• Statistics have extensive application in clinical research and medical field. Clinical research involves investigating proposed medical treatments, assessing the relative benefits of competing therapies, and establishing optimal treatment combinations.

#### Weather Forecast

• Statistical methods, like Regression techniques and Time series analysis, are used in weather forecasting.

#### Stock Market

• Statistical methods, like Correlation and Regression techniques, Time series analysis are used in forecasting stock prices. Return and Risk Analysis is used in calculation of Market and Personal Portfolios and Mutual Funds.

#### Bank

• In banking industry, credit policies are decided based on statistical analysis of profitability, demand deposits, time deposits, credit ratio, number of customers and many other ratios. The credit policies are based on the application of probability theory.

#### Sports

• Players use statistics to identify or rectify their mistakes. A proper understanding of the statistics determines the success of a team or a single athlete.

### Function Of Statistics

• Statistics present the facts in definite form.
• Statistics simplify complex data.
• It provides a techniques of comparison.
• Statistics study the relationship between two or more variables.
• It helps in formulating policies.
• It helps in forecasting outcomes.

### Limitations Or Demerits Of Statistics

• Statistics do not deal with Individuals: Statistical methods can’t be applied for individual values of the observations as for individual observation, there is no point of comparing anything or analysing anything. Statistics is the study of mass data or a group of observations and deals with aggregates of facts.
• Statistics does not study Qualitative Data: Statistical methods can’t be applied for qualitative or non-numerical data. Statistics is the study of only of those facts which are capable of being stated in number or quantity.
• Statistics give Result only on an Average: Statistical methods are not exact. Generally, when we have large number of observations, it becomes difficult to handle it. A part of the data (sample) is collected for study and draw conclusion from, as a representative for the whole. As a result, the result obtained are not exactly same, had we analysed the whole data. The results are true only on an average in the long run.
• The results can be biased: The data collection may sometime be biased which will make the whole investigation useless. Generally, this situation arises when data is handled by inexperienced or dishonest person.

### Definitions

#### Population

It is the entire collection of observations (person, animal, plant or things which is actually studied by a researcher) from which we may collect data. It is the entire group we are interested in and from which we need to draw conclusions.

Example: If we are studying the weight of adult men in India, the population is the set of weights of all men in India.

Data can be classified into two types, based on their characteristics.

• Variates: A characteristic that varies from one individual to another and can be expressed in numerical terms is called variate. Example: Prices of a given commodity, wages of workers, heights and weights of students in a class, marks of students, etc.
• Attributes: A characteristic that varies from one individual to another but can’t be expressed in numerical terms is called an attribute. Example: Colour of the ball (black, blue, green, etc.), religion of human, etc.

### Collection Of Data

Researchers or investigators need to collect data from respondents. There are two types of data.

#### Primary Data

Primary data is the data which is collected directly or first time by the investigator or researcher from the respondents. Primary data is collected by using the following methods:

• Direct Interview Method: A face to face contact is made with the informants or respondents (persons from whom the information is to be obtained) under this method of collecting data. The interviewer asks them questions pertaining to the survey and collects the desired information.
• Questionnaires: Questionnaires are survey instruments containing short closed-ended questions (multiple choice) or broad open-ended questions. Questionnaires are used to collect data from a large group of subjects on a specific topic. Currently, many questionnaires are developed and administered online.

#### Census and sample survey

• In a census, data about all individual units (e.g., people or households) are collected in the population. In a survey, data are only collected for a sub-part of the population; this part is called a sample.
• These data are then used to estimate the characteristics of the whole population. In this case, it has to be ensured that the sample is representative of the population in question. For example, the proportion of people below the age of 18 or the proportion of women and men in the selected sample of households has to reflect the reality in the total population.

Secondary Data

• Secondary data are the Second hand information. The data which have already been collected and processed by some agency or persons and is collected for the second time are termed as secondary data.
• According to M. M. Blair, “Secondary data are those already in existence and which have been collected for some other purpose.” Secondary data may be collected from existing records, different published or unpublished sources, like WHO, UNESCO, LIC, etc., various research and educational organisations, banks and financial places, magazines, internet, etc.

#### Distinction between primary and secondary data

• The data collected for the first time is called Primary data and data collected through some published or unpublished sources is called Secondary data.
• The primary data in the hands of one person can become secondary for all others. For example, the population census report is primary for the Registrar General of India and the information from the report is secondary for others.
• Primary data are original as they are collected first time from the respondents directly or by preparing questionnaires. So they are more accurate than the secondary data. But the collection of primary data requires more money, time and energy than the secondary data. A proper choice between the two forms of information should be made in an enquiry.

### Classification and Tabulation

So, we learned about the different methods of collecting primary and secondary data. The raw data, collected in real situations are arranged randomly, haphazardly and sometimes the data size is very large.  Thus, the raw data do not give any clear picture and interpreting and drawing any conclusion becomes very difficult. To make the data understandable, comparable and to locate similarities, the next step is classification of data. The method of arranging data into homogeneous group or classes according to some common characteristics present in the data is called Classification.

Example: The process of sorting letters in a post office, the letters are classified according to the cities and further arranged according to the streets.  Classification condenses the data by removing unimportant details. It enables us to accommodate large number of observations into few classes and study the relationship between several characteristics.  Classified data is presented in a more organised way so it is easier to interpret and compare them, which is known as Tabulation.

There are four important bases of classifications:

• Qualitative Base: Here the data is classified according to some quality or attribute such as sex, religion, literacy, intelligence, etc.
• Quantitative Base: Here the data is classified according to some quantitative characteristic like height, weight, age, income, marks, etc.
• Geographical Base: Here the data is classified by geographical regions or location, like states, cities, countries, etc. like population in different states of India.
• Chronological or Temporal Base: Here the data is classified or arranged by their time of occurrence, such as years, months, weeks, days, etc. This classification is also called Time Series data.

Example: Sales of a company for different years.

#### Types of Classification

• If we classify observed data for a single characteristic, it is known as One-way Classification. Ex: Population can be classified by Religion – Hindu, Muslim, Christians, etc.
• If we consider two characteristics at a time to classify the observed data, it is known as a Two-way classification. Ex: Population can be classified according to Religion and sex.
• If we consider more than two characteristics at a time in order to classify the observed data, it is known as Multi-way Classification. Ex: Population can be classified by Religion, sex and literacy.

### Frequency Distribution

#### Frequency

• If the value of a variable (discrete or continuous) e.g., height, weight, income, etc. occurs twice or more in a given series of observations, then the number of occurrences of the value is termed as the “frequency” of that value.
• The way of representing a data in a form of a table consisting of the values of the variable with the corresponding frequencies is called “frequency distribution”.
• So, in other words, Frequency distribution is a table used to organise the data.
• The left column (called classes or groups) includes numerical intervals on a variable under study.
• The right column contains the list of frequencies, or number of occurrences of each class/group.
• Croxton and Cowden defined frequency distribution as a statistical table which shows the sets of all distinct values of the variable arranged in order of magnitude, either individually or in groups with their corresponding frequencies side by side Intervals are normally of equal size covering the sample observations range.

Class-limits or Class Intervals

• A class is formed within the two values, class-limits or class-intervals. The lower value is called lower class limit or lower-class interval and the upper value is called upper class limit or class interval.

Class Length or Class Width

• The difference between the class’upper and lower class limit is called the length or the width of class.

Class Length = Class Width = Upper Class Interval – Lower Class Interval

Mid-Value or Class Mark

• The mid-point of the class is called mid-value or class mark.

Class Mark = (Lower class-limit + Upper Class limit)/2

Types of Class Intervals

• Exclusive type,
• Inclusive type

Exclusive type Class intervals like

• 0–10, 10–20; 500–1000, 1000–1500 are called exclusive types.
• Here the upper limits of the classes are excluded from the respective classes and put in the next class while considering the frequency of the respective class.
• For example, the value 15 is excluded from the class 10–15 and put in the class 15–20.

Inclusive type Class intervals

• 60–69, 70–79, 80–89, etc. are inclusive type.
• Here both the lower and upper class limits are included in the class-intervals while considering the frequency of the respective class,
• g., 60 and 69 are both included in the class 60–69.

Class Boundaries

Inclusive classes can be converted to exclusive classes and the new class intervals are called class boundaries.

Example : The classes 5–9, 10–14 can be converted to exclusive type of classes using the formula → New UCI = Old UCI + (10 – 9)/2 = 9 + 0.5 = 9.5. New LCI = Old LCI – (10 – 9)/2 = 5 – 0.5 =  4.5. So the class-boundaries are 4.5–9.5, 9.5–14.5, etc.

Open-end Class Interval

In open-end class interval either the lower limit of the first class or upper limit of the last class or both are missing.

Example:

Below 10

10–20

20–30

30–40

Above 40

Relative Frequency = frequency /Total frequency

Example: Relative frequency of the class interval = 20–30 in Example 2 is 12/32 = 0.375

Percentage Frequency

Percentage Frequency = (Class frequency/Total Frequency) × 100

Example: Percentage frequency of the class interval = 20–30 in Example 2 is (12/32) 100 = 37.5.

Frequency Density

Frequency density of a class interval = Class frequency/Width of Class

#### Continuous Frequency Distribution:

• Variable takes values which are expressed in class intervals within certain limits.

Problem: Marks obtained by 20 students in an exam for 50 marks are given below–convert the data into continuous frequency distribution form.

18, 23, 28, 29, 44, 28, 48, 33, 32, 43, 24, 29, 32, 39, 49, 42, 27, 33, 28, 29.

Problem: Following data reveals information about the number of children per family for 25 families. Prepare frequency distribution of number of children

(say variable x, taking distinct values 0, 1, 2, 3, 4).

3 2 1 1 2

4 0 1 2 3

1 2 0 4 2

2 1 2 3 2

1 3 4 0 1

Solution:

PDF-CAIIB Paper 1 (ABM) Module A Unit 1-Definition of Statistics, Importance & Limitations & Data Collection, Classification & Tabulation (Ambitious_Baba)