Everybody collects, interprets and uses information in day-to-day life, much of it in numerical or statistical form. People receive large quantities of information every day through conversations, television, computers, radio, newspapers, posters, notices and instructions. It is precisely because so much information is available that people need to be able to absorb, select and reject it. In everyday life, and in business and industry, certain statistical information is necessary, and it is important to know where to find it and how to collect it. For example, everybody has to compare prices and quality before deciding what goods to buy. As employees of a firm, people want to compare their salaries, working conditions, promotion opportunities and so on. The firms, for their part, want to control costs and expand their profits.
One of the main functions of statistics is to provide information that helps in making decisions. Statistics provides this information by giving a description of the present, a profile of the past and an estimate of the future. The following are some of the objectives of collecting statistical information.
1. To describe the methods of collecting primary statistical information.
2. To consider the stages involved in carrying out a survey.
3. To analyse the process involved in observation and interpretation.
4. To define and describe sampling.
5. To analyse the basis of sampling.
6. To describe a variety of sampling methods.
Statistical investigation is a comprehensive process. It requires the systematic collection of data about some group of people or objects, describing and organizing the data, analyzing the data with the help of different statistical methods, summarizing the analysis and using the results for making judgements, decisions and predictions. The validity and accuracy of the final judgement is most crucial and depends heavily on how well the data were collected in the first place. The quality of the data will greatly affect the conclusions, and hence utmost importance must be given to this process and every possible precaution should be taken to ensure accuracy while collecting the data.
“Data is any group of observations or measurements related to the area of a business interest and to be used for decision making.”
Nature of data:
It may be noted that
different types of data can be collected for different purposes. The data can
be collected in connection with time or geographical location or in connection
with time and location. The following
are the three types of data:
1. Time series data
2. Spatial data
3. Spatio-temporal data
1. Time series data:
It is a collection of
a set of numerical values, collected over a period of time. The data might have
been collected either at regular intervals of time or irregular intervals of
time.
Example 1:
The following data show three types of expenditure, in rupees, of a family for the four years 2001, 2002, 2003 and 2004.
Year | Food | Education | Others | Total
2001 | 3000 | 2000 | 3000 | 8000
2002 | 3500 | 3000 | 4000 | 10500
2003 | 4000 | 3500 | 5000 | 12500
2004 | 4500 | 5000 | 6000 | 16000
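A time series such as Example 1 can be held as a mapping from year to values. The sketch below is only an illustration (the names `expenditure` and `mismatched_totals` are made up for this example). It stores the figures exactly as printed and recomputes each row total from its three components, which is a cheap consistency check for any tabulated series. On the figures as printed, the 2004 components add up to 15500 while the stated total is 16000.

```python
# Expenditure in rupees, copied from Example 1:
# year -> (food, education, others, printed total)
expenditure = {
    2001: (3000, 2000, 3000, 8000),
    2002: (3500, 3000, 4000, 10500),
    2003: (4000, 3500, 5000, 12500),
    2004: (4500, 5000, 6000, 16000),
}

def mismatched_totals(table):
    """Return {year: (computed, printed)} for rows whose parts do not add up."""
    problems = {}
    for year in sorted(table):            # walk the series in chronological order
        *parts, printed = table[year]
        computed = sum(parts)
        if computed != printed:
            problems[year] = (computed, printed)
    return problems

print(mismatched_totals(expenditure))
```

Keeping the printed total next to the components, rather than overwriting it, lets such a discrepancy be traced back to the source.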
2. Spatial Data:
If the data collected are connected with a place, they are termed spatial data. For example, the data may be
i) Number of runs scored by a batsman in different test matches of a test series at different places
ii) District-wise rainfall in Tamilnadu
iii) Prices of silver in four metropolitan cities
Example 2:
The population of the
southern states of India in 1991.
State | Population
Tamilnadu | 5,56,38,318
Andhra Pradesh | 6,63,04,854
Karnataka | 4,48,17,398
Kerala | 2,90,11,237
Pondicherry | 7,89,416
3. Spatio-temporal Data:
If the data collected are connected with time as well as place, they are known as spatio-temporal data.
Example 3:
State | Population, 1981 | Population, 1991
Tamilnadu | 4,82,97,456 | 5,56,38,318
Andhra Pradesh | 5,34,03,619 | 6,63,04,854
Karnataka | 3,70,43,451 | 4,48,17,398
Kerala | 2,54,03,217 | 2,90,11,237
Pondicherry | 6,04,136 | 7,89,416
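Data indexed by both place and time can be stored as a nested mapping, state first and then year. The following is a minimal sketch under that assumption. The helper names `parse_count` and `increase` are illustrative, and the figures are copied from Example 3 with their Indian digit grouping intact.

```python
def parse_count(text):
    """Convert a figure written with Indian digit grouping, e.g. '5,56,38,318', to an int."""
    return int(text.replace(",", ""))

# state -> {year: population}, copied from Example 3
population = {
    "Tamilnadu":      {1981: "4,82,97,456", 1991: "5,56,38,318"},
    "Andhra Pradesh": {1981: "5,34,03,619", 1991: "6,63,04,854"},
    "Karnataka":      {1981: "3,70,43,451", 1991: "4,48,17,398"},
    "Kerala":         {1981: "2,54,03,217", 1991: "2,90,11,237"},
    "Pondicherry":    {1981: "6,04,136",    1991: "7,89,416"},
}

def increase(state):
    """Population increase of one state between 1981 and 1991."""
    years = population[state]
    return parse_count(years[1991]) - parse_count(years[1981])

for state in population:
    print(state, increase(state))
```

Fixing the year and reading across states gives spatial data, while fixing the state and reading across years gives a time series.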
Categories of data:
Any statistical data can be classified under two categories, depending upon the sources utilized. These categories are:
1. Primary data
2. Secondary data
1. Primary data:
Primary data are data collected by the investigator himself for the purpose of a specific inquiry or study. Such data are original in character and are generated by surveys conducted by individuals, research institutions or other organisations.
Example 4:
If a researcher is interested in knowing the impact of the noon-meal scheme on school children, he has to undertake a survey and collect data on the opinions of parents and children by asking relevant questions. Data collected for such a purpose are called primary data.
The primary data can be collected by the following five methods.
i) Direct personal interviews.
ii) Indirect oral interviews.
iii) Information from correspondents.
iv) Mailed questionnaire method.
v) Schedules sent through enumerators.
2. Secondary Data:
Secondary data are those data which have already been collected and analysed by some earlier agency for its own use, and which are later used by a different agency. According to W. A. Neiswanger, ‘A primary source is a publication in which the data are published by the same authority which gathered and analysed them. A secondary source is a publication, reporting the data which have been gathered by other authorities and for which others are responsible’.
Sources of Secondary data:
In most studies the investigator finds it impracticable to collect first-hand information on all related issues, and so he makes use of data collected by others. There is a vast amount of published information from which statistical studies may be made, and fresh statistics are constantly being produced.
The sources of secondary data can broadly be classified under two heads:
i) Published sources, and
ii) Unpublished sources.
Classification of Data:
The collected data, also known as raw data or ungrouped data, are always in an unorganised form and need to be organised and presented in a meaningful and readily comprehensible form in order to facilitate further statistical analysis. It is, therefore, essential for an investigator to condense a mass of data into a more comprehensible and assimilable form. The process of grouping data into different classes or sub-classes according to some characteristics is known as classification. Tabulation is concerned with the systematic arrangement and presentation of classified data. Thus classification is the first step in tabulation.
For example, letters in the post office are classified according to their destinations, viz. Delhi, Madurai, Bangalore, Mumbai etc.
Objects of Classification:
The following are the main objectives of classifying data:
i) It condenses the mass of data into an easily assimilable form.
ii) It eliminates unnecessary details.
iii) It facilitates comparison and highlights the significant aspects of the data.
iv) It enables one to get a mental picture of the information and helps in drawing inferences.
v) It helps in the statistical treatment of the information collected.
Types of Classification:
Statistical data are classified in respect of their characteristics. Broadly there are four basic types of classification, namely
a) Chronological classification
b) Geographical classification
c) Qualitative classification
d) Quantitative classification
a) Chronological classification:
In chronological classification the collected data are arranged according to the order of time, expressed in years, months, weeks, etc. The data are generally classified in ascending order of time. For example, data relating to population, the sales of a firm, or the imports and exports of a country are always subjected to chronological classification.
Example 5:
The estimates of birth rates in India during 1970-76 are:
Year | 1970 | 1971 | 1972 | 1973 | 1974 | 1975 | 1976
Birth Rate | 36.8 | 36.9 | 36.6 | 34.6 | 34.5 | 35.2 | 34.2
b) Geographical classification:
In this type of classification the data are classified according to geographical region or place. For instance, the production of paddy in different states in India, production of wheat in different countries etc.
Example 6:
Country | America | China | Denmark | France | India
Yield of wheat (kg/acre) | 1925 | 893 | 225 | 439 | 862
c) Qualitative classification:
In this type of classification data are classified on the basis of some attribute or quality such as sex, literacy, religion, employment etc. Such attributes cannot be measured on a scale.
For example, if the population is to be classified in respect of one attribute, say sex, then we can classify it into two classes, namely males and females. Similarly, it can also be classified into ‘employed’ or ‘unemployed’ on the basis of another attribute, ‘employment’.
Thus when the classification is done with respect to one attribute which is dichotomous in nature, two classes are formed, one possessing the attribute and the other not possessing it. This type of classification is called simple or dichotomous classification.
A simple classification may be shown as:
[Chart: population divided into males and females]
The classification in which two or more attributes are considered and several classes are formed is called manifold classification. For example, if we classify a population simultaneously with respect to two attributes, e.g. sex and employment, then the population is first classified with respect to ‘sex’ into ‘males’ and ‘females’. Each of these classes may then be further classified into ‘employed’ and ‘unemployed’ on the basis of the attribute ‘employment’, and as such the population is classified into four classes, namely:
i) Male employed
ii) Male unemployed
iii) Female employed
iv) Female unemployed
The classification may be extended further by considering other attributes such as marital status. This can be explained by the following chart:
[Chart: population classified by sex, employment and marital status]
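The difference between simple and manifold classification can be sketched in a few lines of Python. The six records below are hypothetical and exist only for illustration. Counting on one attribute gives a simple classification, and counting on the pair of attributes gives the four manifold classes listed above.

```python
from collections import Counter

# Hypothetical records: (sex, employment status) for six people
people = [
    ("male", "employed"), ("female", "employed"), ("male", "unemployed"),
    ("female", "employed"), ("male", "employed"), ("female", "unemployed"),
]

# Simple classification: one attribute (sex), hence two classes
by_sex = Counter(sex for sex, _ in people)

# Manifold classification: each (sex, employment) pair is one class
by_sex_and_employment = Counter(people)

print(by_sex)
print(by_sex_and_employment)
```

Adding a third attribute such as marital status would only lengthen each tuple, and the counting step stays the same.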
d) Quantitative classification:
Quantitative classification refers to the classification of data according to some characteristic that can be measured, such as height, weight, etc. For example, the students of a college may be classified according to weight as given below.
Weight (in lbs) | No. of Students
90-100 | 50
100-110 | 200
110-120 | 260
120-130 | 360
130-140 | 90
140-150 | 40
Total | 1000
In this type of classification there are two elements, namely (i) the variable, i.e. the weight in the above example, and (ii) the frequency, i.e. the number of students in each class. There are 50 students having weights ranging from 90 to 100 lb, 200 students having weights ranging from 100 to 110 lb, and so on.
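A frequency distribution of this kind can be built from raw measurements by assigning each value to its class and counting. The sketch below uses hypothetical weights and an illustrative helper, `class_interval`. It also assumes that the lower limit of each class is included and the upper limit excluded, a convention the table itself does not state.

```python
from collections import Counter

def class_interval(weight, low=90, width=10):
    """Return the (lower, upper) class that contains `weight`.

    Assumption: the lower limit is included and the upper limit excluded,
    so a weight of exactly 100 lb falls in 100-110, not 90-100.
    """
    lower = low + ((weight - low) // width) * width
    return (lower, lower + width)

# Hypothetical raw weights in lb
weights = [92, 100, 104, 118, 121, 125, 129, 133, 149]

# variable = class interval, frequency = number of weights in that class
frequency = Counter(class_interval(w) for w in weights)
for (lower, upper), count in sorted(frequency.items()):
    print(f"{lower}-{upper}: {count}")
```

Whichever boundary convention is chosen, it has to be applied uniformly, since each observation must fall in exactly one class.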
Tabulation:
Tabulation is the process of summarising classified or grouped data in the form of a table so that it is easily understood and an investigator is quickly able to locate the desired information. A table is a systematic arrangement of classified data in columns and rows. Thus, a statistical table makes it possible for the investigator to present a huge mass of data in a detailed and orderly form. It facilitates comparison and often reveals certain patterns in data which are otherwise not obvious. Classification and tabulation, as a matter of fact, are not two distinct processes; they go together. Before tabulation, data are classified and then displayed under different columns and rows of a table.
Advantages of Tabulation:
Statistical data arranged in a tabular form serve the following objectives:
i) It simplifies complex data and the data presented are easily understood.
ii) It facilitates comparison of related facts.
iii) It facilitates computation of various statistical measures like averages, dispersion, correlation etc.
iv) It presents facts in the minimum possible space, and unnecessary repetitions and explanations are avoided. Moreover, the needed information can be easily located.
v) Tabulated data are good for reference and they make it easier to present the information in the form of graphs and diagrams.
Preparing a Table:
The making of a compact table is itself an art. A table should contain all the information needed within the smallest possible space. The purpose of the tabulation and the way the tabulated information is to be used are the main points to be kept in mind while preparing a statistical table. An ideal table should consist of the following main parts:
1. Table number
2. Title of the table
3. Captions or column headings
4. Stubs or row designations
5. Body of the table
6. Footnotes
7. Sources of data
Table Number: A table should be numbered for easy reference and identification. This number, if possible, should be written in the centre at the top of the table. Sometimes it is also written just before the title of the table.
Title: A good table should have a clearly worded, brief but unambiguous title explaining the nature of the data contained in the table. It should also state the arrangement of the data and the period covered. The title should be placed centrally at the top of the table, just below the table number (or just after the table number on the same line).
Captions or Column Headings: Captions in a table stand for brief and self-explanatory headings of vertical columns. Captions may involve headings and sub-headings as well. The unit of the data contained should also be given for each column. Usually, a relatively less important and shorter classification should be tabulated in the columns.
Stubs or Row Designations: Stubs stand for brief and self-explanatory headings of horizontal rows. Normally, a relatively more important classification is given in rows. Also, a variable with a large number of classes is usually represented in rows. For example, rows may stand for score classes and columns for data related to the sex of students. In that case there will be many rows for score classes but only two columns, for male and female students.
A model structure of a table is given below:
<Table Number> <Title of the Table>
Sub Heading | Caption Headings | Total
 | Caption Sub-Headings |
Stub Sub-Headings | Body |
Total | |
Foot notes:
Sources Note:
Body: The body of the table contains the numerical information, i.e. the frequency of observations, in the different cells. The data are arranged according to the description given by the captions and stubs.
Footnotes: Footnotes are given at the foot of the table to explain any fact or information included in the table which needs some explanation. Thus, they are meant for explaining or providing further details about the data that have not been covered in the title, captions and stubs.
Sources of data: Lastly, one should also mention the source of information from which the data are taken. This may preferably include the name of the author, volume, page and year of publication. It should also state whether the data contained in the table are of a primary or secondary nature.
Requirements of a Good Table:
A good statistical table is not merely a careless grouping of columns and rows but should be such that it summarizes the total information in an easily accessible form in the minimum possible space. Thus, while preparing a table, one must have a clear idea of the information to be presented, the facts to be compared and the points to be stressed.
Though there is no hard and fast rule for forming a table, a few general points should be kept in mind:
1. A table should be formed in keeping with the
objects of statistical enquiry.
2. A table should be carefully prepared so that it
is easily understandable.
3. A table should be formed so as to suit the size
of the paper. But such an adjustment should not be at the cost of legibility.
4. If the figures in the table are large, they
should be suitably rounded or approximated. The method of approximation and
units of measurements too should be specified.
5. Rows and columns in a table should be numbered
and certain figures to be stressed may be put in ‘box’ or ‘circle’ or in bold
letters.
6. The arrangements of rows and columns should be
in a logical and systematic order. This arrangement may be alphabetical,
chronological or according to size.
7. The rows and columns are separated by single,
double or thick lines to represent various classes and sub-classes used. The
corresponding proportions or percentages should be given in adjoining rows and
columns to enable comparison. A vertical expansion of the table is generally
more convenient than the horizontal one.
8. The averages or totals of different rows should be given at the right of the table and those of columns at the bottom of the table. Totals for every sub-class too should be mentioned.
9. In case it is not possible to accommodate all
the information in a single table, it is better to have two or more related
tables.
Types of Tables:
Tables can be classified according to their purpose, stage of enquiry, nature of data or number of characteristics used. On the basis of the number of characteristics, tables may be classified as follows:
1. Simple or one-way table
2. Two-way table
3. Manifold table
1. Simple or one-way Table:
A simple or one-way table is the simplest table, containing data on one characteristic only. A simple table is easy to construct and simple to follow. For example, the blank table given below may be used to show the number of adults in different occupations in a locality.
The number of adults in different occupations in a locality
Occupations | No. of Adults
 |
Total |
2. Two-way Table:
A table which contains data on two characteristics is called a two-way table. In such a case, either the stub or the caption is divided into two co-ordinate parts. In the simple table given earlier, for example, the caption may be further divided in respect of ‘sex’. This subdivision is shown in the two-way table below, which now contains two characteristics, namely occupation and sex.
The number of adults in a locality in respect of occupation and sex
Occupation | No. of Adults | Total
 | Male | Female |
 | | |
Total | | |
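Filling such a two-way table from raw records amounts to counting each (occupation, sex) pair and then adding row totals at the right and column totals at the bottom. The sketch below uses hypothetical survey records, and the function name `two_way_table` is illustrative only.

```python
from collections import Counter

# Hypothetical survey records: (occupation, sex)
adults = [
    ("farmer", "male"), ("farmer", "female"), ("teacher", "female"),
    ("farmer", "male"), ("teacher", "male"), ("trader", "female"),
]

counts = Counter(adults)
occupations = sorted({occ for occ, _ in adults})   # stubs (rows)
sexes = ["male", "female"]                         # captions (columns)

def two_way_table():
    """Return rows of [stub, male, female, row total], ending with a column-total row."""
    rows = []
    for occ in occupations:
        cells = [counts[(occ, sex)] for sex in sexes]
        rows.append([occ, *cells, sum(cells)])
    # Column totals, including the grand total in the last position
    column_totals = [sum(row[i] for row in rows) for i in range(1, len(sexes) + 2)]
    rows.append(["Total", *column_totals])
    return rows

for row in two_way_table():
    print(" | ".join(str(cell) for cell in row))
```

Occupation is placed in the stubs here because it has more classes than sex, which matches the usual advice to put the longer classification in the rows.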
3. Manifold Table:
More and more complex tables can be formed by including other characteristics. For example, we may further classify the caption sub-headings in the two-way table in respect of “marital status”, “religion”, “socio-economic status” etc. A table which has more than two characteristics of data is considered a manifold table. For instance, the table below shows three characteristics, namely occupation, sex and marital status.
Occupation | No. of Adults | Total
 | Male | Female |
 | M | U | Total | M | U | Total |
 | | | | | | |
Total | | | | | | |
Foot note: M stands for married and U stands for unmarried.
Manifold tables, though complex, are good in practice, as they enable full information to be incorporated and facilitate analysis of all related facts. Still, as a normal practice, not more than four characteristics should be represented in one table, to avoid confusion. Other related tables may be formed to show the remaining characteristics.