Column Summary - Table of Statistical Information

The column summary, or column metadata, is a table that contains statistical and non-statistical information about the variables of a dataset. It gives a complete picture of each column and helps you explore and understand the dataset, for example by finding outliers, minimum and maximum values, dirty data, variation, column types, or the standard deviation.

Column Summary

Try AnswerMiner

AnswerMiner automatically calculates the following descriptive statistics and shows them in the Column Summary table.

Description

The description of a column is a piece of detailed information that helps others understand what that specific column is about. Sometimes the name of the column speaks for itself, but column names often use abbreviations, especially in surveys.

Distribution

Each column has a visualization in both the data view and the column summary, so you can see at a glance how its values are distributed.

Type

AnswerMiner automatically detects the type of each column and suggests visualizations based on it. The following column types are used in AnswerMiner: Numeric ID, Unique ID, Logical, Integer, Category, Ordinal, Date, Date-time.

Unit

Just as the description gives additional information about a column, the measurement unit of the data (currency, amount, weight, etc.) helps others who read the analysis understand it. It is also standard practice in statistics to include the measurement unit in the header. You can configure the measurement unit in Column settings in the Data view.

Full range info

The full range info, as the name suggests, covers all the values in the given column. For numeric columns, the minimum and maximum values are displayed. Otherwise, all the categories, the date range, or the unique strings are shown.

General range info

The general range info shows the usual, most frequent values in the given column, using the same logic as the full range info.

Values with percentage

This column shows each value that appears in the given column, along with its frequency as a percentage.

Filled Count

This is the number of non-empty cells in the given column. Together with the filled ratio, it is essential information in the exploratory phase.

Filled Ratio

The filled ratio is the percentage of filled cells out of all cells in the column. It is often called the quality of the column.

Empty Count

This is the number of empty cells, the complement of the Filled Count column.
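The three fill statistics above can be sketched in a few lines of Python. This is only an illustration, not AnswerMiner's implementation: the function name `fill_stats` is made up, and it assumes that a cell counts as empty when it is `None` or a blank string.

```python
def fill_stats(cells):
    """Return (filled_count, filled_ratio_percent, empty_count) for one column."""

    # Assumption: a cell is "empty" if it is None or a blank/whitespace-only string.
    def is_empty(cell):
        return cell is None or (isinstance(cell, str) and cell.strip() == "")

    total = len(cells)
    empty_count = sum(1 for cell in cells if is_empty(cell))
    filled_count = total - empty_count
    # Filled ratio: filled cells as a percentage of all cells in the column.
    filled_ratio = 100.0 * filled_count / total if total else 0.0
    return filled_count, filled_ratio, empty_count


column = ["red", "", "blue", None, "red"]
filled, ratio, empty = fill_stats(column)
print(filled, ratio, empty)
```

Filled Count and Empty Count always add up to the number of rows, so either one can be derived from the other.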

Numeric Count

This is the number of numeric values in a column. If it is less than the number of rows, the column contains empty cells or non-numeric values.

Numeric Ratio

This is the percentage of numeric values compared to all non-empty cells.
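As a rough sketch, the numeric count and numeric ratio could be computed as follows. The helper name `numeric_stats` is hypothetical, and the rule for what counts as numeric (anything Python's `float()` can parse, excluding booleans) is an assumption; AnswerMiner's own detection may differ.

```python
def numeric_stats(cells):
    """Return (numeric_count, numeric_ratio_percent) for one column."""

    def is_empty(cell):
        return cell is None or (isinstance(cell, str) and cell.strip() == "")

    # Assumption: a cell is numeric if float() can parse it; booleans are excluded.
    def is_numeric(cell):
        if isinstance(cell, bool):
            return False
        try:
            float(cell)
            return True
        except (TypeError, ValueError):
            return False

    filled = [cell for cell in cells if not is_empty(cell)]
    numeric_count = sum(1 for cell in filled if is_numeric(cell))
    # The ratio is taken over non-empty cells only, as described above.
    numeric_ratio = 100.0 * numeric_count / len(filled) if filled else 0.0
    return numeric_count, numeric_ratio


print(numeric_stats(["12", "7.5", "n/a", "", None, 3]))
```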

Distinct Count

This is the number of different values in a column. It can quickly help you find out how many elements are in a categorical column.
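The distinct count, together with the per-value percentages described earlier, can be illustrated with Python's `collections.Counter`. This is a minimal sketch: the function name `value_percentages` is made up, and it assumes that percentages are taken over non-empty cells.

```python
from collections import Counter


def value_percentages(cells):
    """Map each distinct non-empty value to its share of non-empty cells, in percent."""
    # Assumption: None and the empty string count as empty cells.
    filled = [cell for cell in cells if cell not in (None, "")]
    counts = Counter(filled)
    total = len(filled)
    # most_common() orders values from most to least frequent.
    return {value: 100.0 * n / total for value, n in counts.most_common()}


shares = value_percentages(["yes", "no", "yes", "yes", ""])
distinct_count = len(shares)  # number of different values in the column
print(shares, distinct_count)
```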

Mean

The mean or the average is the sum of numeric values divided by the number of numeric values.

Median

The median is the middle value of the set when the values are ranked in order. When the number of values is even, it is conventionally the average of the two middle values.

Mode (most common value)

The mode is the most frequently occurring value.

Mode count

This number shows the count of the most frequently occurring value - the mode.

Mode ratio

The mode ratio is the percentage of the most frequently occurring value compared to all non-empty cells.
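The central tendency statistics above can be reproduced with Python's standard `statistics` and `collections` modules. The sketch below is illustrative only: `central_stats` is a made-up name, the input is assumed to contain only non-empty numeric values, and ties for the mode are broken by first occurrence, which may not match AnswerMiner.

```python
import statistics
from collections import Counter


def central_stats(values):
    """Return mean, median, mode, mode count and mode ratio (%) for non-empty values."""
    # most_common(1) returns the most frequent value; on ties it returns the
    # value encountered first (an assumption of this sketch).
    mode_value, mode_count = Counter(values).most_common(1)[0]
    return {
        "mean": statistics.mean(values),      # sum divided by number of values
        "median": statistics.median(values),  # middle value, or mean of the two middle ones
        "mode": mode_value,
        "mode_count": mode_count,
        "mode_ratio": 100.0 * mode_count / len(values),
    }


print(central_stats([4, 8, 8, 15, 16, 23]))
```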

You can configure the table by sorting the values and by adding or hiding columns and rows, as shown in the image below.

Add or remove columns

Min

The min or minimum value is the lowest numeric value in a set of numeric values.

Max

The max or maximum value is the highest numeric value in a set of numeric values.

Min to Max

The lowest (min or minimum) and the highest (max or maximum) numeric value in a set of numeric values.

Standard deviation

The sample standard deviation (SD) is a measure of the amount of variation or dispersion of a set of values (square root of variance).

Variance

The sample variance is the average squared deviation of the values from their mean, conventionally computed with n - 1 rather than n in the denominator. It is the square of the sample standard deviation.

Population Standard deviation

This is the standard deviation (SD or σ) of the entire population.

Population Variance

This is the variance or VAR of the entire population.
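The difference between the sample and population versions is only the denominator: by the usual convention, the sample statistics divide the sum of squared deviations by n - 1, while the population statistics divide by n. A short sketch using Python's `statistics` module shows the four values side by side. The data is an arbitrary example, and the function name `dispersion_stats` is made up for illustration.

```python
import statistics


def dispersion_stats(values):
    """Return sample and population variance and standard deviation."""
    return {
        "variance": statistics.variance(values),    # sample variance, divides by n - 1
        "stdev": statistics.stdev(values),          # square root of the sample variance
        "pvariance": statistics.pvariance(values),  # population variance, divides by n
        "pstdev": statistics.pstdev(values),        # square root of the population variance
    }


# Mean is 5 and the squared deviations sum to 32, so the population
# variance is 32 / 8 and the sample variance is 32 / 7.
print(dispersion_stats([2, 4, 4, 4, 5, 5, 7, 9]))
```

For large columns the two versions are nearly identical; the gap matters mostly for small samples.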

Usual Range Low

This is the 10th percentile: the value below which the bottom 10% of the numeric values fall when they are ranked in order.

Usual Range High

This is the 90th percentile: the value above which the top 10% of the numeric values fall when they are ranked in order.

Usual Range (middle 80%)

This is the middle 80% of values, which means that the bottom and top 10% are excluded.
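One way to picture the usual range is as the interval between the 10th and 90th percentiles. The sketch below uses `statistics.quantiles` from the Python standard library. The function name `usual_range` is hypothetical, and the "inclusive" interpolation method is an assumption; a different percentile method would give slightly different boundaries.

```python
import statistics


def usual_range(values):
    """Return (low, high): the 10th and 90th percentiles of the values."""
    # n=10 yields the nine decile cut points; the first is the 10th
    # percentile and the last is the 90th percentile.
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    return deciles[0], deciles[-1]


low, high = usual_range(list(range(101)))  # the integers 0..100
print(low, high)
```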

The table can be exported as a TSV (tab-separated values) file. The export includes all custom settings, such as filtering, sorting, and the columns and rows you chose to include in AnswerMiner.

Export table
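An exported TSV file can be read back with any tool that understands tab-separated text. As a minimal sketch, Python's `csv` module can parse it when the delimiter is set to a tab. The sample content and header names below are hypothetical; the real headers depend on which columns and rows you included in the export.

```python
import csv
import io


def read_summary_tsv(text_stream):
    """Parse an exported column summary into a list of dicts, one per row."""
    reader = csv.DictReader(text_stream, delimiter="\t")
    return list(reader)


# Hypothetical export content, used here instead of a real file.
sample = "Name\tFilled Count\tMean\nage\t120\t34.5\nprice\t118\t9.99\n"
rows = read_summary_tsv(io.StringIO(sample))
print(rows[0]["Mean"])
```

To read a real file, pass `open(path, newline="", encoding="utf-8")` instead of the `StringIO` object; the UTF-8 encoding is an assumption. All values are returned as strings, so convert them to numbers as needed.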

IQR Low

The IQR Low is the lower bound of the interquartile range (the middle 50% of numbers when ranked in order). It is the first quartile (Q1, the 25th percentile), so the lowest 25% of numbers fall below it.

IQR High

The IQR High is the upper bound of the interquartile range (the middle 50% of numbers when ranked in order). It is the third quartile (Q3, the 75th percentile), so the highest 25% of numbers fall above it.

IQR Range (middle 50%)

The interquartile range or IQR (middle 50% of numbers if ranked in order) is between 25th and 75th percentiles.
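The quartiles and the interquartile range can be sketched with `statistics.quantiles` as well. As before, this is an illustration under assumptions: `iqr_stats` is a made-up name, and the "inclusive" quartile method is only one of several common conventions.

```python
import statistics


def iqr_stats(values):
    """Return (q1, q3, iqr) where iqr = q3 - q1."""
    # n=4 yields the three quartile cut points: Q1, the median, and Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3, q3 - q1


print(iqr_stats([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```

Here IQR Low corresponds to `q1`, IQR High to `q3`, and the width of the middle 50% to `iqr`.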

Outlier-free Range Low

The Outlier-free Range Low is the lowest number that is not an outlier: max(Q1-3*IQR, MinOfNumbers) where Q1=IQRLow.

Outlier-free Range High

The Outlier-free Range High is the highest number that is not an outlier: min(Q3+3*IQR, MaxOfNumbers) where Q3=IQRHigh.

Outlier-free Range

The Outlier-free Range is the full range of numeric values without far outliers (3*IQR rule).
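The 3*IQR rule can be sketched as follows: compute the fences Q1 - 3*IQR and Q3 + 3*IQR, then clamp them to the smallest and largest observed values so the range never extends beyond the data. The function name `outlier_free_range` is hypothetical and the "inclusive" quartile method is an assumption.

```python
import statistics


def outlier_free_range(values):
    """Return (low, high) of the range without far outliers (3*IQR rule)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Clamp each fence to the observed minimum or maximum.
    low = max(q1 - 3 * iqr, min(values))
    high = min(q3 + 3 * iqr, max(values))
    return low, high


# 100 lies far above Q3 + 3*IQR = 19, so it falls outside the range.
print(outlier_free_range([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```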

This feature is available in each plan of AnswerMiner. Feel free to try.
