Data is the new gold. Data is the new oil. In recent years, big data has become very popular, and now, in 2016, small data is the “new big data.” Both are important in data science, yet opinions on this topic differ widely. In this article, we compare big data and small data and examine both sides of the coin.
What Are Big Data and Small Data?
Big Data definition:
Big data is a term for data sets that are so large or complex that traditional data processing applications cannot deal with them. Accuracy in big data may lead to more confident decision-making, and better decisions can result in greater operational efficiency, cost reduction, and risk reduction.
Small Data definition:
Small data is a dataset that contains very specific attributes. It is used to determine current states and conditions, and it may be generated by analyzing larger data sets (big data).
Big Data vs. Small Data
The main difference between big data and small data is the size of the dataset we analyze. This difference is obvious, but there are several other dimensions along which big data and small data differ. The comparison table below covers five categories: source, volume, variety, velocity, and value.
Source: https://datafloq.com/read/small-data-vs-big-data-back-to-the-basic/706
The Trap of Small Data
The most common problem with big data is the size of the database. There is too much unorganized data, and cleaning it takes a lot of time. Moreover, we need specialized tools to analyze and visualize the data. Small data therefore seems easier and faster to analyze. However, there is a trap. When we have little data (compared to big data), we cannot ignore coincidences: an observed difference may simply be due to chance. So sometimes analyzing small data can be more difficult than analyzing big data. To check whether an observed difference is real, we can run an A/B test, which helps us tell a genuine difference between the two cases from a coincidental one.
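To illustrate why coincidences matter more with small data, here is a minimal sketch of how the result of an A/B test on conversion rates might be checked. It assumes a two-proportion z-test, which is one common choice (other tests, such as Fisher's exact test, are often preferred for very small samples); the function name and the conversion counts are hypothetical. The same 10% vs. 15% conversion rates are tested once with 100 users per variant and once with 10,000 users per variant.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Hypothetical helper: return (z, two-sided p-value) for H0: rate_a == rate_b.

    Assumes a two-proportion z-test with a pooled variance estimate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Same observed rates (10% vs. 15%), very different sample sizes (made-up numbers)
z_small, p_small = two_proportion_z_test(10, 100, 15, 100)
z_large, p_large = two_proportion_z_test(1000, 10000, 1500, 10000)
print(f"small sample: z={z_small:.2f}, p={p_small:.3f}")
print(f"large sample: z={z_large:.2f}, p={p_large:.3g}")
```

With 100 users per variant, the p-value stays well above the usual 0.05 threshold, so the five-point difference could easily be a coincidence; with 10,000 users per variant, the same difference is very unlikely to be due to chance.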
Enlarge Small Data
In the case of big data, we have enough data to see even the smallest correlations in our dataset. With small data we do not have this opportunity, but there is still a way to learn more about our data: we can make big data from small data by deriving new attributes from the ones we already have. Here are several ways to broaden a dataset:
- Name → Gender
- City → Population
- Date → Weekend Yes/No
- Birthdate → Zodiac sign
- Address → Real estate market
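Some of the derivations above can be computed directly from the data, while others need an external reference table. The sketch below shows three of them for a single record. The `NAME_TO_GENDER` table and the record fields are hypothetical placeholders (a real project would use an external reference dataset), and the zodiac cut-off dates are one commonly used convention that varies slightly between sources. City → Population and Address → Real estate market would work the same way as the name lookup, using external data.

```python
from datetime import date

# Hypothetical lookup table for illustration only; a real project would
# rely on an external reference dataset instead.
NAME_TO_GENDER = {"anna": "female", "peter": "male"}

# One commonly used set of zodiac start dates (month, day, sign), in calendar order.
ZODIAC_STARTS = [
    (1, 20, "Aquarius"), (2, 19, "Pisces"), (3, 21, "Aries"),
    (4, 20, "Taurus"), (5, 21, "Gemini"), (6, 21, "Cancer"),
    (7, 23, "Leo"), (8, 23, "Virgo"), (9, 23, "Libra"),
    (10, 23, "Scorpio"), (11, 22, "Sagittarius"), (12, 22, "Capricorn"),
]


def zodiac_sign(birthdate: date) -> str:
    """Birthdate -> zodiac sign: the last sign whose start date is not after the birthdate."""
    sign = "Capricorn"  # Dates before Jan 20 belong to Capricorn (starts Dec 22)
    for month, day, name in ZODIAC_STARTS:
        if (birthdate.month, birthdate.day) >= (month, day):
            sign = name
    return sign


def enrich(record: dict) -> dict:
    """Return a copy of `record` with derived attributes added (hypothetical fields)."""
    enriched = dict(record)
    # Name -> Gender via a lookup table; unknown names are labeled explicitly
    enriched["gender"] = NAME_TO_GENDER.get(record["name"].lower(), "unknown")
    # Date -> Weekend Yes/No: weekday() returns 5 for Saturday and 6 for Sunday
    enriched["is_weekend"] = record["date"].weekday() >= 5
    # Birthdate -> Zodiac sign
    enriched["zodiac_sign"] = zodiac_sign(record["birthdate"])
    return enriched


row = {"name": "Anna", "date": date(2016, 5, 7), "birthdate": date(1990, 8, 30)}
print(enrich(row))
```

Each original record gains several new columns, which is exactly how a small dataset can be broadened without collecting new observations.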
Small Data from Big Data
Just as we can enlarge small data, we can filter and aggregate big data. As we mentioned at the beginning of the article, big data is a huge amount of information. First of all, we need to store and manage that data. Storage has become so inexpensive that most companies save all of their data; they may not know what to use it for right now, but they expect to need it later. Once they have the dataset, they need tools to discover the relationships in it and to find relevant groups and types of data. Afterward, they can create smaller and smaller datasets, ending up with small data.
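The filter-and-aggregate step can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a big data pipeline: the event fields (`country`, `product`, `revenue`), the country codes, and the `to_small_data` helper are all hypothetical, and at real big data scale the same logic would typically run in a distributed engine rather than in a single Python loop.

```python
from collections import defaultdict


def to_small_data(events, country):
    """Hypothetical helper: filter raw events to one country, then aggregate per product."""
    revenue = defaultdict(float)
    orders = defaultdict(int)
    for event in events:
        # Filter step: keep only the slice of the data we care about
        if event["country"] != country:
            continue
        # Aggregate step: collapse many rows into one summary row per product
        revenue[event["product"]] += event["revenue"]
        orders[event["product"]] += 1
    return {p: {"orders": orders[p], "revenue": revenue[p]} for p in revenue}


# Made-up raw events standing in for a much larger event log
events = [
    {"country": "HU", "product": "A", "revenue": 10.0},
    {"country": "HU", "product": "A", "revenue": 5.0},
    {"country": "HU", "product": "B", "revenue": 7.5},
    {"country": "DE", "product": "A", "revenue": 100.0},
]
print(to_small_data(events, "HU"))
```

Applying further filters or coarser groupings to the result repeats the process, producing smaller and smaller datasets.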
Summary
All in all, big data and small data each have advantages and disadvantages. We need to recognize our needs and possibilities, decide what kind of information we need, and determine what we would like to discover from our data. After that, we can choose between big data and small data, or use both. Big data and small data are equally important.