Podcast: The vast majority of data exists beyond the realm of spreadsheets and databases.
In their book Data Science for Business, Foster Provost and Tom Fawcett make a bold claim: the vast majority of data exists beyond the realm of spreadsheets and databases. This may surprise people who are used to thinking of data as neatly organized rows and columns. In practice, most data is unstructured and difficult to access.
What is unstructured data?
Unstructured data is any type of data that does not fit into a predefined structure. This can include text, images, videos, audio files, and sensor data. Unstructured data is often messy and difficult to analyze, but it can also be a valuable source of insights.
Why is unstructured data important?
Unstructured data is important because it contains a wealth of information that can be used to improve decision-making. For example, unstructured data can be used to:
* Identify trends and patterns
* Understand customer behavior
* Predict future events
* Develop new products and services
How can I access unstructured data?
There are several ways to access unstructured data. One common approach is data mining, the process of extracting knowledge from large amounts of data. Applied to unstructured sources such as free text, data mining techniques can surface patterns and trends that are not visible when reading individual items one at a time.
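As a rough illustration of the idea, the sketch below counts the most frequent terms across a few pieces of free text. It is not a method taken from the book. The reviews, the stop-word list, and the function name `top_terms` are all made up for this example, and real text mining usually involves far more careful preprocessing.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "a", "is", "and", "was", "but", "it", "to", "very"}


def top_terms(documents, n=3):
    """Return the n most frequent non-stop-word terms across the documents."""
    counts = Counter()
    for text in documents:
        # Lowercase the text and split it into simple word tokens.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Hypothetical free-text customer reviews (unstructured data).
reviews = [
    "The delivery was late and the packaging was damaged.",
    "Great product, but delivery took too long.",
    "Delivery is slow. The product itself is great.",
]

print(top_terms(reviews))
```

In this made-up sample, "delivery" comes up in every review, which is the kind of recurring theme that is easy to miss one review at a time but shows up once the text is counted in aggregate.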
Another approach is to use big data technologies, which are designed to handle large volumes of data. They can be used to store, process, and analyze unstructured data.
Conclusion
The vast majority of data exists beyond the realm of spreadsheets and databases. Unstructured data is often messy and difficult to analyze, but it can also be a valuable source of insights. There are a number of ways to access unstructured data, including data mining techniques and big data technologies.
Kind regards, N. Bauer.