Dark Data Explained

Ahmed Banafa 01/04/2021

Gartner defines dark data as the information assets that organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).

Similar to dark matter in physics, dark data often comprises most organizations’ universe of information assets. Thus, organizations often retain dark data for compliance purposes only. Storing and securing data typically incurs more expense (and sometimes greater risk) than value.

Dark data is a type of unstructured, untagged and untapped data that is found in data repositories and has not been analyzed or processed. It is similar to big data, which is large and complex unstructured data (images posted on Facebook, email, text messages, GPS signals from mobile phones, tweets, TikTok videos, Snaps, Instagram pictures and other social media updates) that cannot be processed by traditional database tools. Dark data differs in that business and IT administrators mostly neglect its value.

Dark data is also known as dusty data.

Dark data is found in log files and data archives stored within large enterprise-class data storage locations. It includes all data objects and types that have yet to be analyzed for business or competitive intelligence, or to aid business decision making. Typically, dark data is complex to analyze and stored in locations where analysis is difficult, so the overall process can be costly. It can also include data objects that have not been captured by the enterprise, or data that is external to the organization, such as data stored by partners or customers.

By some estimates, up to 90 percent of big data is dark data.

Mining Dark Data

With the growing accumulation of structured, unstructured and semi-structured data in organizations, increasingly through the adoption of big data applications, dark data has come to denote specifically operational data that is left unanalyzed. Such data is seen as an economic opportunity for companies if they can use it to drive new revenue or reduce internal costs. Examples of data that is often left dark include server log files that can give clues to website visitor behavior, customer call detail records that can indicate consumer sentiment, and mobile geolocation data that can reveal traffic patterns to aid in business planning.
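To make the server log example concrete, here is a minimal sketch of how page views might be pulled out of web server logs. It assumes the logs follow the Common Log Format; the names `LOG_LINE` and `page_views` are hypothetical and chosen only for illustration, and a real deployment would need to match its own log layout.

```python
import re
from collections import Counter

# Assumed input: Common Log Format lines such as
# 203.0.113.5 - - [04/Jan/2021:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 512
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def page_views(lines):
    """Count successful (2xx) GET requests per path.

    Lines that do not match the assumed format are skipped rather than
    raising, since archived logs are often partly malformed.
    """
    views = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        if match.group("method") == "GET" and match.group("status").startswith("2"):
            views[match.group("path")] += 1
    return views
```

Calling `page_views(open("access.log"))` on a log file (the file name is a placeholder) returns a `Counter`, so `views.most_common(10)` would list the ten most visited paths.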

Dark data may also be used to describe data that can no longer be accessed because it has been stored on devices that have become obsolete.

Types of Dark Data

1) Data that is not currently being collected.

2) Data that is being collected, but that is difficult to access at the right time and place.

3) Data that is collected and available, but that has not yet been productized, or fully applied.

Dark matter is a form of matter thought to account for approximately 85% of the matter in the universe. It is composed of particles that do not absorb, reflect, or emit light, so it cannot be detected by observing electromagnetic radiation. Dark data, by contrast, can be brought to light, and so can its potential ROI. What's more, a simple way of thinking about what to do with the data, namely a cost-benefit analysis, can remove the complexity surrounding the previously mysterious dark data.
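A cost-benefit analysis of this kind can be reduced to very simple arithmetic. The sketch below is only an illustration: the function name `cost_benefit` and its three inputs are assumptions, and in practice the hard part is estimating the expected benefit, not computing the ratio.

```python
def cost_benefit(expected_benefit, analysis_cost, storage_cost):
    """Return (net_value, roi) for one dataset over a chosen period.

    All three inputs are estimates in the same currency. ROI is expressed
    as a fraction of total cost, so 0.5 means a 50% return.
    """
    total_cost = analysis_cost + storage_cost
    if total_cost <= 0:
        raise ValueError("total cost must be positive to compute ROI")
    net_value = expected_benefit - total_cost
    return net_value, net_value / total_cost
```

For example, a dataset expected to yield 15,000 in new revenue, at a cost of 8,000 to analyze and 2,000 to store, gives a net value of 5,000 and an ROI of 0.5.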

Value of Dark Data

The primary challenge presented by dark data is not just storing it, but determining its real value, if any. In fact, much dark data remains unilluminated because organizations simply don’t know what it contains. Destroying it might be too risky, but analyzing it can be costly, and it’s hard to justify that expense if the potential value of the data is unknown. To determine whether their dark data is even worth further analysis, organizations need a means of quickly and cost-effectively sorting, structuring, and visualizing it. An important point in getting a handle on dark data is that it isn’t a one-time event.

The first step in understanding the value of dark data is identifying what information it includes, where it resides, and its current status in terms of accuracy, age, and so on. Getting to this state will require you to:

  • Analyze the data to understand the basics, such as how much you have, where it resides, and how many types (structured, unstructured, semi-structured) are present.
  • Categorize the data to begin understanding how much of what types you have, and the general nature of information included in those types, such as format, age, etc.
  • Classify your information according to what will happen to it next. Will it be archived? Destroyed? Studied further? Once those decisions have been made, you can send your data groups to their various homes to isolate the information that you want to explore further.
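The three steps above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not a recommended policy: the extension-to-type mapping, the `DataAsset` record, and the age thresholds in `classify` are all hypothetical, and a real classification would weigh content and compliance rules, not age alone.

```python
import os
from collections import Counter
from dataclasses import dataclass

# Hypothetical mapping from file extension to broad data type.
STRUCTURE_BY_EXTENSION = {
    ".csv": "structured",
    ".parquet": "structured",
    ".json": "semi-structured",
    ".xml": "semi-structured",
    ".log": "semi-structured",
}


@dataclass
class DataAsset:
    path: str
    size_bytes: int
    age_days: int


def structure_of(asset):
    """Categorize by extension; anything unrecognized counts as unstructured."""
    extension = os.path.splitext(asset.path)[1].lower()
    return STRUCTURE_BY_EXTENSION.get(extension, "unstructured")


def classify(asset, archive_after_days=365, destroy_after_days=365 * 7):
    """Pick a next step from illustrative age thresholds (assumed values)."""
    if asset.age_days >= destroy_after_days:
        return "destroy"
    if asset.age_days >= archive_after_days:
        return "archive"
    return "study"


def summarize(assets):
    """Analyze: total bytes per data type and number of assets per next step."""
    bytes_by_type = Counter()
    count_by_action = Counter()
    for asset in assets:
        bytes_by_type[structure_of(asset)] += asset.size_bytes
        count_by_action[classify(asset)] += 1
    return bytes_by_type, count_by_action
```

Feeding `summarize` an inventory of `DataAsset` records gives a first picture of how much data of each type exists and how much of it is a candidate for archiving, destruction, or further study.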

Once you’ve identified the relative context for your data groups, you can focus on the data you think might provide insights. You’ll also have a clearer picture of your organization’s full data landscape, so you can set information governance policies that alleviate the burden of dark data while also putting it to work.

Future of Dark Data

Startups going after dark data problems are usually not playing in existing markets with customers who are aware of their problems. They are creating new markets by surfacing new kinds of data and building previously unimagined applications with that data. But when they succeed, they become big companies that, ironically, have big data problems of their own.

The question many people are asking is: What should be done with dark data? Some say data should never be thrown away, as storage is so cheap, and that data may have a purpose in the future.

Ahmed Banafa

Tech Expert

Ahmed Banafa is an expert in new tech with appearances on ABC, NBC, CBS, FOX TV and radio stations. He has served as a professor, academic advisor and coordinator at well-known American universities and colleges. His research has been featured in Forbes, MIT Technology Review, ComputerWorld and Techonomy. He has published over 100 articles about the internet of things, blockchain, artificial intelligence, cloud computing and big data. His research papers are used in many patents, numerous theses and conferences. He is also a guest speaker at international technology conferences. He is the recipient of several awards, including the Distinguished Tenured Staff Award, Instructor of the Year and a Certificate of Honor from the City and County of San Francisco. Ahmed studied cyber security at Harvard University. He is the author of the book Secure and Smart Internet of Things Using Blockchain and AI.
