What Big Data REALLY Is (And Isn’t)

Big Data… We’ve all heard of it at this point; this particular buzzword has managed to penetrate every level of business. Yet when you ask anyone in the business (even many in the IT, Technology, and Data spaces) “What is Big Data?” you get responses like:

  • “It’s data, that is really… Big… Takes up lots of space”
  • “It’s data from Twitter, and Facebook, and all these social media things”  
  • “It’s images, video, and other big files”
  • “I don’t have a clue”

It’s this new, exciting thing that everyone seems to be doing, but nobody can really articulate what it is.
Ultimately, there is a lack of understanding of Big Data, and as a result the business perceives a great deal of risk around it. That’s an unfortunate barrier to the success of your Big Data initiatives.
So what IS Big Data, really? Let me start by highlighting some major myths about Big Data, with some clarification around each; then I’ll get into a bit more detail about what Big Data really is.


Myth #1 – Big Data = Hadoop

Hadoop is a platform for distributed processing of vast amounts of data, and it’s an exceptional tool for solving that particular problem. But even though Hadoop has its roots in the work of some of the original pioneers of Big Data (Google, for example, created and patented MapReduce, the programming model that Apache Hadoop implements), that particular technology is not the definition of Big Data.
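
To make the MapReduce idea a little more concrete, here is a minimal single-machine sketch in Python of its map, shuffle, and reduce phases, using the classic word count. This only illustrates the programming model; it is not Hadoop’s API. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample documents are invented for this example, and a real Hadoop job would spread each phase across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(documents):
    """Run the three phases in sequence on a list of documents."""
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle(pairs))

if __name__ == "__main__":
    # Hypothetical sample input, purely for illustration.
    docs = ["big data is big", "data is everywhere"]
    print(word_count(docs))
```

The point of the model is that the map and reduce steps are independent per document and per key, which is what allows a platform like Hadoop to run them in parallel.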

Big Data is made up of a vast ecosystem of tools that cover a wide array of problems (many of which Hadoop is not suited to solving). Any organization seeking to venture down the Big Data path needs to keep an open view of all of the potential tools in this wide array of solutions, to avoid enacting the “if your only tool is a hammer then everything looks like a nail” proverb.

Myth #2 – Big Data Creates a Silo, or Replaces Current Solutions

Big Data is first and foremost about integration. I can’t stress this enough.
Big Data derives the majority of its value from combining your data from all your diverse data-sets in order to create a “more than the sum of its parts” outcome – allowing your organization to derive significant value in the form of insights, or analysis that weren’t possible before.
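
As a small illustration of the “more than the sum of its parts” idea, the sketch below joins two separate data sets on a shared customer ID to surface something neither one shows on its own: high-spend customers who also have many open support tickets. Everything here is hypothetical (the records, the field names, the thresholds, and the at_risk_accounts function); in practice the data would come from your own sales and support systems.

```python
# Hypothetical records from two separate systems (all values are invented).
sales = [
    {"customer_id": 1, "annual_spend": 120000},
    {"customer_id": 2, "annual_spend": 8000},
    {"customer_id": 3, "annual_spend": 95000},
]
support_tickets = [
    {"customer_id": 1, "open_tickets": 7},
    {"customer_id": 2, "open_tickets": 1},
    {"customer_id": 3, "open_tickets": 0},
]

def at_risk_accounts(sales, support_tickets, min_spend=50000, min_tickets=5):
    """Join both data sets on customer_id and flag high-spend
    customers who also have many open support tickets."""
    tickets_by_customer = {
        row["customer_id"]: row["open_tickets"] for row in support_tickets
    }
    flagged = []
    for row in sales:
        # Customers missing from the support data are treated as having no tickets.
        open_tickets = tickets_by_customer.get(row["customer_id"], 0)
        if row["annual_spend"] >= min_spend and open_tickets >= min_tickets:
            flagged.append(row["customer_id"])
    return flagged

if __name__ == "__main__":
    print(at_risk_accounts(sales, support_tickets))
```

Neither the sales data nor the support data alone identifies an at-risk, high-value account; the insight only appears once the two are combined.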

In this way, Big Data should NEVER be treated as a silo or as a replacement for existing solutions. Instead, it is about integrating your various data sources. A Big Data team should integrate and collaborate closely with your existing Database, Data Warehouse, and Business Intelligence teams, as well as with business users, executives, marketing, data consumers, and many other areas of the business.

Big Data solutions are inherently scalable on multiple architectures. As a result it’s usually possible to (at least initially) re-use or augment existing investments to get an initial proof of concept in Big Data.

As you grow, and once you’re deriving real value, you can scale your organization’s Big Data footprint into a more purpose-built architecture that can accommodate your soon-to-be-booming demand for Big Data.

Myth #3 – Big Data is Only About Unstructured or Social Data

Unstructured and social data definitely present new challenges for processing and analysis, but they are far from the only use of Big Data, or even a primary use!

Social data is an excellent source of information that flows rapidly from the public; it’s free and can be used to derive great value. Unstructured data is similar to social data in that it presents complex challenges to analysis, and in some cases it’s also large in volume or scale (video or high-resolution medical imaging data, for example). While these are definitely common applications for Big Data, you must keep a pragmatic lens on your own organizational needs and circumstances and consider all the possible areas where Big Data could be leveraged, without constraining yourself to unstructured data simply because that is where most of the media buzz is focused.

Trying to consume a broad wealth of social data or massive amounts of unstructured data and derive some meaningful value from it can be a large challenge, and it is not always the best first project for an organization.
Ultimately, Big Data is a massive, multi-faceted Swiss Army knife that can be leveraged in a wide array of circumstances to solve a vast array of problems for your organization.

Myth #4 – Big Data Always Means a Massive Long Term Project

While it’s certainly true that Big Data projects can end up sprawling, consuming vast amounts of infrastructure, resources, and time, this need not be the case if you take a careful, measured approach to the initial assessment of your needs.

One of the most common problems I see in organizations attempting to undertake Big Data initiatives is that they take the first word in the name too literally… And they start BIG.

It’s very important not to try to “boil the ocean”. Carefully assess your needs, and engage a partner who knows the Big Data ecosystem well and can provide a holistic assessment of those needs against the available technology. Identify a simple, limited-scope project that can easily be achieved and whose success can be effectively measured. Keep your scale and scope small, attack low-hanging fruit, and try to achieve the smallest possible footprint while delivering real business value as rapidly as possible. In this model you can quickly prove the value of a Big Data initiative without getting lost in a sprawling, never-ending project. Then you can worry about what’s next, and take a measured, “one step at a time” approach to your long-term Big Data goals.

Myth #5 – I Need To Choose One Vendor

Aligning yourself to one particular vendor, or a particular technology stack is dangerous in the Big Data market as it exists today. This is due to a number of reasons:

  • Big Data is a large ecosystem. No one stack yet covers ALL the bases.
  • Big Data as a market is very young. Many vendors aren’t fully mature yet. I anticipate a major round of VC, new ventures, acquisitions, and mergers in this market in the coming 5-6 years… As a result, there could be instability in tightly aligning to one vendor.
  • Very few System Integration Partners have a mature practice in any one vendor. Most are staying “higher level” (keeping to the generalized concepts, general skills, and open source software) or they are aligned with ALL vendors.

Ultimately, Big Data is new, and it covers potentially hundreds of solutions and technologies, many of which are unique and address a very particular problem set.
Don’t think of it as a question of which technology to use. Instead (as I mention in Myth #4), find a problem first, and then you can more effectively choose a solution to address that problem. The next problem may require an entirely different technology. Such is the nature of Big Data.

So… WHAT Is Big Data?

Hopefully at this point I’ve addressed some of the common misconceptions or assumptions about Big Data and in doing so painted a bit of a picture of what constitutes “Big Data”.

But to distil it down to a “definition” of sorts, I’ll provide the best definition I can come up with below.

“The Definition”

Big Data is a cluster of new problems arising from newly emerging trends in data creation in our culture. Big Data is also a very broad ecosystem of relatively new technologies and tools targeted at solving these new problems by allowing the capture, consumption, processing, and analysis of data in ways that were not previously feasible, or were at least cost-prohibitive, because of a number of factors such as:

  • Data Volume
  • Data Velocity
  • Data Structure or Format
  • Data Complexity/Variety

Big Data consists of many technologies, applied in many ways to data in all shapes and sizes. It is primarily about combining and aggregating data in new and innovative ways in order to discover new emergent properties in the data (resulting in an outcome which is “greater than the sum of its parts”).
Big Data delivers immense value to a business or organization in the form of insights and analysis that are only possible through this integration and combination of data, and through the resulting complex relationships and patterns present in the combined data.


So there we have it. Hopefully I’ve helped shed a bit of light on the topic of Big Data, and how to define this monster of a topic!

In future posts, I will endeavour to explore some more details of the topic in depth.
If you have a Big Data challenge or you would like some assistance in exploring how Big Data could address some of your organization’s needs, please don’t hesitate to reach out to me or my team of Big Data experts at Eclipsys and we would be happy to discuss ways that we can help.

