
Big Data on Small Budgets

By Daniel Smith, CMA
December 1, 2016

Don’t let perceived costs and skills requirements prevent you from optimizing your management accounting toolkit.

 

Have you ever wondered why you can’t get several years of historical data instead of a paltry few months’ worth? Better data builds more solid business cases for new initiatives, but the traditional data warehouse is built on a single server. In other words, all the data is housed on one machine: one CPU, one motherboard, and a limited amount of storage.

 

Even if you get more room for storage, the cost rises much faster than the capacity. As a rough illustration, if one unit of storage costs $1, two units might cost $4 and 10 units $1,000. But what about Big Data? You know what it is and what it can do. So would using it modernize your management accounting operations?
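To make the cost argument concrete, here is a minimal sketch that compares a single server whose price grows faster than its capacity with a cluster of commodity servers whose price grows in step with capacity. The function names, the $1 unit price, and the exponent of 2 are all assumptions chosen for illustration. They are not vendor figures, and real pricing curves vary.

```python
# Hypothetical cost model: every price and the exponent are illustrative assumptions.

def scale_up_cost(units, unit_price=1.0, exponent=2.0):
    """Cost of fitting `units` of storage into ONE server (super-linear growth)."""
    return unit_price * units ** exponent


def scale_out_cost(units, unit_price=1.0):
    """Cost of spreading `units` of storage across many servers (linear growth)."""
    return unit_price * units


for units in (1, 2, 10, 100):
    single = scale_up_cost(units)
    cluster = scale_out_cost(units)
    print(f"{units:>4} units: single server ${single:,.0f} | cluster ${cluster:,.0f}")
```

Under these assumptions the gap between the two options widens with every unit added, which is the pattern that makes long data histories impractical on a single machine.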

 

BIG DATA SOLUTIONS

Organizations of all sizes are using technology and methods that IT specialists place under the generic umbrella of Big Data. What we call Big Data is more than just the data itself; it’s an environment of different applications performing functions as simple as basic data processing and as complex as running machine-learning algorithms in parallel across hundreds of servers (for example, with Spark ML).

 

Big Data is popular for many reasons. Even the most basic Big Data environment provides the following functionality:

 

  • Transactional data storage (and analytics). In a Big Data environment, cost scales roughly in line with the hardware you add, so you can analyze petabytes of data at about the same per-terabyte cost as a one-terabyte SQL instance. Doing the same thing with a traditional single-server SQL database would be prohibitively expensive. It simply isn’t feasible.

 

With a Big Data solution, it’s possible to store copies of every bit of transactional data in perpetuity. For example, HADOOP (a software platform for storing and processing data across multiple servers, practically synonymous with Big Data) by default stores all information in triplicate, on different servers, so that a single hard-drive failure doesn’t destroy any data.

 

  • Faster response time. Because storage space is much more abundant, most organizations elect to create aggregated, high-speed tables, as needed, on top of their transaction data. These tables provide long-term, time-series data and enable you to quickly create financial reports of up-to-the-minute business performance information.

 

  • Flexible data analytics. If a data element wasn’t included in the enterprise data warehouse, you can go back to the transaction history and quickly pull in the missing data. Gone are the days when that information was lost forever.

 

This concept of quickly created, reusable aggregate tables on top of freely accessible transaction data is known as a “data lake” (as opposed to a “data warehouse,” where information is kept in strictly managed boxes with little access to the underlying transaction-level data that created it).

 

  • Predictive analytics models. Making predictions requires you to slice both transactional and enriched information. For example, you can build a budget model on 10 years of data rather than the six months typically available in conventional business intelligence processes.
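The idea of reusable aggregate tables built on top of freely accessible transaction data can be sketched in a few lines. The example below uses Python’s built-in sqlite3 module purely as a stand-in for a Big Data SQL engine, and the table names, column names, and figures are hypothetical. It builds a monthly summary table, then rebuilds it with a data element (the region) that the first version left out, which is only possible because the raw transactions were kept.

```python
import sqlite3

# SQLite (in memory) stands in for a Big Data SQL engine; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (sold_on TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("2016-10-03", "East", 120.0),
        ("2016-10-17", "West", 80.0),
        ("2016-11-05", "East", 200.0),
        ("2016-11-21", "West", 50.0),
    ],
)

# First aggregate table: monthly totals only. The region was left out.
conn.execute(
    """CREATE TABLE monthly_sales AS
       SELECT substr(sold_on, 1, 7) AS month, SUM(amount) AS total
       FROM transactions
       GROUP BY substr(sold_on, 1, 7)"""
)
monthly_totals = conn.execute(
    "SELECT month, total FROM monthly_sales ORDER BY month"
).fetchall()

# Later someone needs the region. Because the raw transactions still exist,
# the aggregate table can simply be rebuilt with the missing column.
conn.execute("DROP TABLE monthly_sales")
conn.execute(
    """CREATE TABLE monthly_sales AS
       SELECT substr(sold_on, 1, 7) AS month, region, SUM(amount) AS total
       FROM transactions
       GROUP BY substr(sold_on, 1, 7), region"""
)
monthly_by_region = conn.execute(
    "SELECT month, region, total FROM monthly_sales ORDER BY month, region"
).fetchall()

print(monthly_totals)
print(monthly_by_region)
```

In a warehouse that kept only the first summary table, the regional breakdown could not be recovered. Here it takes one more query against the transaction history.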

 

So you can use Big Data for faster, more powerful, and more flexible analytics. Is there a price to pay for these benefits?

 

A WORTHWHILE INVESTMENT

 

Worried about Big Data capabilities driving up your operating costs and making it necessary for your data specialists to learn new skills? If so, you can stop worrying. There’s no need to invest in new resources.

 

The development, data analysis, and exploration tasks are accomplished with the same toolkit that your data analysts already use: SQL, Python, and Excel. Most data analysts already have these skills and can apply them with the following tools:

 

  • HIVE, Drill, and Phoenix. These tools use the Big Data equivalent of SQL queries. HIVE is a stand-alone tool for creating individual tables or a full SQL data model on top of Big Data sources. HIVE is also very similar to a SQL database in that it must be administered properly to handle data quickly and efficiently. Many people have the misperception that HIVE is slow. Not so. Slow speeds usually occur when HIVE tables aren’t partitioned or indexed correctly. Such knowledge is standard for a SQL administrator, Big Data or otherwise.

 

Drill is a SQL query engine for Big Data that works across many kinds of data sources. It enables you to use SQL syntax on JSON files, CSV files, and HIVE tables and (if you want) on HBase tables, too.

 

In a similar vein, Phoenix provides a robust way of creating virtual SQL tables. But there’s one limitation: You can use it only with HBase.

 

  • Python. Python is a programming language with a rich set of libraries, such as PySpark, MrJob, and Snakebite, that you can use in Big Data applications. Most analysts are familiar enough with Python to code in it at some level. If they can already work in another programming language, for example to create macros in Excel, they can learn Python in a couple of weeks.

 

Once they know the basics, your analysts can download Anaconda (a Python distribution aimed at data analysis), add PySpark, and start crunching distributed data sets.

 

  • Business intelligence tools. Products like Spotfire, Tableau, Power BI, and Sisense provide ways to interact with Big Data environments. If you can use a pivot table, you can use a business intelligence tool. Plus, odds are that many organizations already have licenses for these tools in-house.
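The remark above that HIVE is usually slow only when tables aren’t partitioned correctly can be illustrated without a cluster. The following is a minimal plain-Python sketch of the idea behind partitioning, not HIVE itself: the data, the function names, and the choice of month as the partition key are all assumptions made for illustration. It counts how many rows each approach has to read to total one month of transactions.

```python
from collections import defaultdict

# Hypothetical data: 1,000 transactions per month for one year, as (month, amount).
transactions = [
    (f"2016-{month:02d}", float(month))
    for month in range(1, 13)
    for _ in range(1000)
]


def total_full_scan(rows, month):
    """Unpartitioned layout: every row must be inspected to find one month."""
    scanned = 0
    total = 0.0
    for row_month, amount in rows:
        scanned += 1
        if row_month == month:
            total += amount
    return total, scanned


# Partitioned layout: rows are grouped by month once, when the data is stored.
partitions = defaultdict(list)
for row_month, amount in transactions:
    partitions[row_month].append(amount)


def total_partitioned(parts, month):
    """Partitioned layout: only the matching partition is read."""
    amounts = parts.get(month, [])
    return sum(amounts), len(amounts)


print(total_full_scan(transactions, "2016-03"))
print(total_partitioned(partitions, "2016-03"))
```

Both functions return the same total, but the partitioned version reads one-twelfth of the rows. The same reasoning, applied to far larger tables, is why a well-chosen partition key matters so much for query speed.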

 

There’s nothing that makes these languages and tools special. You use them to write the same types of solutions that you would write in any other data environment. As a management accountant, you would engage the same BI specialists who create your SQL Server Reporting Services reports or do extract, transform, and load (ETL) tasks. Best of all, you won’t have to hire additional specialists.

 

REQUIRED RESOURCES

 

Let’s face it. Many organizations don’t have data or analytics specialists. Many of us work in firms that manage their data, the little that is captured, in spreadsheets.

 

The executives and owners of such businesses may see the opportunity in Big Data, but they often mistakenly believe that it’s too expensive. They therefore start looking at data warehouses and cloud databases to solve their problems instead.

 

In reality, setting up a data warehouse is typically just as expensive as setting up a Big Data environment, with hardware and maintenance adding further costs.

 

Fortunately, many companies now offer managed analytics as a service, which provides highly secure shared or private Big Data and analytics environments for smaller firms seeking the benefits of Big Data without the cost of building and running their own environment.

 

BIG RETURN ON INVESTMENT

Big Data allows the storage and analysis of all enterprise data. This data can be ingested into a Big Data environment and explored using current industry-standard tools such as SQL and Python.

 

Even if your organization can’t support its own in-house Big Data environment, cloud/managed solutions exist. The benefits of Big Data analysis in support of decision making and predictive or prescriptive data-driven strategies are within the reach of even smaller entities. It’s time to consider Big Data for your organization’s data solution. This article hasn’t even touched on all the added benefits such as streaming data and machine learning—but they will be covered soon.

 

Daniel Smith, CMA, is the director of data science and innovation at Syntelli Solutions and is a member of IMA’s Dallas Fort Worth Area Chapter. You can reach him at daniel.smith@syntelli.com.

1 comment.
    Min Liu May 24, 2017 AT 7:30 pm

    Hi Daniel, I read your article with great interest. I wonder how have you transferred your finance career to data science career. I plan to do the same because of my passion on analytics. I am part of DFW chapter too.
