Big Data Architecture

Acquisition & Storage

Acquiring data encompasses capturing business events and persisting them for future processing. The goal is to capture all relevant information in a structured format and persist it reliably.

BI Integration

Business Intelligence tools need access to all data to provide the most value. Where only a few years ago all relevant data was in one big relational database, today it often lives across different systems. These systems provide different ways of accessing the data, from full SQL support to custom APIs and format requirements. Managing the different data sources in a BI tool becomes a challenge.

Real Time Stream Processing

Businesses operate on a 24/7 schedule, and they collect data continuously. So why should you wait until data is loaded to make key decisions? Increasingly, real-time systems are built for data ingestion, integration, analysis, machine learning, and decision support. These systems support low latencies even when operating at the scale of millions of transactions per second.

Schema Management

Data has structure, and that structure is encoded in a schema. As the structure evolves with changing business requirements, so should the schema. The schema not only documents the shape of the data for business analysts, but also requires applications to produce valid records.
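
As a minimal sketch of this idea, assume a hypothetical purchase-event schema. The field names, the REQUIRED marker, and the validate helper below are illustrative assumptions, not the API of any particular schema registry. The sketch shows a schema rejecting invalid records and evolving by adding a field with a default, so older records stay valid:

```python
# Hypothetical schema format: field name -> (expected type, default or REQUIRED).
REQUIRED = object()

SCHEMA_V1 = {
    "user_id": (int, REQUIRED),
    "amount": (float, REQUIRED),
}

# Version 2 adds an optional field with a default, so records written
# against version 1 remain valid under the new schema.
SCHEMA_V2 = {**SCHEMA_V1, "currency": (str, "USD")}


def validate(record, schema):
    """Return a copy of record with defaults filled in, or raise ValueError."""
    unknown = set(record) - set(schema)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    result = {}
    for name, (expected_type, default) in schema.items():
        if name in record:
            value = record[name]
        elif default is REQUIRED:
            raise ValueError(f"missing required field: {name}")
        else:
            value = default
        if not isinstance(value, expected_type):
            raise ValueError(f"{name} must be {expected_type.__name__}")
        result[name] = value
    return result


# A record produced before the "currency" field existed still validates.
upgraded = validate({"user_id": 1, "amount": 9.5}, SCHEMA_V2)
```

Giving new fields a default is one common way to keep schema changes backward compatible; production systems typically rely on a serialization format with built-in schema support rather than hand-written checks like these.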

ETL Pipelines

ETL pipelines are a well-established way to move data between systems and make it readily available for analysis.

Interactive SQL

SQL knowledge is prevalent in the industry, but many tools don’t support it directly. Fortunately, there are many systems that enable interactive SQL over a variety of data sources. Open source products like Impala, Hive, Presto, and Drill integrate well with BI tools to perform analytical queries over sources that are otherwise difficult to connect.

Machine Learning

Recommendation Engines

Providing product recommendations is a proven way to improve user experience and increase conversions. Recommendation algorithms are very sophisticated, and are able to sift through massive amounts of interaction data to find underlying correlations.

Predictive Analytics

Using accumulated data and modern algorithms, AI can predict user churn, purchase likelihood, lifetime value, customer behavior, and more. Generated results can be used to drive marketing campaigns, feed into A/B testing segmentation, or provide insightful reporting.


The more information you have about your customers, the better the service you can provide to them. However, you sometimes don’t have as much information as you want. In those cases machine learning can infer some information with a certain probability.


AWS Consulting

Amazon Web Services provides dozens of services, which can be daunting and difficult to navigate when building a new cloud offering or migrating into the cloud. We help you navigate the landscape of AWS products to choose the ones that are right for you, and redesign your applications around them.

Cloud Deployment

The fundamental difference between hosted and cloud services is that cloud servers tend to be viewed as ephemeral. Deployment becomes critical, because a new server may need to be provisioned at any time and must come up fully ready. This requires a shift in deployment strategy to fully automated build processes.

Monitoring & Security

Both security and monitoring are essential in a public cloud deployment. Traditional on-premises approaches don’t typically translate well into the cloud, and providers offer alternative solutions, which have to be integrated into your applications.


Application Development

Android and iOS are now the world’s dominant operating systems, and your applications should work on them. With dozens of applications behind us, we have the full spectrum of experience needed to develop your mobile applications.

Mobile Analytics

It’s critical to understand how customers are using your applications. Which features are they interacting with? What time of day do they typically use them? Are they connected to WiFi, or on a slow cellular connection? How much time are they spending in the application? What is the median load time? Answering these and many other questions is essential if you want to continue to serve your mobile customers.
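
A few of these questions can be answered with very little code once session events are collected. The sketch below is illustrative only: the event fields (load_ms, network, duration_s) and the sample values are assumptions, not a real analytics schema.

```python
from statistics import median

# Hypothetical session events; in practice these would come from the
# analytics pipeline rather than a hard-coded list.
sessions = [
    {"load_ms": 420, "network": "wifi", "duration_s": 95},
    {"load_ms": 1800, "network": "cellular", "duration_s": 30},
    {"load_ms": 510, "network": "wifi", "duration_s": 240},
    {"load_ms": 2300, "network": "cellular", "duration_s": 12},
    {"load_ms": 640, "network": "wifi", "duration_s": 180},
]


def summarize(events):
    """Compute median load time, WiFi share, and mean session duration."""
    count = len(events)
    return {
        "median_load_ms": median(e["load_ms"] for e in events),
        "wifi_share": sum(e["network"] == "wifi" for e in events) / count,
        "mean_duration_s": sum(e["duration_s"] for e in events) / count,
    }


summary = summarize(sessions)
```

The median is used for load time rather than the mean because a handful of very slow loads on poor connections would otherwise dominate the figure.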

A/B Testing

Making data-driven decisions sometimes requires experiments to know which option is better. We help set up A/B testing infrastructure and run individual tests.
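
Two building blocks of such infrastructure can be sketched briefly: assigning each user to a variant consistently, and checking whether an observed difference in conversion rate is larger than chance would explain. The function names and the experiment key below are hypothetical, and the two-proportion z-test is one common choice among several, not a prescribed method.

```python
import hashlib
import math


def assign_variant(user_id, experiment, variants=("A", "B")):
    """Deterministically bucket a user so they always see the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference in conversion rate (B minus A)."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (conv_b / n_b - conv_a / n_a) / se


# Hypothetical results: 100 of 1000 users converted on A, 130 of 1000 on B.
z = two_proportion_z(100, 1000, 130, 1000)
```

An absolute z value above roughly 1.96 corresponds to the conventional 5% two-sided significance level. Hashing the experiment name together with the user ID keeps assignments stable across sessions and independent between experiments.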

Performance Optimization

Slow applications lead to abandonment and customer loss. Optimizing performance on mobile applications involves a combination of using data to find bottlenecks, code profiling, and application redesign for better user experience.

Our Partners


The leader in Big Data application infrastructure, delivering products that help enterprises create, deploy, run and manage data applications at scale.

Analytics Inside

Companies need to transform their data into powerful knowledge that gives them a competitive edge. We give them the training, consulting and solutions to do that by leveraging our deep experience in big data, machine learning, text analytics and advanced computing.


We help at all stages of the process, whether you have a functioning pipeline that needs a strategic visualization or you’re just getting started and don’t currently capture any data. The exact process we use depends on requirements, but the pipeline we strive for has three stages:

  • Acquisition

    Starts with capturing business events in a log, persisting them to Kafka or sending them to Flume, and ends with the data being saved to HDFS.

  • Preparation

    Applying structure to the log data by joining on relevant data sets, applying machine learning algorithms, and sending it to various edge stores for final consumption.

  • Enablement

    Providing access to realtime systems, analytics databases, and BI tools to extract and summarize the results.
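
The three stages above can be sketched in miniature. This is an illustrative outline only: a plain Python list stands in for the durable log (it is not a Kafka, Flume, or HDFS client), and the event fields, customer table, and function names are assumptions made for the example.

```python
import json
from collections import defaultdict


# Acquisition: capture business events as serialized log lines.
def acquire(events, log):
    for event in events:
        log.append(json.dumps(event))


# Preparation: parse the raw lines and join them against a reference data set.
def prepare(log, customers):
    for line in log:
        event = json.loads(line)
        profile = customers.get(event["customer_id"], {})
        yield {**event, "region": profile.get("region", "unknown")}


# Enablement: summarize the prepared records for reporting.
def revenue_by_region(records):
    totals = defaultdict(float)
    for record in records:
        totals[record["region"]] += record["amount"]
    return dict(totals)


log = []
acquire(
    [
        {"customer_id": 1, "amount": 20.0},
        {"customer_id": 2, "amount": 5.0},
        {"customer_id": 1, "amount": 7.5},
        {"customer_id": 3, "amount": 1.0},
    ],
    log,
)
customers = {1: {"region": "EU"}, 2: {"region": "US"}}
report = revenue_by_region(prepare(log, customers))
```

Keeping the raw log separate from the prepared records means the preparation step can be rerun with new joins or models without recapturing events.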

We take Agile methodologies to heart, and work with you to deliver the most valuable functionality as soon as possible. Our commitment to open source technologies ensures that the most cost-effective and flexible solutions are used.


Sign up to our weekly "This Week In Data" newsletter to stay on top of the latest events in open source projects!

See previous weeks’ issues

Contact Us