ONgDB Data Pipeline

Data Import, Graph Databases, ONgDB

February 01, 2016

Ben Nussbaum

GraphGrid Data Pipeline

Concurrent Write Operation Management

Continuous Write Flow

Data Throttling

Error Handling

GraphGrid Data Pipeline with ONgDB

Every enterprise has a constant flow of new data that needs to be processed and stored, and a data pipeline is an effective way to do that. Once an ONgDB data pipeline is introduced into an enterprise data architecture, data must be transformed and loaded efficiently into the Open Native Graph Database (ONgDB). Operating such a pipeline efficiently at scale, with the enterprise integration patterns involved, requires an intimate understanding of ONgDB write operations as well as routing and queuing frameworks such as Apache Camel and ActiveMQ. Managing this complexity is a common challenge from enterprise to enterprise.

One common need we’ve observed over the years is that an enterprise that wants to move forward efficiently with ONgDB must be able to rapidly create a reliable and robust data pipeline that can aggregate, manage and write its ever-increasing volumes of data. The primary reason is to make it possible to write data in a consistent and reliable manner at a known flow rate. Solving this once and providing a robust solution for all is the driving force behind the creation of GraphGrid Data Pipeline.

The GraphGrid Connected Data Platform offers a robust data pipeline that manages high write throughput to ONgDB from varying input sources. The data pipeline manages batch operations, keeps highly connected writes flowing continuously, throttles data, and carries out error handling.

GraphGrid’s data pipeline handles concurrent write operations for any incoming data through strategies that preserve transactional integrity, size transaction batches, and throttle data. Most writes to ONgDB work well when executed concurrently, but where dense nodes are involved, sequential strategies can be used to avoid excessive write retries. The data pipeline also handles numerous concurrent processes writing data into ONgDB in parallel.
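As a rough illustration of the batching and dense-node routing described above, here is a minimal Python sketch. It is not GraphGrid's implementation: the `batched` and `split_by_density` helpers, the `from`/`to` dictionary keys, and the idea of passing in a known set of dense node ids are all assumptions made for the example.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(items: Iterable[dict], size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` items (one list per transaction)."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def split_by_density(writes, dense_node_ids):
    """Separate writes that touch a known dense node from all other writes.

    Writes touching a dense node go to a sequential lane; the rest can be
    written concurrently. How dense nodes are identified is an assumption here.
    """
    concurrent, sequential = [], []
    for write in writes:
        if write["from"] in dense_node_ids or write["to"] in dense_node_ids:
            sequential.append(write)
        else:
            concurrent.append(write)
    return concurrent, sequential


# Hypothetical relationship writes; node 99 stands in for a dense node.
writes = [
    {"from": 1, "to": 2},
    {"from": 3, "to": 99},
    {"from": 4, "to": 5},
    {"from": 99, "to": 6},
    {"from": 7, "to": 8},
]
concurrent, sequential = split_by_density(writes, dense_node_ids={99})
concurrent_batches = list(batched(concurrent, 2))
```

In this sketch each entry of `concurrent_batches` would be handed to a parallel worker as one transaction, while `sequential` would be written by a single worker in order.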

GraphGrid’s data pipeline maintains an uninterrupted write flow of connected data through robust logic that handles deadlocks with a sequential process and an automated retry. Its auto-detection capabilities can increase transaction throughput because they keep to a minimum the rollbacks that would otherwise occur in highly concurrent write scenarios.
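One way to picture the deadlock handling described above is an optimistic concurrent attempt followed by retries inside a sequential lane. The sketch below assumes a hypothetical `DeadlockError` exception and a caller-supplied `run_tx` callable; a real driver raises its own exception type, and GraphGrid's actual logic is not shown in this article.

```python
import threading
import time


class DeadlockError(Exception):
    """Stand-in for whatever deadlock exception the database driver raises."""


# A single lock models the sequential lane: only one retrying writer at a time.
_sequential_lane = threading.Lock()


def write_with_retry(run_tx, max_retries=3, base_delay=0.01, sleep=time.sleep):
    """Run `run_tx` concurrently once; on deadlock, retry sequentially with backoff."""
    try:
        return run_tx()  # optimistic attempt alongside other writers
    except DeadlockError:
        pass  # fall through to the sequential lane

    with _sequential_lane:
        for attempt in range(max_retries):
            try:
                return run_tx()
            except DeadlockError:
                if attempt == max_retries - 1:
                    raise  # retries exhausted; let error handling take over
                sleep(base_delay * 2 ** attempt)  # exponential backoff


# Simulated transaction that deadlocks twice and then commits.
attempts = {"count": 0}


def flaky_tx():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise DeadlockError("simulated deadlock")
    return "committed"


result = write_with_retry(flaky_tx, sleep=lambda seconds: None)
```

Serializing only the writers that have already deadlocked keeps the common case fully concurrent while avoiding repeated rollbacks among the contending transactions.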

GraphGrid’s data pipeline also throttles the flow of data into ONgDB so that other applications needing write access keep reasonable access to ONgDB resources. This is useful when the data pipeline is flowing data into a production database used by customer-facing applications, where resources need to be preserved for real-time application response. The throttle can be tuned to the load ONgDB can safely sustain for write operations across all the systems and applications using it. Leaving the pipeline fully open lets updates flow as quickly as ONgDB can write them. Narrowing the pipeline from its peak lowers the write load on the system.
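The throttling idea can be sketched as a simple rate limiter that spaces writes evenly. The `WriteThrottle` class and its `max_per_second` parameter are hypothetical names chosen for this example, and even spacing is only one possible strategy; the article does not say which mechanism GraphGrid uses.

```python
import time


class WriteThrottle:
    """Allow at most `max_per_second` writes by spacing them evenly in time."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()

    def acquire(self):
        """Block until the next write is allowed, then reserve the following slot."""
        wait = self.next_allowed - self.clock()
        if wait > 0:
            self.sleep(wait)
        self.next_allowed = max(self.clock(), self.next_allowed) + self.interval


class FakeClock:
    """Simulated clock so the demo runs instantly; sleeping just advances time."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


fake = FakeClock()
throttle = WriteThrottle(max_per_second=4, clock=fake.time, sleep=fake.sleep)
for _ in range(5):
    throttle.acquire()  # a real pipeline would submit one write batch here
elapsed = fake.now
```

Raising `max_per_second` opens the pipeline toward whatever rate the database can absorb, and lowering it narrows the pipeline to leave headroom for real-time application queries.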

An automated resolution capability respects transactional integrity when specified, but otherwise has more flexibility in its retry strategy through transaction decomposition, which quarantines the failing statements in a transaction while successfully posting the others. The GraphGrid data pipeline helps keep data flowing by handling errors during transaction updates through these quarantine and automated resolution processes.
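Transaction decomposition with quarantine might look roughly like the following sketch. The `post_transaction` function, its `atomic` flag, and the `run_tx` callable are illustrative assumptions, and statements are treated as opaque strings rather than being sent to a real database.

```python
def post_transaction(statements, run_tx, atomic=False):
    """Post `statements` as one transaction; on failure, decompose unless `atomic`.

    Returns (committed, quarantined), where quarantined holds
    (statement, error message) pairs set aside for later resolution.
    """
    try:
        run_tx(statements)
        return list(statements), []
    except Exception as err:  # broad catch for the sketch; narrow it in real code
        if atomic:
            # Transactional integrity requested: the whole transaction is quarantined.
            return [], [(stmt, str(err)) for stmt in statements]

    committed, quarantined = [], []
    for stmt in statements:
        try:
            run_tx([stmt])  # retry each statement as its own transaction
            committed.append(stmt)
        except Exception as err:
            quarantined.append((stmt, str(err)))
    return committed, quarantined


def fake_run_tx(statements):
    """Simulated executor that rejects any statement containing 'BAD'."""
    for stmt in statements:
        if "BAD" in stmt:
            raise ValueError(f"cannot execute: {stmt}")


batch = ["CREATE (a)", "BAD STATEMENT", "CREATE (b)"]
committed, quarantined = post_transaction(batch, fake_run_tx)
atomic_committed, atomic_quarantined = post_transaction(batch, fake_run_tx, atomic=True)
```

With decomposition, the two valid statements are posted and only the failing one is quarantined. With `atomic=True`, nothing is posted and all three statements are held together, preserving the transaction boundary.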

In our experience, these components of the GraphGrid data pipeline enable an enterprise to get its data connected and flowing efficiently into its ONgDB graph database very quickly, so it can begin realizing the business value of its connected data.