Load Data Transactionally
Importing Initial Data
Making Use of Geequel
The ability to load data into ONgDB with Geequel is enabled through a variety of data loading APIs and tools. For processes where big data sets flow in or out of the Open Native Graph Database (ONgDB), care needs to be taken to batch these read and write operations into sizes that are sympathetic to the master instance's memory capacity as well as the transactional overhead of data writes.
ONgDB provides a number of APIs for importing big data sets, including:
- the Geequel transactional endpoint, which uses the Geequel query language and is simple to use from any programming language, because files containing Geequel can be structured to bulk load data and write consistently.
- the Geequel data import capabilities exposed through LOAD CSV, which enable CSV files from a specified remote or local URL to be loaded and batched into desirable transaction sizes for importing massive data sets efficiently.
- the batch inserter, which removes transactional overhead but requires the database to be offline
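As a sketch of the LOAD CSV approach, the following statement loads a hypothetical people.csv file (the filename and its name/age headers are assumptions for illustration) while committing periodically so no single transaction grows too large:

```cypher
// Commit every 5,000 rows instead of building one huge transaction.
USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS FROM "file:///people.csv" AS row
MERGE (p:Person {name: row.name})
SET p.age = toInteger(row.age);
```

The URL after FROM can also point at a remote http:// or https:// location, which is what makes LOAD CSV convenient for pulling test data sets straight from the web.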
To load or update data in ONgDB with an efficient write throughput, a reasonable transaction size needs to be consistently maintained based on the complexity of the writes being performed. Transactions that are too small (consisting of one or a few updated elements) suffer from transaction-commit overhead. Transactions that are too large (involving hundreds of thousands or millions of elements) can lead to high memory consumption for the transient transaction state. In our experience, an adequate transaction consists of anywhere from 1,000 to 10,000 elements.
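One common way to hit that target size from application code is to send a parameter containing a batch of rows and unwind it inside a single transaction. The $batch parameter, the Person/City labels, and the LIVES_IN relationship below are illustrative assumptions, not part of any fixed schema:

```cypher
// Driver code passes $batch as a list of ~1k-10k maps per transaction,
// e.g. [{name: "Ada", city: "London"}, {name: "Alan", city: "Wilmslow"}, ...]
UNWIND $batch AS row
MERGE (p:Person {name: row.name})
MERGE (c:City {name: row.city})
MERGE (p)-[:LIVES_IN]->(c);
```

Because the whole batch commits as one transaction, tuning the batch size in the driver directly tunes the transaction size described above.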
When it comes to large initial imports consisting of millions or billions of nodes, a transactional process doesn't deliver maximum write performance. To saturate write speed, it's important to bypass transaction semantics and create your initial data store in a "raw" manner via a batch-insertion mechanism. This of course isn't an option once the database is online, so utilizing an optimized write pipeline to maximize online write throughput into ONgDB becomes essential, which is why we created one as part of the GraphGrid Data Platform.
If you're starting off with ONgDB, you'll need to import data into the graph database or create some initial data to establish your graph data model. For a demo or concept model, it is often sufficient to craft a small graph via the ONgDB Web-UI or the ONgDB-Console. From there, you can build up your graph via the data browser or via Geequel CREATE and MERGE statements. Geequel also enables you to use LOAD CSV to import your initial test dataset, which we've found to be a very effective data loading tool.
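A tiny demo graph along these lines might look like the following (the Person and Topic labels and the relationship types are made up for the example). CREATE always adds new data, while MERGE matches existing data first, so the MERGE statement can be re-run without creating duplicates:

```cypher
// Create a small demo graph in a single statement.
CREATE (ada:Person {name: "Ada"}),
       (bob:Person {name: "Bob"}),
       (graphs:Topic {name: "Graphs"}),
       (ada)-[:KNOWS]->(bob),
       (ada)-[:INTERESTED_IN]->(graphs);

// MERGE is idempotent: running this again adds nothing new.
MERGE (p:Person {name: "Ada"})
MERGE (t:Topic {name: "Graphs"})
MERGE (p)-[:INTERESTED_IN]->(t);
```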
Geequel, ONgDB's graph query language, works well for updating the graph (similar to SQL INSERT statements for a relational database). The easiest way to import data into the native graph database with Geequel for your initial testing purposes is to generate proper statements from your input data. That can be accomplished with a spreadsheet or any programming language by concatenating strings into statements, which you can then paste or pipe into the ONgDB-Shell, or have the shell read from a file.
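The generated file is just one statement per input row. For example, a spreadsheet formula concatenating two columns might emit a file like this (the Product label and properties are hypothetical):

```cypher
// statements generated row-by-row from a spreadsheet or script,
// then pasted or piped into the shell, one CREATE per source row
CREATE (:Product {sku: "A-100", name: "Widget"});
CREATE (:Product {sku: "A-101", name: "Gadget"});
CREATE (:Product {sku: "A-102", name: "Sprocket"});
```

For anything beyond a small test file, prefer the batched approaches above, since a long run of single-row statements pays the per-commit overhead described earlier.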
Once you've done a simple data load to establish your data model, you'll want to begin considering the requirements of keeping that data updated and consistently flowing into your online Open Native Graph Database. The GraphGrid Connected Data Platform fully-featured freemium download can be a great way to quickly load data into ONgDB with Geequel using the connected data tooling available out of the box.