Solving the Solo ‘Data Guru Bottleneck’ with Graph Databases and AI


May 22, 2023

Ben Nussbaum

An illustration depicting two pictures of a person at a computer with paperwork and illustrations of a clock, a calendar and a broken light bulb, the following picture depicting an AI model in the form of a brain surrounded by a graph of connected nodes
You log onto the Griddle, Inc. Slack on Monday morning and see an announcement from your CTO: “Welcome Andy, Griddle’s first machine learning and data engineer! We’re SO excited to have them onboard.”

As part of Griddle’s engineering team, this sounds like a win-win for everyone. You’ve been asking said CTO for more technical prowess to implement some of the most-requested product features, many of which involve some artificial intelligence (AI) technology. All signs point to these floodgates opening wide with Andy’s arrival—time to get back to your favorite part of development work: building amazing new things with the latest and greatest tech.

You add a 👋 to the message and eagerly await Andy’s first day.

The scenario: Andy’s first few months at Griddle, Inc.

Over the next few weeks, you help Andy get their feet wet by giving them access to your organization’s existing big data repository, which contains everything from customer details to product usage as reported by logs from various microservices you helped develop. They even jump into the code to add an open source machine learning (ML) algorithm as a dependency to address one of those in-demand features in the product.

Everyone loves Andy’s presentations at Griddle’s all-hands meetings because they’re solving problems and coming up with new solutions with data in ways that you and your peers only dreamed about before. They’re inspiring your organization’s people, from engineering to marketing and beyond, to develop new use cases for the existing big data repository.

All ideas Andy could implement.

While designing new ways to leverage data for competitive advantage is a good and optimistic mission at heart, the reality is that your organization is already headed down a dangerous road of overburdening Andy with data-related requests.

The problem: Andy becomes the solo ‘data guru’

In the following quarter, you ping Andy on Slack with what feels like a simple request: Examine metrics and logs piped from your Kubernetes cluster to identify opportunities to cut costs with your cloud provider. Maybe you’re setting resource limits too high, creating larger data stores than needed, or you could optimize your data pipeline by archiving to lower-cost storage more often.

Andy says they’re a “little behind” on their requests but will get around to yours ASAP, as they know your CTO has been hammering the team about cost optimizations for the last two sprints, asking them to fork over any spare story points to focus on it.

When you get on a Meet call with Andy and ask them what they’ve been up to lately, you start to see the extent of the problem of having a single data guru.

In Andy’s case, that means navigating a stressful stream of requests, like:

Ongoing data cleansing to remove incorrect, corrupted, or duplicate information before it is used in any analysis, since bad data could lead to erroneous decisions and misguided investments.

Preparing raw data through collecting, cleaning, and labeling it to ensure all their data-led projects, particularly anything to do with building and deploying custom ML models, start off on the right foot.

Troubleshooting business intelligence (BI) dashboards that aren’t “doing what they’re told” for executives trying to put together the next quarter’s presentation to the board.

Creating an analytics pipeline to identify which cohorts and accounts are ideal for upsells or longer-term contracts through sales and marketing efforts.

Centralizing marketing data, like website analytics and attributions, alongside customer lifetime value (CLV) and customer experience data, so the team can understand what efforts have been most effective and where they should double down in the next fiscal year—then helping them do analysis directly when they get overwhelmed.

Data engineering and operations work, including all the modeling, governance, and maintenance of the Hadoop cluster that stores all your organization’s priceless data.

Every team in your organization is inspired to become data-driven because of Andy’s data and ML influence, but together those teams have created a bottleneck. As the only data guru, Andy has no reasonable means of processing requests fast enough, much less catching up and getting back to their actual job role: developing and perfecting new ML models to integrate into the product as a competitive edge.

Andy’s endless list of data-related requests is quickly burning them out, and you watch on from the sidelines, wondering if there’s anything you can do to help. Taking over a request here and there isn’t going to cut it—you need to implement permanent changes that accelerate the speed and simplicity of deploying data and AI solutions to your peers. If you can get that right, you’re not only going to free up Andy’s time, but give your entire organization a meaningfully better platform to become data-driven.

The immediate solution: A different data culture

With an uncertain and challenging labor market, your organization can’t just duplicate Andy and their multipurpose data/ML talents with a simple copy+paste. You need an entirely different approach to data, and it’s one that you, as part of the engineering team, can nurture.

The biggest cultural challenge that impedes data-driven projects is thinking of data as existing only in rigid boxes of columns and rows, like relational databases and spreadsheets. Many of the requests on Andy’s desk, like setting up new analytics pipelines, are just your peers eager to open more boxes for integrating and mashing-up data. Andy wastes time writing more complex queries and formulas to work their “magic” with the data, but the boxes are fundamentally siloed from each other. There are no meaningful ways to solve problems by connecting data across discrete databases or spreadsheets.

Instead, many complex data challenges are better addressed by graph thinking, in which your data becomes a richly connected web of entities and the type and strength of those connections are of utmost importance.

You can work with Andy (or your organization’s data guru) to instantiate a culture of graph thinking by getting key players into a whiteboarding session. Draw out your major data entities—the people, places, and things most relevant to your organization—and make connections between them. How would you describe the relationships? How can you organize and use data in the most useful ways for solving your organization’s goals?
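To make the whiteboarding exercise concrete, here is a minimal sketch of what the result might look like once it is written down as data. It uses only the Python standard library, not any particular graph database, and every name in it (the `WhiteboardGraph` class, the entities, the relationship types, and the strength values) is a hypothetical example rather than anything from Griddle's or GraphGrid's actual systems.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """A typed, weighted connection between two entities."""
    source: str
    rel_type: str
    target: str
    strength: float  # hypothetical weight between 0.0 and 1.0


class WhiteboardGraph:
    """A tiny in-memory graph: entities are strings, edges are Relationships."""

    def __init__(self):
        self._outgoing = defaultdict(list)

    def connect(self, source, rel_type, target, strength=1.0):
        """Draw a line on the whiteboard from source to target."""
        rel = Relationship(source, rel_type, target, strength)
        self._outgoing[source].append(rel)
        return rel

    def neighbors(self, entity, rel_type=None, min_strength=0.0):
        """Entities directly connected to `entity`, optionally filtered
        by relationship type and minimum strength."""
        return [
            rel.target
            for rel in self._outgoing.get(entity, [])
            if (rel_type is None or rel.rel_type == rel_type)
            and rel.strength >= min_strength
        ]

    def paths(self, start, end, max_hops=3):
        """Every cycle-free path from start to end within max_hops edges."""
        found = []
        stack = [(start, [start])]
        while stack:
            node, path = stack.pop()
            if node == end:
                found.append(path)
                continue
            if len(path) > max_hops:  # path already uses max_hops edges
                continue
            for rel in self._outgoing.get(node, []):
                if rel.target not in path:
                    stack.append((rel.target, path + [rel.target]))
        return found


def build_example():
    """Hypothetical entities and relationships from a whiteboarding session."""
    graph = WhiteboardGraph()
    graph.connect("Customer: Acme", "SUBSCRIBES_TO", "Product: Griddle Pro", 0.9)
    graph.connect("Customer: Acme", "OPENED", "Ticket: 1042", 0.4)
    graph.connect("Ticket: 1042", "CONCERNS", "Feature: Export", 0.7)
    graph.connect("Product: Griddle Pro", "INCLUDES", "Feature: Export", 1.0)
    return graph


if __name__ == "__main__":
    graph = build_example()
    # How is this customer connected to this feature?
    for path in graph.paths("Customer: Acme", "Feature: Export"):
        print(" -> ".join(path))
```

The point of the sketch is the shape of the question: "how is this customer connected to this feature?" is answered by following relationships, not by joining separate tables or spreadsheets. A real graph database would handle storage, indexing, and querying at scale, but the mental model is the same.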

As your peers in sales, product, marketing and beyond start looking at your data with connection and context in mind, they’ll inevitably start dreaming up new use cases. This is where you, Andy, and the rest of your engineering team need to also prioritize creating knowledge over code.

When operating under the stressful crush of data requests, folks like Andy end up creating “algorithms” that, beneath the hood, are just daisy chains of traditional programming techniques, like booleans and regular expressions. Code-based approaches only work on certain types of data, end up becoming unmaintainable, and require tons of trial-and-error to get valuable output.

Relying on these solutions also reinforces an outdated idea that an organization’s competitive advantage is in its code, not its people. Executives and leadership forget that solving highly complex problems requires experimentation, tinkering, and an imaginative spark. It’s like thinking that Andy’s time-consuming data operations work is more important than their innovative approaches to developing new ML models.

New cultures of data and technology create more meaningful use cases for data, but they don’t solve the underlying problem, which is that Andy simply can’t work fast enough to deploy the solutions your peers need.

The long-term solution: A composable graph + AI platform

You can’t duplicate Andy or hire more folks with their skills. What you can do is deploy new technology that helps the two of you collaborate more meaningfully and amplifies your skills and capabilities with graph data and the latest in artificial intelligence.

A composable graph + AI platform like GraphGrid comes with all the services and pipelines you need to accelerate graph solution development, whether you’re supporting Andy’s endless list of requests or working directly with your non-technical peers. You not only remove the bottleneck around Andy, but open your entire organization up to new data-driven knowledge that becomes your long-term competitive advantage by:

Eliminating tedious data operations work, like cleansing, preparation, and cluster maintenance, with GraphOps tools that ensure security and scalability.

Amplifying how quickly and effectively your peers develop knowledge through access to new technology, like natural language processing (NLP) to extract meaning from previously-unexplored unstructured data, change data capture (CDC) to create “push notifications” for changed data, and more.

Delivering self-service solutions to less technical peers like product managers or marketers, letting them create queries, develop reports, and perform novel analysis on graph data without a single IT request.

Converting Andy’s stressful TODO list into new opportunities, like optimizing their ML models using context-rich graph data or training custom deep learning models to solve problems using data at a far greater scale than one person (or a team) can reasonably handle.

You can start working toward this long-term solution with a simple first step: Schedule a free solutioning session with the graph + AI solution experts at GraphGrid. We’ll help you explore the challenges and bottlenecks in your current data pipeline, identify opportunities for giving people superpowers with AI solutions, and get you started down a better path.

Andy might still retain the title of Griddle, Inc.’s “data guru,” but you get to be the person who gives everyone at your organization superpowers with new solutions built on graph data and AI.