Change data capture: The critical link for Airbnb, Netflix and Uber



The modern data stack (MDS) is fundamental to digital disruption. Think Netflix. The company pioneered a new business model around video as a service, but much of their success is built on real-time streaming data.

They use analytics to send highly relevant recommendations to viewers. They monitor real-time data to maintain constant visibility into network performance. They synchronize their database of movies and series with Elasticsearch to enable users to quickly and easily find what they are looking for.

This has to be in real time and it has to be 100% accurate. Old-school extract, transform, load (ETL) is simply too slow. To address this need, Netflix built a change data capture (CDC) tool called DBLog that captures changes in MySQL, PostgreSQL, and other data sources, then streams those changes to target data stores for search and analysis.

Netflix required high availability and real-time synchronization, and it also needed to minimize the impact on its operational databases. CDC reads changes from database logs and replicates them to target databases in the order they occur, so it captures changes as they happen without locking records or otherwise bogging down the source database.
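To make the log-based idea concrete, here is a minimal sketch in Python. It is a generic illustration, not DBLog's actual design: the `ChangeEvent` and `TargetReplica` names are hypothetical, and a Python list stands in for a real database transaction log. The replicator never queries the source tables; it only reads change events, applies them in log order, and records a checkpoint so that replaying the log after a restart does not apply anything twice.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ChangeEvent:
    lsn: int                        # position in a hypothetical source transaction log
    op: str                         # "insert", "update" or "delete"
    key: str                        # primary key of the affected row
    row: Optional[dict[str, Any]]   # new row image; None for deletes


class TargetReplica:
    """Hypothetical target store that applies change events in log order."""

    def __init__(self) -> None:
        self.rows: dict[str, dict[str, Any]] = {}
        self.last_lsn = 0  # checkpoint: highest log position already applied

    def apply(self, event: ChangeEvent) -> None:
        # Skip anything at or below the checkpoint, so replays are harmless.
        if event.lsn <= self.last_lsn:
            return
        if event.op == "delete":
            self.rows.pop(event.key, None)
        elif event.op in ("insert", "update"):
            if event.row is None:
                raise ValueError(f"{event.op} at lsn {event.lsn} has no row image")
            self.rows[event.key] = dict(event.row)
        else:
            raise ValueError(f"unknown op: {event.op}")
        self.last_lsn = event.lsn


def replicate(log: list[ChangeEvent], target: TargetReplica) -> None:
    # Apply events strictly in log (commit) order; the source tables are never touched.
    for event in sorted(log, key=lambda e: e.lsn):
        target.apply(event)


# Usage: a tiny simulated log for a catalog of titles.
log = [
    ChangeEvent(1, "insert", "m1", {"title": "Film A"}),
    ChangeEvent(2, "insert", "m2", {"title": "Film B"}),
    ChangeEvent(3, "update", "m1", {"title": "Film A (Remastered)"}),
    ChangeEvent(4, "delete", "m2", None),
]
target = TargetReplica()
replicate(log, target)
replicate(log, target)  # replaying the same log leaves the target unchanged
print(target.rows, target.last_lsn)
```

Keeping the checkpoint next to the data is one simple way to get the "in order, without duplicates" behavior the paragraph describes; production tools typically persist such positions durably, which this sketch does not attempt.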


Data is central to what Netflix does, but they are not alone in that regard. Companies like Uber, Amazon, Airbnb and Meta are thriving because they truly understand how to make data work to their advantage. Data processing and data analysis are strategic pillars of these organizations, and CDC technology plays a central role in their ability to carry out their core missions.

The same can be said for just about any company operating at the top of its game in today’s business environment. If you want your company to function as an A-player, you need to modernize and master your data. Your competitors are definitely already doing it.

Sub-second integration is the new standard at Airbnb and Uber

In today’s world, a strong customer experience requires real-time data flows. Airbnb recognized the value of CDC technology in creating a great customer experience (CX) for its guests and hosts, and, like Netflix, it built its own CDC platform, which it calls SpinalTap. Airbnb’s dynamic pricing, listing availability and reservation status require flawless accuracy and consistency across all systems. When an Airbnb customer books a stay, they expect the workflow to be very fast and 100% accurate.

For Uber, immediacy is arguably even more important. Whether a customer is waiting for a ride to the airport or ordering food delivery, timing is crucial. Just like Netflix and Airbnb, they developed their own CDC platform to synchronize data across multiple data stores in real time. Again, a common set of demands emerged. Uber needed their solution to be extremely fast and fault tolerant, with no loss of data. They also needed a solution that would not drag down the performance of their source databases.

Change data capture for the rest of us

Once again, CDC fits the bill. In the old days, overnight batch-mode ETL might have been sufficient for general management updates or operational reports. Today, real time is increasingly the norm. If information is power, instant access to information is turbo power.

That’s why CDC is quickly becoming a fundamental requirement for the modern data stack. It’s all well and good that big companies like Netflix, Airbnb and Uber have the resources to build custom CDC platforms – but what about everyone else?

Off-the-shelf CDC solutions fill this gap, delivering the same low-latency, high-quality streaming pipelines without having to build from scratch.

Unfortunately, they are not all created equal. Most companies operate a collection of systems that handle enterprise resource planning (ERP), customer relationship management (CRM) or specialized operational functions such as procurement or HR. These run on different database platforms, with incongruent data models. If a company runs mainframe systems, they’re likely dealing with arcane data structures that don’t fit easily with modern relational data.

This makes heterogeneous integration particularly important. It requires connectivity to multiple data sources, including transactional systems such as SAP, Oracle, IBM Db2 and Salesforce, as well as the ability to deliver real-time streaming data to targets like Databricks, Kafka, Snowflake, Amazon DocumentDB and Azure Synapse Analytics.

Real-time CDC automation

To drive artificial intelligence (AI) and advanced analytics, companies must move their data to a common MDS platform. That means ingesting information from a variety of sources, transforming it to fit a unified model for analysis, and delivering it to a modern cloud-based data platform.

Change data capture technology serves as a critical link in the data-driven value chain: first it automates data ingestion from source systems, then it transforms the data on the fly and delivers it to a cloud computing platform. Real-time CDC automation ensures that the right information gets to the right place immediately.
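The ingest, transform and deliver stages can be pictured as a chain of streaming steps. The Python sketch below is illustrative only: the ERP and CRM column names (`CUST_NO`, `CUST_NAME`, `customerId`, `fullName`), the unified customer model and the `capture`/`transform`/`deliver` functions are all hypothetical, and a plain list stands in for the cloud data platform. Each changed row is mapped into the unified model as it flows past, rather than being staged and reshaped in a separate batch job.

```python
from itertools import chain
from typing import Any, Callable, Iterable, Iterator

Record = dict[str, Any]


def erp_to_unified(row: Record) -> Record:
    # Hypothetical ERP layout: upper-case column names, space-padded text fields.
    return {"customer_id": str(row["CUST_NO"]), "name": row["CUST_NAME"].strip(), "source": "erp"}


def crm_to_unified(row: Record) -> Record:
    # Hypothetical CRM layout: camelCase, JSON-style fields.
    return {"customer_id": row["customerId"], "name": row["fullName"], "source": "crm"}


# One mapping per source system into the shared (hypothetical) customer model.
MAPPERS: dict[str, Callable[[Record], Record]] = {"erp": erp_to_unified, "crm": crm_to_unified}


def capture(source: str, changed_rows: Iterable[Record]) -> Iterator[tuple[str, Record]]:
    """Stand-in for change capture: emit each changed row tagged with its source system."""
    for row in changed_rows:
        yield source, row


def transform(events: Iterable[tuple[str, Record]]) -> Iterator[Record]:
    """Map each event into the unified model as it streams through."""
    for source, row in events:
        yield MAPPERS[source](row)


def deliver(records: Iterable[Record], sink: list[Record]) -> int:
    """Append records to the sink (a list standing in for a cloud platform); return the count."""
    count = 0
    for record in records:
        sink.append(record)
        count += 1
    return count


# Usage: changes from two differently shaped systems land in one model.
erp_changes = [{"CUST_NO": 1001, "CUST_NAME": "  ACME GMBH  "}]
crm_changes = [{"customerId": "C-7", "fullName": "Jane Doe"}]
warehouse: list[Record] = []
events = chain(capture("erp", erp_changes), capture("crm", crm_changes))
delivered = deliver(transform(events), warehouse)
print(delivered, warehouse)
```

Because every stage is a generator, no stage waits for a full extract to finish; each record moves through the whole chain as soon as it is captured.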

Because they focus only on data that has changed, streaming CDC pipelines offer huge efficiency advantages over previous batch-mode operations. The best CDC solutions can deliver 100-plus terabytes of data from source to destination in less than 30 minutes, with no data loss.
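The efficiency argument comes down to how many rows each approach has to move. In this toy Python comparison (the table size and change counts are made up for illustration), a batch-style full reload copies every row again, while a CDC-style sync applies only the captured changes. Both approaches end up with the same target contents.

```python
from typing import Optional

Change = tuple[str, int, Optional[str]]  # (op, primary key, new value or None for deletes)


def apply_change(table: dict[int, str], change: Change) -> None:
    op, key, value = change
    if op == "delete":
        table.pop(key, None)
    else:  # "upsert"
        table[key] = value


def full_reload(source: dict[int, str]) -> tuple[dict[int, str], int]:
    """Batch-style sync: rebuild the target by copying every source row."""
    return dict(source), len(source)


def incremental_sync(target: dict[int, str], changes: list[Change]) -> int:
    """CDC-style sync: apply only the captured changes; return how many rows moved."""
    for change in changes:
        apply_change(target, change)
    return len(changes)


# Initial snapshot: 10,000 rows copied once.
source = {i: f"row-{i}" for i in range(10_000)}
cdc_target, _ = full_reload(source)

# Later, 20 rows are updated and 5 are deleted on the source.
changes: list[Change] = [("upsert", i, f"row-{i}-v2") for i in range(20)]
changes += [("delete", i, None) for i in range(9_995, 10_000)]
for change in changes:
    apply_change(source, change)

batch_target, batch_rows = full_reload(source)
cdc_rows = incremental_sync(cdc_target, changes)
print(batch_rows, cdc_rows)  # 9995 25
```

The gap widens as tables grow while the rate of change stays small, which is the situation the batch-versus-streaming comparison above describes.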

The shift to cloud computing is well under way. Cloud analytics in particular offers clear benefits to companies that truly understand the transformational role of data. Leading companies in all industries are adjusting their strategic visions around data analysis. They digitize their interaction with customers and use algorithms to study data, extract insights and take action. AI and machine learning ingest huge amounts of information, discover connections and identify anomalies.

Whether you’re leading the way in digital disruption or just trying to keep up with the pack, CDC technology will play a central role in making the modern data stack a reality and opening the door to digital transformation.

Gary Hagmueller is CEO of Arcion.

Data Decision Makers

Welcome to the VentureBeat community!

DataDecisionMakers is where experts, including the technical people involved in data work, can share data-related insights and innovation.
