Batch analytics can be isolated to a single cluster
After a month in beta, Google has made global replication generally available to all Cloud Bigtable users, allowing data to be replicated across a maximum of four clusters globally.
With global replication in Cloud Bigtable, users can now copy data across up to four clusters in different regions. Replicated Cloud Bigtable instances ensure that stored data has a higher level of resilience in the unfortunate event of a region going offline.
Certain sectors, such as finance and healthcare, would be greatly impacted by regional or zonal failures; due to the regulations in those industries, affected companies could also face regulatory penalties.
Cloud Bigtable, which runs on Google Cloud Platform, is a wide-column NoSQL database that aims to operate at a consistent latency of 10ms or lower. At its core, Bigtable is a distributed storage system for managing structured data. Google uses Bigtable to run its own services, such as Google Earth, Google Finance, and Google Search.
The system is designed for large-scale workloads, allowing users to scale their database from gigabytes to petabytes without experiencing a significant drop in performance.
With the replication function now generally available, data in Bigtable can be made more durable by copying it across multiple clusters worldwide. Each cluster must be located in a unique location, such as a different region. This also means that services will have reduced latency in a given region if a company wishes to expand into that market.
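As a rough sketch of how such an instance is set up with the gcloud CLI, the following creates an instance with two clusters in different regions; the instance, cluster, and zone names here are hypothetical, and the exact flag syntax may vary with the gcloud version:

```shell
# Create a replicated Bigtable instance with two clusters in different
# regions (an instance can hold up to four clusters).
# Requires the gcloud CLI and an active GCP project; names are examples only.
gcloud bigtable instances create example-instance \
    --display-name="Replicated instance" \
    --cluster-config=id=cluster-us,zone=us-east1-b,nodes=3 \
    --cluster-config=id=cluster-eu,zone=europe-west1-c,nodes=3
```

Each `--cluster-config` entry places one cluster in a specific zone; adding a cluster in a new region is what brings both the redundancy and the latency benefits described above.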
Sandy Ghai Product Manager at GCP wrote in a blog that: “Multi-region replication allows our AdTech users to locate their data close to their customers and to ad exchanges. This makes it easier to reduce end-to-end request latencies for ad bidding and personalization services, where custom advertisements and page content is served to website visitors in real time.”
Cloud Bigtable Replication
A key advantage of replication is the ability to run batch analytics tasks against data stored in a single cluster. Running analytics involves numerous large reads, and when these are mixed with an application's regular reads and writes, any applications accessing that data can experience a significant slowdown. With replication, one cluster can be chosen to run these analytics, while the rest are tasked with maintaining an uninterrupted service to clients.
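This routing is configured through application profiles. As a hedged example (the profile, instance, and cluster names are hypothetical, and flag syntax may differ between gcloud versions), a profile that pins batch traffic to one cluster might be created like this:

```shell
# Create an app profile that routes all requests made with it to a single
# cluster, so batch analytics does not contend with serving traffic.
# Requires an existing replicated instance; names are examples only.
gcloud bigtable app-profiles create batch-analytics \
    --instance=example-instance \
    --route-to=cluster-us \
    --description="Single-cluster routing for batch analytics jobs"
```

Latency-sensitive serving traffic can then use a separate profile with multi-cluster routing, leaving the analytics cluster's load isolated.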
In a Google whitepaper [PDF] detailing Bigtable, its researchers note that: “Bigtable does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format, and allows clients to reason about the locality properties of the data represented in the underlying storage.”
“Data is indexed using row and column names that can be arbitrary strings. Bigtable also treats data as uninterpreted strings, although clients often serialize various forms of structured and semi-structured data into these strings. Clients can control the locality of their data through careful choices in their schemas.”
The default setting for replication on Cloud Bigtable is ‘eventually consistent,’ which means that when a change is written to a single cluster, that change will eventually be replicated to all clusters. Google states that: “If your instance is healthy, the delay for replication is typically a few seconds or minutes, not hours.”
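For workloads that need to know when replication has caught up, the cbt command-line tool offers a way to block until earlier writes have propagated to every cluster; the instance and table names below are hypothetical:

```shell
# Wait until all clusters in the instance have replicated the writes
# made before this call (uses the cbt tool; names are examples only).
cbt -instance=example-instance waitforreplication example-table
```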
Spotify is one of Google’s larger clients using Bigtable; its chief architect, Niklas Gustavsson, commented that: “Spotify is a global business with a global user base. Being able to provide a great audio experience to our users is a key priority for us. With Cloud Bigtable clusters in Asia, Europe, and the United States, we’re able to get low-latency data access all over the world, enabling Spotify to provide a seamless experience for our users.”