Pub/Sub and Batching: Optimizing Cloud Messaging Performance
Enhancing Pub/Sub Efficiency: The Key Role of Batching in Streamlining Cloud Messaging
Google’s Pub/Sub is a powerful tool for building loosely coupled services in the cloud, serving as a robust platform for asynchronous communication. However, leveraging cloud technologies like Pub/Sub comes with its own set of challenges, primarily due to the underlying network calls.
Network-related issues such as latency, network partitions, and the risk of lost messages can significantly impact performance. These issues, often masked by the abstraction layers of client libraries, can lead to unpredictable and varied service performance.
Therefore, understanding and mitigating the impacts of network calls is crucial for maintaining optimal system performance. One effective strategy is minimizing the frequency of these calls, especially in high-traffic scenarios.
In this post, we'll explore how batching messages in Pub/Sub can enhance performance, ensuring a more reliable, efficient, and cost-effective message handling process.
No Premature Optimization
It's important not to jump into optimizing something before understanding if there's a real problem.
Let us first assess how Pub/Sub performs without batching messages. I've written a Kotlin script for this purpose, which is available on GitHub. This script sends a number of messages to Pub/Sub, waits for them to be processed, and then records how long it took.
To better simulate a real-world scenario, the script introduces a delay of 0 to 5 milliseconds between each publish, leaning towards less than 2 milliseconds.
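One way to produce such a skewed delay is to square a uniform sample, which biases it toward small values; here's a sketch of the idea (the actual script may generate its delays differently):

```java
import java.util.Random;

public class SkewedDelay {
    // Returns a delay in [0, 5) ms, biased toward small values:
    // squaring a uniform sample pushes most of the mass toward zero.
    static double delayMs(Random rng) {
        double u = rng.nextDouble();   // uniform in [0, 1)
        return 5.0 * u * u;           // mostly below 2 ms
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double sum = 0;
        int n = 100_000;
        for (int i = 0; i < n; i++) sum += delayMs(rng);
        // The expected value of 5u^2 is 5/3, i.e. about 1.67 ms.
        System.out.printf("mean delay: %.2f ms%n", sum / n);
    }
}
```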
Such tests aren't perfect. They're influenced by many factors like the performance of my computer and the quality of the internet connection, so the results can vary. But they do give us a general idea of how things are working and where there might be room for improvement.
Here is the result for no batching:
| Msgs | Med. Time (ms) | Msgs/sec | Time to Publish 1M |
| ---- | -------------- | -------- | ------------------ |
| 1K | 4,873 | 205 | ~1h 21m |
| 10K | 27,773 | 360 | ~46m 17s |
| 25K | 67,342 | 371 | ~44m 54s |
Breakdown of the performance metrics:
- Msgs (Messages): the number of messages published in each test run.
- Med. Time (ms): the median time taken to publish all messages in a single run, based on 25 trials; a reliable measure of typical performance.
- Msgs/sec: the rate at which messages were successfully published each second.
- Time to Publish 1M: an estimated duration to publish one million messages, calculated from the median time and assuming the publishing rate stays constant.
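The extrapolation itself is simple arithmetic; a quick Java sketch using the first table row:

```java
public class PublishEstimate {
    public static void main(String[] args) {
        // First table row: 1,000 messages took a median of 4,873 ms.
        long messages = 1_000;
        long medianMs = 4_873;

        // Throughput in messages per second.
        long msgsPerSec = Math.round(messages / (medianMs / 1000.0));

        // Estimated time to publish one million messages at that rate.
        long millionMs = medianMs * (1_000_000 / messages);
        long minutes = millionMs / 60_000; // ~81 minutes, i.e. ~1h 21m

        System.out.println(msgsPerSec + " msgs/sec, ~" + minutes + " min for 1M");
    }
}
```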
I'm basing this on a constant, steady flow of messages. However, if your system faces bursts of traffic, be aware that your actual results may vary.
It appears that there is some 'warming up' happening 🔥. When more messages are sent (10,000 instead of 1,000 per experiment), the messages per second increase significantly. However, the difference in performance between 10,000 and 25,000 messages is not as pronounced, suggesting diminishing returns at higher traffic. This observation aligns with Google Pub/Sub's architecture, which is:

> designed to be horizontally scalable, where an increase in the number of topics, subscriptions, or messages can be handled by increasing the number of instances of running servers.
This suggests that the system likely allocates more resources as the message load increases. I took this into account by running each experiment 25 times, so any warm-up phase within a single experiment should be averaged out of the median.
Having a steady flow of messages in Pub/Sub could ensure more resources, compared to sending sporadic bursts of 1,000 messages, which might result in scaled-down resources.
For more detailed information, you can refer to the Google Cloud Pub/Sub architecture page.
Set up Batching
In order to send messages in batches, we must provide our Publisher with the appropriate `BatchingSettings`. Here's an example configuration using Google’s Java client library (the threshold values are illustrative; tune them for your workload):

```kotlin
// Depending on your client library version, Duration is either
// java.time.Duration or org.threeten.bp.Duration.
val batchingSettings = BatchingSettings.newBuilder()
    .setIsEnabled(true)
    .setElementCountThreshold(100L)
    .setRequestByteThreshold(1_024L)
    .setDelayThreshold(Duration.ofMillis(10))
    .build()

val publisher = Publisher.newBuilder(topicName)
    .setBatchingSettings(batchingSettings)
    .build()
```
- `isEnabled` activates batching functionality.
- `elementCountThreshold` defines the maximum number of messages that can be included in a single batch.
- `requestByteThreshold` sets the maximum total size of a batch in bytes.
- If neither the `elementCountThreshold` nor the `requestByteThreshold` is met, the `delayThreshold` determines the maximum wait time before sending an incomplete batch.
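To make the interaction concrete, here's a small, library-free Java sketch of the flush rule these thresholds imply: a batch is sent as soon as any one of them is reached (class and method names are mine, not the client library's):

```java
import java.util.ArrayList;
import java.util.List;

// A toy batcher illustrating the threshold logic; the real client
// applies the same rule: flush as soon as ANY threshold is reached.
class ToyBatcher {
    private final long elementCountThreshold;
    private final long requestByteThreshold;
    private final long delayThresholdMs;

    private final List<byte[]> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    private long firstElementAtMs = -1;

    ToyBatcher(long elements, long bytes, long delayMs) {
        this.elementCountThreshold = elements;
        this.requestByteThreshold = bytes;
        this.delayThresholdMs = delayMs;
    }

    /** Adds a message; returns true if the batch should now be flushed. */
    boolean add(byte[] message, long nowMs) {
        if (firstElementAtMs < 0) firstElementAtMs = nowMs;
        buffer.add(message);
        bufferedBytes += message.length;
        return buffer.size() >= elementCountThreshold
                || bufferedBytes >= requestByteThreshold
                || nowMs - firstElementAtMs >= delayThresholdMs;
    }

    void flush() {
        buffer.clear();
        bufferedBytes = 0;
        firstElementAtMs = -1;
    }
}
```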
These settings are common across all client libraries, though the exact naming might vary slightly.
A note of caution for those using Spring Boot with `PubSubTemplate` ⚠️: It's essential to include all configuration properties in your `application.yaml` file, as there are no default settings. Missing out on any property means the `PubSubTemplate` will function without batching, which can be misleading since there are no error messages or warnings to indicate this. For your convenience, here's an example setup of the necessary parameters:
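A sketch of such a setup, assuming Spring Cloud GCP's property naming (the values are illustrative; verify the keys against the library version you use):

```yaml
spring:
  cloud:
    gcp:
      pubsub:
        publisher:
          batching:
            enabled: true
            element-count-threshold: 100
            request-byte-threshold: 1024
            delay-threshold-seconds: 1
```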
Does Batching Help?
Now, let's rerun the previously mentioned experiment, but this time with batching enabled. For my experiments, I focus on the scenario with 1,000 messages.
I conducted a grid search to optimize the batch parameters in my experiment, testing various combinations to find the best setting. Here are my results:
🥇 Best Performance:
- `delayThreshold` 10 ms, `elementCountThreshold` 5, `requestByteThreshold` 4096
- Time to Publish One Million: approximately 52 minutes and 24 seconds
- Performance Improvement: about 35.79% faster than the baseline

🐢 Worst Performance:
- `delayThreshold` 1000 ms, `elementCountThreshold` 50, `requestByteThreshold` 2048
- Time to Publish One Million: approximately 1 day, 2 hours, and 43 minutes
- Performance Decline: about 200.47% slower than the baseline
Better parameters can be found with a more exhaustive grid search. However, for demonstration purposes, this level of search is sufficient.
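In code, such a grid search is just nested loops over candidate values; a minimal Java sketch (the candidate lists are illustrative, chosen here to cover the best and worst combinations mentioned above):

```java
import java.util.List;

public class GridSearch {
    public static void main(String[] args) {
        List<Long> delaysMs = List.of(10L, 100L, 1000L);
        List<Long> elementCounts = List.of(5L, 50L, 100L);
        List<Long> byteThresholds = List.of(1024L, 2048L, 4096L);

        long combos = 0;
        for (long delay : delaysMs)
            for (long count : elementCounts)
                for (long bytes : byteThresholds) {
                    combos++;
                    // For each combination: build a Publisher with these
                    // BatchingSettings, run the benchmark, record the median.
                }
        System.out.println(combos + " combinations"); // 27 combinations
    }
}
```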
It's crucial to pay attention to batch parameter settings, as improper configurations can significantly decrease performance, leading to inefficiencies in processing and increased operational times.
The tricky part is balancing these three parameters to achieve an optimal trade-off between latency and network traffic. Setting the delay threshold too low can lead to frequent dispatch of incomplete batches, increasing the number of requests. Conversely, a high delay threshold might result in idle periods, causing unnecessary delays in message sending ⏳.
Managing Message Flow in Batching
To optimize message batching in Pub/Sub, implementing flow control is essential, especially when dealing with high message generation rates. Without effective flow control, you risk memory overload or having messages that are too old to be useful, potentially triggering a DEADLINE_EXCEEDED error. This happens if you produce messages at a faster rate than your publisher can send them.
Flow control can be fine-tuned using three key parameters:
- Buffer size in bytes (`setMaxOutstandingRequestBytes`): defines the maximum buffer size for batched messages, measured in bytes. It helps manage memory usage by limiting the total size of messages waiting to be sent.
- Buffer size in number of messages (`setMaxOutstandingElementCount`): caps the number of messages that can be held in the buffer. It's a count-based limit, ensuring the buffer doesn't exceed a specified number of messages.
- Behavior when limits are exceeded (`setLimitExceededBehavior`): dictates the course of action when the set limits are surpassed. The options include:
  - ignoring the limit (effectively disabling flow control),
  - throwing an exception (to signal an immediate issue),
  - blocking (which pauses message addition to the buffer until there's enough space).
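The blocking behavior amounts to a bounded buffer of outstanding messages. A library-free Java sketch of the idea (class and method names are mine; the real client implements this inside gax's flow controller):

```java
import java.util.concurrent.Semaphore;

// A toy flow controller: publishers take a permit per outstanding message
// and block when the element-count limit is reached, mirroring the
// "block" variant of setLimitExceededBehavior.
class ToyFlowController {
    private final Semaphore elementPermits;

    ToyFlowController(int maxOutstandingElementCount) {
        this.elementPermits = new Semaphore(maxOutstandingElementCount);
    }

    void beforePublish() {
        elementPermits.acquireUninterruptibly(); // blocks when the buffer is full
    }

    void afterAck() {
        elementPermits.release(); // frees a slot once Pub/Sub acknowledges
    }

    int freeSlots() {
        return elementPermits.availablePermits();
    }
}
```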
These parameters collectively ensure that the batching process is efficient, preventing bottlenecks and maintaining a smooth flow of messages.
Batching Pub/Sub messages has the potential to enhance your application's performance, but it requires careful adjustment of parameters. It's vital to have a clear understanding of your application's performance prior to enabling batching. Make sure there are no other constraints impacting performance before you proceed with adding batching settings 🤔.
Finding the ideal settings for batching in my model scenario was a time-consuming process. It's worth noting that real-world scenarios can be significantly more complex.
Additionally, batching tends to be more effective with a steady stream of events; it may not perform as well in scenarios with sudden spikes in traffic 🚀.