Evaluating DDS, MQTT, and ZeroMQ Under Different IoT Traffic Conditions

Original paper: Evaluating DDS, MQTT, and ZeroMQ Under Different IoT Traffic Conditions

Abstract

Publish/Subscribe (pub/sub) semantics are critical for IoT applications due to their loosely coupled nature. Although OMG DDS, MQTT, and ZeroMQ are mature pub/sub solutions used for IoT, prior studies show that their performance varies significantly under different load conditions and QoS configurations, which makes middleware selection and configuration decisions hard. Moreover, the load conditions and the role of QoS settings in prior comparison studies are neither comprehensive nor well documented. To address these limitations, we (1) propose a set of performance-related properties for pub/sub middleware and investigate their support in DDS, MQTT, and ZeroMQ; (2) perform systematic experiments under three representative, lab-based real-world IoT use cases; and (3) improve DDS performance by applying three of our proposed QoS properties. Empirical results show that DDS has the most thorough QoS support and more reliable performance in most scenarios. In addition, its Multicast, TurboMode, and AutoThrottle QoS policies can effectively improve DDS performance in terms of throughput and latency.


**Keywords:** Publish/Subscribe Middleware, Benchmarking, MQTT, DDS, ZeroMQ, Performance Evaluation

1 Introduction

Distributed deployment of real-time applications and high-speed dissemination of massive data have been hallmarks of Internet of Things (IoT) platforms. IoT applications typically adopt publish/subscribe (pub/sub) middleware for asynchronous and cross-platform communication. OMG Data Distribution Service (DDS), ZeroMQ, and MQTT are three representative pub/sub technologies that have entirely different architectures (decentralized data-centric, decentralized message-centric, and centralized message-centric, respectively). All of them implement the pub/sub messaging pattern and provide a set of configurable parameters for customizing middleware behavior and resource allocation. Accordingly, an essential question is how to choose an appropriate middleware for a given workload condition, and which parameters should be considered when making configuration decisions. To that end, we propose a set of QoS properties that are tied to the performance of IoT pub/sub applications and investigate which of those properties are supported by DDS, MQTT, and ZeroMQ. We then conduct a systematic set of experiments to assess their performance in three pub/sub use cases (high-frequency, periodic, and sporadic), which provides us with baselines for further performance optimization. Empirical results show that DDS has the most stable performance in the above scenarios and provides the support most closely aligned with the QoS properties we propose. To understand the performance under high-frequency data flows, we further experimentally explore the impact of three competitive QoS policies (Multicast, TurboMode, and AutoThrottle) on improving DDS application performance under such conditions.


2 QoS Properties Essentials for IoT Systems

The primary purpose of this section is to introduce several performance-related QoS policies available in DDS, MQTT, and ZeroMQ that are necessary for performance-sensitive IoT applications. As with other networked applications, the performance (throughput and latency) of pub/sub middleware depends mainly on the network congestion status and the internal resource configuration. In this paper, we look at the network aspect. The principle we followed in choosing these features is whether a feature can effectively change the amount of data entering and leaving the network pipeline per unit time, or the processing logic of messages in the network.

2.1 Reliability

The reliability policy determines whether messages can be reliably transmitted, which critically affects the data-processing and decision-making processes of IoT applications. DDS provides two types of transmission guarantees: Best Effort (i.e., no guarantees) and Reliable (i.e., retry until success). MQTT provides three service levels: QoS 0, QoS 1, and QoS 2, representing "at-most-once", "at-least-once", and "exactly-once" delivery, respectively. MQTT's QoS 0 is equivalent to DDS's Best Effort, and QoS 2 to DDS's Reliable delivery. ZeroMQ's pub/sub pattern may lose messages when the network is congested.
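The three delivery semantics can be contrasted with a small simulation. The sketch below is illustrative only: `deliver`, the loss rate, and the single data/ACK exchange are assumptions made for the example, and the real MQTT QoS 2 handshake has more steps than the duplicate filter shown here.

```python
import random


def deliver(messages, qos, loss_rate=0.3, seed=0):
    """Simulate delivery over a lossy link and return what the subscriber sees.

    qos 0: send once, never retry (at-most-once).
    qos 1: resend until an ACK arrives; a lost ACK causes a duplicate (at-least-once).
    qos 2: like qos 1, but the receiver drops duplicates by message id (exactly-once).
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    received = []
    seen_ids = set()
    for msg_id, payload in enumerate(messages):
        while True:
            data_arrived = rng.random() >= loss_rate
            if data_arrived:
                if qos < 2 or msg_id not in seen_ids:
                    received.append(payload)
                seen_ids.add(msg_id)
            if qos == 0:
                break  # fire and forget
            ack_arrived = data_arrived and rng.random() >= loss_rate
            if ack_arrived:
                break  # sender stops retrying once the ACK is seen
    return received


if __name__ == "__main__":
    msgs = list(range(50))
    for level in (0, 1, 2):
        out = deliver(msgs, level)
        print(f"QoS {level}: {len(out)} deliveries, {len(set(out))} distinct messages")
```

With a lossy link, QoS 0 can drop messages, QoS 1 delivers every message but may repeat some, and QoS 2 delivers each message once, which is the trade-off between overhead and guarantees described above.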

2.2 Multicast

Multicast allows IoT applications to scale better when multiple subscribers exist. Both DDS and ZeroMQ allow users to set the number of hops that multicast messages can traverse, which is intended to avoid flooding large networks with multicast traffic. Although MQTT runs over a unicast protocol, it emulates application-layer "multicast" by wrapping point-to-point TCP connections, which is also known as multicast-over-unicast. The protocols that ZeroMQ uses for multicast include Pragmatic General Multicast (PGM) [5] and Encapsulated Pragmatic General Multicast (EPGM).
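The hop limit mentioned above corresponds, at the socket level, to the IP multicast time-to-live. The following sketch uses only the Python standard library rather than the DDS or ZeroMQ APIs; `make_multicast_sender` is a hypothetical helper name, and the hop value is an arbitrary example.

```python
import socket
import struct


def make_multicast_sender(hops):
    """Create a UDP socket whose multicast packets may cross at most `hops` routers."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # IP_MULTICAST_TTL bounds how far multicast datagrams travel;
    # a value of 1 keeps them on the local subnet.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("B", hops))
    return sock


if __name__ == "__main__":
    sender = make_multicast_sender(2)
    print(sender.getsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL))
    sender.close()
```

A small hop limit is what keeps multicast traffic from flooding a large network, at the cost of not reaching subscribers beyond that radius.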

2.3 Intelligent Batching

Message batching is intended to improve the throughput of IoT applications. It avoids the frequent system calls through the network stack that message-by-message processing incurs, and it can be triggered at a fixed interval or after a fixed number of messages. The TurboMode QoS in DDS automatically decides the optimal number of bytes in a batch on the publisher side based on the data sample size, the writing speed, and the real-time system state. Message batching in ZeroMQ can be enabled on either the publisher or the subscriber side. Rather than deciding a timeout or batch size for a single chunk, ZeroMQ always forwards all messages currently queued in memory to the network interface card in one go. However, because ZeroMQ's pub/sub pattern lacks a strict reliability guarantee, messages sent in a large batch risk being lost when the network is congested. On the subscriber side, if the subscriber can keep up with the publisher and there is no application-level queue backlog, ZeroMQ turns off batching for the sake of lower end-to-end delay. Otherwise, batching is enabled to help the worker thread flush backlogged messages faster. MQTT does not support intelligent batching.
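To make the benefit of batching concrete, here is a minimal sketch of a size-triggered batcher. It is not how TurboMode or ZeroMQ is implemented: `BatchingPublisher`, the `send` callback, and the fixed 1024-byte threshold are assumptions for illustration, whereas TurboMode adapts the batch size at run time.

```python
class BatchingPublisher:
    """Collect samples and hand them to `send` in batches of at most `max_batch_bytes`."""

    def __init__(self, send, max_batch_bytes=1024):
        self._send = send            # callable standing in for one network-stack traversal
        self._max = max_batch_bytes
        self._buf = []
        self._size = 0

    def write(self, sample: bytes):
        # Flush first if adding this sample would overflow the current batch.
        if self._buf and self._size + len(sample) > self._max:
            self.flush()
        self._buf.append(sample)
        self._size += len(sample)

    def flush(self):
        if self._buf:
            self._send(list(self._buf))
            self._buf.clear()
            self._size = 0


if __name__ == "__main__":
    sends = []
    pub = BatchingPublisher(sends.append, max_batch_bytes=1024)
    for _ in range(100):
        pub.write(b"x" * 64)
    pub.flush()
    print(f"100 samples left the publisher in {len(sends)} send calls")
```

For 64-byte samples, each batch holds 16 samples, so the number of send calls drops by roughly that factor. The price is that a sample may wait in the buffer until its batch fills, which is why batching tends to trade latency for throughput.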

2.4 Rate Limit

Rate Limit is a flow control mechanism that protects network resources from being encroached upon by malicious actors in an IoT cluster. It does so by specifying the maximum rate at which a publisher may send samples to the network. DDS allows users to define a Custom Flow Controller that maintains a separate FIFO queue for each connection, in which data instances generated by asynchronous publishers can be placed. ZeroMQ supports rate limiting only when a multicast protocol (PGM/EPGM) is enabled, in which case the limit is implemented by setting the multicast window size. The MQTT server shapes the egress traffic of each TCP/SSL connection based on the Leaky Bucket (LB) algorithm.
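The Leaky Bucket idea can be sketched in a few lines. This is a generic textbook variant (the bucket used as a meter), not the code of any particular MQTT broker; the class name, parameters, and the explicit `now` argument are assumptions chosen to keep the example deterministic.

```python
class LeakyBucket:
    """Admit traffic as long as the bucket, which drains at a fixed rate, has room."""

    def __init__(self, rate_bytes_per_s, capacity_bytes):
        self.rate = rate_bytes_per_s      # drain rate, i.e. the sustained limit
        self.capacity = capacity_bytes    # bucket size, i.e. the tolerated burst
        self.level = 0.0
        self.last = 0.0

    def allow(self, nbytes, now):
        """Return True if `nbytes` may be sent at time `now` (seconds)."""
        # Drain the bucket for the time elapsed since the previous call.
        elapsed = now - self.last
        self.level = max(0.0, self.level - elapsed * self.rate)
        self.last = now
        if self.level + nbytes <= self.capacity:
            self.level += nbytes
            return True
        return False  # over the limit: the caller may drop or delay the message


if __name__ == "__main__":
    bucket = LeakyBucket(rate_bytes_per_s=1000, capacity_bytes=2000)
    print(bucket.allow(1500, now=0.0))  # fits in the empty bucket
    print(bucket.allow(1000, now=0.0))  # would overflow
    print(bucket.allow(1000, now=1.0))  # 1000 bytes have drained in the meantime
```

The capacity controls how large a burst is tolerated, while the drain rate sets the long-term sending rate a connection can sustain.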

2.5 Message LifeSpan

The message expiration mechanism limits the longest time for which a message is regarded as valid in the system, which improves overall memory utilization and avoids delivering stale data. In the reliable transport mode of DDS, if the LifeSpan QoS policy is not configured, unacknowledged data instances will reside in the publisher's writing buffer for a long time and be retransmitted endlessly, which not only wastes memory and bandwidth but also destroys the timeliness of messages. With similar intent, MQTT implementations (e.g., EMQ X, HiveMQ, and VerneMQ) that adopt the latest MQTT specification (MQTT 5.0) allow publishers to specify a per-message property called message-expiry-interval, which defines the lifespan of the message in seconds. Current ZeroMQ implementations have no analogue of the LifeSpan QoS.

2.6 Topic Priority

Topic priority distinguishes the importance of different messages, which ensures that the overall utility of a multi-topic IoT system is maximized under a given resource limitation. DDS enables this property via the Transport_Priority policy, which is expressed as a 32-bit signed integer. An analogous parameter in ZeroMQ is called ZMQ_TOS, which follows the same logic as in DDS. Unlike these system-level implementations, topic priority in MQTT applications is a configurable attribute of the message queue in the broker, with values from 0 to 255.
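The broker-side variant of topic priority can be illustrated with a priority queue. The sketch below is not taken from any broker: `PriorityDispatcher` is a hypothetical name, and the convention that a larger number means a more urgent topic is an assumption made for the example.

```python
import heapq
import itertools


class PriorityDispatcher:
    """Dispatch queued messages by priority, first-in-first-out within one priority."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def put(self, topic, payload, priority):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._seq), topic, payload))

    def get(self):
        _, _, topic, payload = heapq.heappop(self._heap)
        return topic, payload


if __name__ == "__main__":
    q = PriorityDispatcher()
    q.put("telemetry", b"t1", priority=10)
    q.put("alarm", b"a1", priority=255)
    q.put("telemetry", b"t2", priority=10)
    print([q.get() for _ in range(3)])
```

When resources are scarce, the high-priority alarm leaves the queue before older telemetry samples, which is the behavior the policy is meant to provide.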

3 Understanding The Performance Baseline

The purpose of this section is to understand the baseline performance of DDS, MQTT, and ZeroMQ under three representative IoT workload conditions (high-frequency, periodic, and sporadic). We performed our tests on an ARM cluster comprising 10 Raspberry Pi 3 Model B boards, each with a 1.20 GHz CPU, four physical cores, and 1 GB of memory. The software details of the cluster are as follows: Raspbian 9 OS, Linux 4.14.91 kernel, and GCC 4.7.3. The cluster bandwidth of 95 Mbps was measured using Linux iperf3. Interference was minimized by pinning publishers and subscribers to separate cores. To avoid latency measurement errors introduced by system clock jitter, we synchronized the system clock on each node using the Precision Time Protocol (PTP) [4]. Our test results indicate that PTP can keep the clock offset between Raspberry Pi boards within 200 microseconds even if the CPU is 80 percent utilized.

We leveraged RTI-Perftest [7] to monitor and benchmark the throughput, latency, and CPU utilization of the DDS test application. RTI-Perftest is a highly configurable command-line benchmarking tool developed by RTI for evaluating the performance of applications that use RTI Connext DDS Professional 6.01 as middleware. RTI-Perftest measures throughput by counting the number of bytes received by the subscriber per second. To avoid measurement errors caused by system clock jitter, RTI-Perftest calculates the one-way delay by sending latency test data samples in a ping-pong manner. For MQTT and ZeroMQ, we developed custom testing tools using open source APIs. All tests were repeated five times, and each run lasted 90 seconds. We measured performance metrics every 5 seconds.
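The two measurements described above reduce to simple arithmetic, sketched below. The function names are hypothetical, and halving the round-trip time is an assumption about how a ping-pong exchange is commonly converted into a one-way figure; the sample values are made up for illustration.

```python
def throughput_mbps(bytes_received, interval_s):
    """Throughput from the bytes a subscriber counted during one sampling interval."""
    return bytes_received * 8 / interval_s / 1e6


def one_way_latency_us(round_trip_us):
    """One-way delay estimated from a ping-pong round trip.

    Both timestamps are taken on the same node, so clock offset between
    nodes does not enter the result. Assumes symmetric forward and return paths.
    """
    return round_trip_us / 2


if __name__ == "__main__":
    # Hypothetical numbers: 59,375,000 bytes counted in a 5-second window.
    print(throughput_mbps(59_375_000, 5))
    print(one_way_latency_us(840))
```

Measuring the round trip on a single clock is the reason the ping-pong approach is robust to jitter between nodes, at the cost of assuming the two directions take equally long.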

3.1 High-frequency Data-flow Tests

This test was motivated by a fault diagnosis system in which vibration and sound information gathered by acoustic sensors is continuously propagated to ADC or cloud servers for further ML-based analysis. Hence, in this test, we configured a publisher to flood a continuous data flow to one or multiple subscribers at an unlimited rate. DDS, MQTT, and ZeroMQ were profiled with default QoS settings. Figure 1 plots mean throughput versus payload size.

The 1-1 subplot indicates that application throughput grows as the payload size increases. When the message is smaller than 1 KB, ZeroMQ performs best: its throughput is 1715.43%-3069.23% higher than that of MQTT and 18.8%-110.66% higher than that of DDS. The reason is that neither ZeroMQ nor DDS is under bandwidth or CPU pressure at this point, and since ZeroMQ is built directly on top of the socket layer (lower than DDS), it spends less time than DDS executing application-level data processing. However, when the message is larger than 1 KB, ZeroMQ throughput does not converge smoothly and becomes lower than that of DDS. The shape of the DDS curve is as expected: its throughput first increases and then converges to the maximum bandwidth (95 Mbps). Compared with DDS and ZeroMQ, the throughput of MQTT is poor due to its broker-centric architecture. In the 1-7 test, the throughput of DDS and MQTT is one-seventh of the physical bandwidth, since messages are forwarded to the seven subscribers one by one. ZeroMQ, on the other hand, achieves the highest throughput, which reveals that the overhead of ZeroMQ packets in one-to-many cases is significantly lower than that of DDS and MQTT.
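The one-seventh figure follows from the publisher's link carrying one copy of every message per subscriber. A minimal sketch of that bound, with a hypothetical function name and ignoring protocol overhead:

```python
def per_subscriber_share_mbps(link_mbps, subscribers):
    """Upper bound on per-subscriber throughput when each message is sent once per subscriber."""
    if subscribers < 1:
        raise ValueError("need at least one subscriber")
    return link_mbps / subscribers


if __name__ == "__main__":
    for n in (1, 7):
        share = per_subscriber_share_mbps(95, n)
        print(f"{n} subscriber(s): at most {share:.2f} Mbps each")
```

On the 95 Mbps link used here, seven unicast copies leave roughly 13.6 Mbps per subscriber, whereas a true multicast transmission would not divide the link in this way.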

3.2 Periodic Data-flow Tests

The periodic use case maps to continuous real-life data flows in which the size and frequency of data rarely change. An example is a wind farm monitoring system in which windmills that rotate at a constant speed send telemetry data at a fixed rate, allowing vendors to continuously track the operational status of the windmills. In this kind of scenario, latency matters more than throughput because the timeliness of monitoring data is more important. In the following test, we simulated this use case and configured a publisher to send messages to a subscriber at a specified rate. The publishing rate varies from 200 to 1000 samples per second. Figure 2 shows the 90th percentile latency versus publishing rate for small, medium, and large messages.
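For readers unfamiliar with percentile metrics, the sketch below computes a 90th percentile with the nearest-rank method. The paper does not state which percentile definition its tools use, so the method, the function name, and the latency values are assumptions for illustration.

```python
import math


def percentile_nearest_rank(samples, p):
    """Smallest sample such that at least p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    # Hypothetical latencies in microseconds, including one outlier.
    latencies_us = [410, 395, 402, 2300, 399, 405, 398, 401, 407, 400]
    print(percentile_nearest_rank(latencies_us, 90))
```

Unlike the mean, the 90th percentile is not dragged upward by a single outlier, yet it still reflects the latency that most messages stay below, which suits monitoring workloads.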

For small (64 B) and medium (2 KB) messages, MQTT latency increases slightly at first and then remains flat. The initial increase occurs because the MQTT broker needs more time to process ingress/egress traffic as the sending rate increases. However, when the broker cannot keep up with the publisher and the writing buffer on the publisher node is exhausted, the publisher's writing process is blocked, which keeps the actual message dissemination rate and the latency unchanged. In addition, the change in latency is not very obvious for DDS and ZeroMQ because (1) their participants connect in an end-to-end manner, and (2) the subscriber can keep up with the publisher since bandwidth is not stressed.

For large samples (32 KB), MQTT latency remains flat as the publishing rate varies and, for similar reasons as before, is consistently the lowest of the three when the dissemination rate is faster than 400 samples per second. DDS and ZeroMQ share the same trend, with DDS latency slightly higher than that of ZeroMQ when the publishing rate is 200 samples per second. ZeroMQ and DDS latency increase suddenly as the message dissemination rate passes 400 and 600 samples per second, respectively, and then fluctuate marginally. We believe the reasons are twofold: (1) the bandwidth is exhausted when the publishing rate surpasses 400 samples per second (400 samples/s * 32 KB > 95 Mbps); (2) the publishing process is blocked as the local buffer is exhausted, which is the same reason as in MQTT.
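The bandwidth inequality in the paragraph above can be checked directly. The sketch assumes 1 KB = 1024 bytes and 1 Mbps = 10^6 bits per second, and it ignores protocol headers, so real saturation would set in slightly earlier.

```python
LINK_MBPS = 95  # cluster bandwidth reported in the test setup


def offered_load_mbps(samples_per_s, payload_bytes):
    """Payload bit rate that a publisher offers to the network."""
    return samples_per_s * payload_bytes * 8 / 1e6


if __name__ == "__main__":
    payload = 32 * 1024  # 32 KB samples
    for rate in (200, 400, 600):
        load = offered_load_mbps(rate, payload)
        status = "exceeds" if load > LINK_MBPS else "fits within"
        print(f"{rate} samples/s -> {load:.1f} Mbps, {status} the {LINK_MBPS} Mbps link")
    # Publishing rate at which payload alone fills the link.
    print(f"saturation at about {LINK_MBPS * 1e6 / (payload * 8):.0f} samples/s")
```

At 200 samples per second the payload needs about 52 Mbps, while at 400 samples per second it needs about 105 Mbps, which is more than the link can carry.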

3.3 Sporadic Data-flow Tests

Multiple independent pub/sub applications may be co-located in the same LAN environment. Consider the same wind farm monitoring system example: if an anomalous status is identified on some windmills (e.g., a windmill stops spinning due to slow wind speeds or mechanical stoppage), it is necessary to send a large volume of failure detection data to the central control system. As a result, co-located applications may experience latency deterioration due to bandwidth contention caused by the bursty traffic.

To that end, we generated a periodic data-flow (PDF) and a sporadic data-flow (SDF) using two one-to-one pub/sub applications with separate topics. The packet size of the SDF application was set to 2 MB, and the sending rate of the PDF application was set to 25 Mbps of the physical bandwidth. The SDF application started 30 seconds after the PDF application and ran for 10 seconds.

Figure 3 depicts the latency of the PDF application under the interference of the SDF application. MQTT latency has a higher standard deviation than that of DDS and ZeroMQ, which implies that MQTT is more sensitive to the bursty data stream due to the presence of the broker. DDS tolerates the situation better than ZeroMQ.

4 DDS-focused Evaluations

Since DDS provides more modularized and pluggable QoS properties, this section probes it further. Prior works [2][6] reveal that the transport protocol, message batching, and flow control are non-trivial aspects of performance tuning in networked applications. We therefore evaluate DDS along these dimensions, which correspond to the Multicast, TurboMode, and AutoThrottle QoS policies in DDS. The setup was similar to that of the High-frequency Data-flow Test.

Figures 4 and 5 reveal that the publisher's CPU usage remains flat at first and then gradually decreases as the message size increases. This indicates that messages are produced less often as the message size increases, since data samples need more time to be delivered. During this period, the writing process is blocked until more packets can be put into the pipe.

TurboMode improves throughput by 23.6%-848.3% and 7.5%-1166.7% for messages smaller than 1 KB in the 1-1 and 1-7 tests, respectively, because the network stack is traversed less frequently. Moreover, it does not always lead to higher latency in the 1-n test, because sending individual messages to multiple receivers in a unicast manner may take longer than sending multiple messages in a batch. Multicast effectively improves application throughput in the 1-7 test (by 272.3%-842.2%), but the throughput cannot converge to the physical bandwidth as the payload increases, which we believe is due to a limitation of the network switch.

When the AutoThrottle mode is enabled, our results show that application throughput decreases by an average of 18.0% and 18.7% in the 1-1 and 1-7 tests, respectively. Likewise, latency decreases by 3.6% and 8.8%, respectively. Since the AutoThrottle feature needs to keep tracking the system state (send window occupancy and the number of NACK messages) to make throttling decisions, it results in higher CPU utilization than the normal configuration.

5 Related Work

Performance assessment of pub/sub middleware has been conducted in many prior works under different experimental settings. Pereira et al. [9] propose a set of qualitative and quantitative dimensions for benchmarking IoT middleware. They used a large dataset to simulate a smart city use case and evaluated the performance of two middleware platforms (FIWARE and oneM2M) along their proposed benchmarking dimensions. Similarly, the quantitative analysis in [1] presents the performance (throughput and latency) variation between Open MQ, Active MQ, and Mantaray MQ for different message sizes. However, since the middleware they investigated are all broker-based, they studied the publishing and subscribing processes separately rather than performing end-to-end tests. In [10], the authors provide an overview of the round trip time (RTT) differences among OPC UA, ROS, DDS, and MQTT under different CPU and network load conditions. Dobbelaere et al. [3] established a generic comparison framework based on the core functionality of pub/sub systems. Using this framework, they delved into a qualitative and quantitative comparison of two commercially supported middleware: Kafka and RabbitMQ. To avoid interference from the network layer, they executed their experiments on a single host with empirical application configurations. Luzuriaga et al. [8] present an experimental evaluation of AMQP and MQTT in the context of unstable network conditions. Their assessments are based on a simple one-to-one publish/subscribe scenario. Compared to these works, we designed test cases based on real-life scenarios and provide users with more insightful guidelines on selecting QoS policies to improve middleware performance.

6 Concluding Remarks

This paper empirically evaluates the performance of three pub/sub technologies, OMG DDS, MQTT, and ZeroMQ, for representative IoT scenarios (high-frequency, periodic, and sporadic). DDS provides more comprehensive and modularized QoS support than the others and also demonstrates better overall latency and throughput in most evaluated scenarios. Specifically, DDS gained higher throughput than ZeroMQ and MQTT in the high-frequency data-flow use case. In the periodic data-flow use case, ZeroMQ has lower latency than DDS for small (64 B) and medium (2 KB) messages, while DDS latency outperforms ZeroMQ when sending large messages (32 KB). MQTT is more sensitive to the in-parallel sporadic data-flow, and DDS can successfully shield the interference. Our results also reveal that DDS's Multicast QoS can effectively improve throughput in multi-subscriber scenarios. The TurboMode property can intelligently decide an appropriate batch size for different payloads and significantly improve throughput for small messages. The AutoThrottle property results in lower throughput, lower latency, and higher CPU utilization.

Our future work will include: (1) examining more QoS settings (not only for DDS, but also MQTT and ZeroMQ) experimentally; (2) evaluating more middleware using real-world workload instead of synthetic data-flows; (3) designing intelligent decision-making algorithms to automatically configure and adaptively adjust middleware QoS parameters under various (dynamic) load conditions.

Acknowledgments

This work is supported by a grant from Siemens Technology. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of Siemens Technology.
