Reducing UDP Latency


Hi! I’m one of the Embox RTOS developers, and in this article I’ll tell you about one of the typical problems in the world of embedded systems and how we solved it.


Stating the problem

Control and responsiveness are key concerns for a wide range of embedded systems: on the one hand, sensors and detectors must notify other devices that some event occurred; on the other hand, those systems should react as quickly as possible. Examples of such systems include CNC machines, vehicle control, avionics, distributed sensor systems and many others.

At the same time, developing bare-metal programs is really hard, for a number of reasons:

  • Developers don’t have much choice of frameworks and languages: it will probably be ANSI C and assembly, even for the non-time-critical parts of the code that could be developed faster with something else (for example, debugging output, statistics collection, a diagnostic user interface and so on)
  • Many solutions require a variety of hardware drivers: network, interrupt controller, timer and UART drivers are the bare minimum
  • Some systems have both an FPGA and an HPS, which requires additional steps to “glue” all the parts together

This explains the popularity of the Linux kernel in embedded systems: it works great in lots of applications, since it provides a portable and stable code base.

But let’s look at a specific case: time-critical applications that rely on the network.

“Time-critical” may mean different things:

  • Applications that require high bandwidth
  • Applications that require low latency

Linux works great in the first case, as there are a number of possible optimizations (turning off interrupt coalescing and so on), but can you achieve better results in terms of low latency? Let’s find out!

Real-life example

We had the following task: minimize the latency of every single UDP response over Ethernet. A DE0-Nano-SoC board was used as the embedded system core, controlling some peripheral devices in reaction to commands received in those UDP packets.

The network topology is point-to-point, so there are no intermediate hubs, routers or other network devices.

The maximum acceptable latency is 0.1 ms, while the basic Linux solution could only provide 0.5 ms.

At the same time, it was necessary to support POSIX-compatible programs.


To measure the response time we will use two hosts.

The first host is a desktop computer running a GNU/Linux operating system; the second is a DE0-Nano-SoC development board. This board has both an FPGA and an HPS (Hard Processor System, which is basically an ARM core), and we are going to minimize the response time of the HPS running Embox RTOS.

We will use a simple testing application that looks like this:

while (1) {
  char buf[BUFLEN];
  recvfrom(s, buf, BUFLEN);
  sendto(s, buf, BUFLEN);
}

This program will run on the second host, i.e. DE0-Nano-SoC.

The first host sends UDP packets and waits for a response to each of them, measuring the time the response takes.

for (int i = 0; i < N; i++) {
  char buf_tx[BUFLEN], buf_rx[BUFLEN];
  sprintf(buf_tx, "This is packet %d\n", i);
  time_t time_begin = time_now();
  sendto(s, buf_tx, BUFLEN);
  recvfrom(s, buf_rx, BUFLEN);
  time_t time_end = time_now();
  if (memcmp(buf_tx, buf_rx, BUFLEN)) {
    printf("%d: Buffer mismatch\n", i);
  }
  if (time_end - time_begin > TIME_LIMIT) {
    printf("Slow answer #%d: %d\n", i, (int)(time_end - time_begin));
  }
}

We also record the average, minimum and maximum response times.

Source code is available on GitHub.

A test run confirmed that packets were received successfully, so we started making some basic optimizations:

  • Get rid of all debug UART output: it turned out to be the slowest part
  • Compiling with -O2
  • Enabling L2 cache controller PL310 (this point was the least effective)

After sending 500 000 packets we had the following measurements:

Avg: 4.52ms
Min: 3.12ms
Max: 12.24ms

This is still several times slower than the limit we need to meet, and the average response time would have to be almost ten times lower just to compete with Linux.

Finding out the reason

One possible source of slow data processing is other processes using system resources, but in this case nothing else is running.

Maybe there are too many interrupts from some peripherals? That’s not the case either: we only handle network and timer interrupts. The former are necessary for processing Ethernet frames, and the latter have no real effect: making the timer tick slower doesn’t decrease the response time anyway.

Eventually we found out that the high latency was caused by low link speed: we were using a 100 Mbit/s USB-to-Ethernet adapter, and the network driver didn’t support a 1 Gbit/s link either.

After patching the driver and replacing the Ethernet adapter with a faster one, we got the following results:

Avg: 0.08ms
Min: 0.07ms
Max: 4.31ms

Linux comparison

As we are using a POSIX-compatible application for our measurements, it’s very easy to cross-build it for Linux:

arm-linux-gnueabihf-gcc server.c -O2

This builds an ELF file.

Running it with the same client on the host side:

Avg: 0.77ms
Min: 0.74ms
Max: 5.31ms

As you can see, in this test Embox is able to respond almost 9 times faster than Linux, which is a pretty good result.

Dispersion

While the average response time is pretty good, the maximum time kills the positive effect for two reasons:

  • It’s long enough to miss the time limit, but even more importantly,
  • It introduces significant uncertainty into the system’s behavior

How can we investigate the reason for such dispersion? We decided to start by measuring the time it takes an Ethernet frame to be fully processed between receiving and responding. We could have collected statistics on the development board for later analysis, but it’s much simpler to send this data in the UDP packet itself and process it on the desktop computer.

The receive timestamp is written to a variable inside the interrupt handler; the send timestamp is written just before activating the network card’s DMA.

int net_tx(...) {
  if (is_udp_packet()) {
    timestamp2 = timer_get();
    memcpy(&packet[UDP_OFFT],
           &timestamp1,
           sizeof(timestamp1));
    memcpy(&packet[UDP_OFFT + sizeof(timestamp1)],
           &timestamp2,
           sizeof(timestamp2));
    ...
  }
}

This time we got the following results:

Avg: 8673
Min: 6191 
Max: 11950

It turned out that the dispersion of Embox’s UDP packet processing is not big at all: only about 25%, which hardly explains the final 5000% dispersion (avg 0.08 ms vs. max 4.31 ms).

Even if Embox processed every UDP packet in exactly the same time, it would reduce the maximum by only about a quarter, which would still be too much, so we started looking for another cause of this behavior.

What if the problem is on the other side?

So now we have two potential problems:

  • Hardware issues
  • Linux host latency

The first problem would be much harder to solve, so, hoping that it wasn’t the cause, we started thinking about the second one.

How do we check it?

First of all, we can simply give the test the highest priority on the host system:

nice -n -20 ./client

However, this didn’t have any significant effect. The average time seemed to drop slightly, but the improvement was small compared to the large dispersion.

Another solution is to change the scheduling policy to round-robin. You can do that with the chrt command like this:

chrt --rr 99 ./client

Finally, it worked!

The number of “slow” responses decreased dramatically. This histogram shows the difference between round-robin and regular scheduling:

[Histogram: response time distribution with round-robin vs. regular scheduling]

Other ways to reduce latency for Linux host

  • Using raw sockets. It’s not exactly the same task, but if you really need the lowest possible latency, it’s probably not a good idea to use UDP at all :)
  • Interrupt coalescing may increase network latency, so it can be helpful to turn it off
  • You can use libpcap with TPACKET_V3, which is supported by the Linux kernel. The speedup is achieved by removing the overhead of copying from kernel space to user space; pcap also lets you apply packet filters
  • XDP, or eXpress Data Path, is a BPF-based project that also lowers the overhead
  • Some other approaches are considered in this Cloudflare blog post
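As an example of the second point, interrupt coalescing can usually be inspected and disabled with ethtool. The interface name `eth0` is an assumption, and not every driver supports every parameter:

```shell
# Show the current coalescing settings for the interface
ethtool -c eth0

# Disable RX/TX coalescing: raise an interrupt per packet,
# trading higher CPU load for lower latency
ethtool -C eth0 rx-usecs 0 rx-frames 1 tx-usecs 0 tx-frames 1
```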
