Message-ID: <2f1635d9300a4bec8a0422e9e9518751@AcuMS.aculab.com>
Date: Wed, 27 Nov 2019 17:30:00 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Paolo Abeni' <pabeni@...hat.com>,
Jesper Dangaard Brouer <brouer@...hat.com>
CC: 'Marek Majkowski' <marek@...udflare.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
network dev <netdev@...r.kernel.org>,
kernel-team <kernel-team@...udflare.com>
Subject: RE: epoll_wait() performance
From: Paolo Abeni
> Sent: 27 November 2019 16:27
...
> @David: If I read your message correctly, the pkt rate you are dealing
> with is quite low... are we talking about tput or latency? I guess
> latency could be measurably higher with recvmmsg() in respect to other
> syscall. How do you measure the relative performances of recvmmsg()
> and recv() ? with micro-benchmark/rdtsc()? Am I right that you are
> usually getting a single packet per recvmmsg() call?
The packet rate per socket is low, typically one packet every 20ms.
This is RTP, so telephony audio.
However we have a lot of audio channels and hence a lot of sockets.
So there can be 1000s of sockets we need to receive the data from.
The test system I'm using has 16 E1 TDM links each of which can handle
31 audio channels.
Forwarding all these to/from RTP (one of the things it might do) is 496
audio channels - so 496 RTP sockets and 496 RTCP ones.
Although the test I'm doing is pure RTP and doesn't use TDM.
What I'm measuring is the total time taken to receive all the packets
(on all the sockets) that are available to be read every 10ms.
So poll + recv + add_to_queue.
(The data processing is done by other threads.)
I use the time difference (actually CLOCK_MONOTONIC - from rdtsc)
to generate a 64-entry (self-scaling) histogram of the elapsed times.
Then I look for the histogram's peak value.
(I need to work on the max value, but that is a different (more important!) problem.)
Depending on the poll/recv method used this takes 1.5 to 2ms
in each 10ms period.
(It is faster if I run the cpu at full speed, but it usually idles along
at 800MHz.)
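
For readers who want to reproduce this kind of measurement, here is a minimal sketch of a 64-entry histogram fed from CLOCK_MONOTONIC timestamps. It is not the code used for the numbers above. The names (struct hist, hist_add(), hist_peak(), now_ns()) are hypothetical, and the "self scaling" rule is an assumption: when a sample falls past the last bucket, the bucket width is doubled and adjacent buckets are merged.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define HIST_BUCKETS 64

/* Hypothetical type: bucket i counts samples in
 * [i * width_ns, (i + 1) * width_ns). */
struct hist {
    uint64_t width_ns;              /* current width of one bucket */
    uint64_t count[HIST_BUCKETS];
};

static void hist_init(struct hist *h, uint64_t width_ns)
{
    assert(width_ns > 0);
    memset(h, 0, sizeof(*h));
    h->width_ns = width_ns;
}

/* Assumed scaling rule: double the bucket width by merging each
 * adjacent pair of buckets into the lower half of the array. */
static void hist_rescale(struct hist *h)
{
    int i;

    for (i = 0; i < HIST_BUCKETS / 2; i++)
        h->count[i] = h->count[2 * i] + h->count[2 * i + 1];
    memset(&h->count[HIST_BUCKETS / 2], 0, sizeof(h->count) / 2);
    h->width_ns *= 2;
}

/* Record one elapsed time, widening the buckets until it fits. */
static void hist_add(struct hist *h, uint64_t ns)
{
    while (ns / h->width_ns >= HIST_BUCKETS)
        hist_rescale(h);
    h->count[ns / h->width_ns]++;
}

/* Index of the fullest bucket (lowest index wins a tie). */
static int hist_peak(const struct hist *h)
{
    int i, peak = 0;

    for (i = 1; i < HIST_BUCKETS; i++)
        if (h->count[i] > h->count[peak])
            peak = i;
    return peak;
}

/* CLOCK_MONOTONIC as a single nanosecond count. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```

In use, the receive pass would be bracketed by two now_ns() calls and the difference passed to hist_add(). The peak's time range is then hist_peak() * width_ns up to one bucket width more.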
If I use recvmmsg() I only expect to see one packet because there
is (almost always) only one packet on each socket every 20ms.
However there might be more than one, and if there is they
all need to be read (well at least 2 of them) in that block of receives.
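
As an illustration of that pattern, the sketch below makes one non-blocking recvmmsg() call per socket, sized for two datagrams. The batch size of 2, the 2048-byte buffers and the names struct rx_batch and drain_socket() are assumptions made for the example, not the real application's values.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define MAX_PKTS 2      /* assumed: read at most two datagrams per call */
#define PKT_SIZE 2048   /* assumed: comfortably larger than one RTP packet */

/* Hypothetical per-call receive state. */
struct rx_batch {
    struct mmsghdr msgs[MAX_PKTS];
    struct iovec iov[MAX_PKTS];
    unsigned char buf[MAX_PKTS][PKT_SIZE];
};

/* Read up to MAX_PKTS pending datagrams from fd without blocking.
 * Returns the number read (0 if nothing was queued) or -1 on error.
 * The length of datagram i is left in b->msgs[i].msg_len. */
static int drain_socket(int fd, struct rx_batch *b)
{
    int i, n;

    assert(b != NULL);
    memset(b->msgs, 0, sizeof(b->msgs));
    for (i = 0; i < MAX_PKTS; i++) {
        b->iov[i].iov_base = b->buf[i];
        b->iov[i].iov_len = PKT_SIZE;
        b->msgs[i].msg_hdr.msg_iov = &b->iov[i];
        b->msgs[i].msg_hdr.msg_iovlen = 1;
    }

    n = recvmmsg(fd, b->msgs, MAX_PKTS, MSG_DONTWAIT, NULL);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;       /* socket was empty, not an error */
    return n;
}
```

With one packet queued (the usual case here) the call returns 1, so it does the same work as a single recv() but with the extra setup cost of the mmsghdr array.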
The outbound traffic goes out through a small number of raw sockets.
Annoyingly we have to work out the local IPv4 address that will be used
for each destination in order to calculate the UDP checksum.
(I've a pending patch to speed up the x86 checksum code on a lot of
cpus.)
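
To show why the local address is needed, here is a sketch of the IPv4 UDP checksum, which covers a pseudo-header containing the source and destination addresses. It also shows one common way to ask the kernel which source address it would pick: connect() a UDP socket and call getsockname(). This is a plain reference version, not the optimised kernel code and not necessarily what our application does. The function names are hypothetical and addresses are taken in host byte order for simplicity.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Add len bytes, as big-endian 16-bit words, to a running sum. */
static uint32_t csum_add(uint32_t sum, const uint8_t *p, size_t len)
{
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)                        /* odd trailing byte, zero padded */
        sum += (uint32_t)p[0] << 8;
    return sum;
}

/* Checksum for a UDP header plus payload whose checksum field is
 * currently zero. Returns the value in host byte order. */
static uint16_t udp4_checksum(uint32_t saddr, uint32_t daddr,
                              const uint8_t *udp, size_t udp_len)
{
    uint8_t ph[12];                 /* IPv4 pseudo-header */
    uint32_t sum;

    assert(udp_len >= 8 && udp_len <= 65535);
    ph[0] = (uint8_t)(saddr >> 24); ph[1] = (uint8_t)(saddr >> 16);
    ph[2] = (uint8_t)(saddr >> 8);  ph[3] = (uint8_t)saddr;
    ph[4] = (uint8_t)(daddr >> 24); ph[5] = (uint8_t)(daddr >> 16);
    ph[6] = (uint8_t)(daddr >> 8);  ph[7] = (uint8_t)daddr;
    ph[8] = 0;
    ph[9] = IPPROTO_UDP;
    ph[10] = (uint8_t)(udp_len >> 8);
    ph[11] = (uint8_t)udp_len;

    sum = csum_add(0, ph, sizeof(ph));
    sum = csum_add(sum, udp, udp_len);
    while (sum >> 16)               /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    sum = ~sum & 0xffff;
    return sum ? (uint16_t)sum : 0xffff;  /* 0 means "no checksum" */
}

/* Ask the routing code which local address would be used to reach
 * daddr. connect() on a UDP socket sends nothing on the wire.
 * Returns 0 and fills *saddr on success, -1 on failure. */
static int local_ipv4_for(uint32_t daddr, uint32_t *saddr)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);
    int ret = -1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(daddr);
    sa.sin_port = htons(9);         /* arbitrary non-zero port */
    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) == 0 &&
        getsockname(fd, (struct sockaddr *)&sa, &len) == 0) {
        *saddr = ntohl(sa.sin_addr.s_addr);
        ret = 0;
    }
    close(fd);
    return ret;
}
```

A lookup like this costs several system calls, so a sender would presumably cache the result per destination rather than repeat it for every packet.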
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)