Message-ID: <e7e3e30374b1b95f7f9a945c765ff523dfda3016.camel@redhat.com>
Date: Thu, 04 Oct 2018 11:08:11 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: David Howells <dhowells@...hat.com>, netdev@...r.kernel.org
Cc: linux-afs@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net] udp: Allow kernel service to avoid udp socket rx queue
Hi,
On Wed, 2018-10-03 at 19:48 +0100, David Howells wrote:
> There's a problem somewhere in skb_recv_udp() that doesn't decrease the
> sk_rmem_alloc counter when a packet is extracted from the receive queue by
> a kernel service.
If this is the case, it's really bad and needs an explicit fix. However,
it looks like sk_rmem_alloc is reclaimed by skb_recv_udp(), as it ends
up calling udp_rmem_release() on successful dequeue.

udp_rmem_release() can delay sk_rmem_alloc reclaiming if the rx queue
is almost empty, due to commit
6b229cf77d683f634f0edd876c6d1015402303ad. Anyhow, I don't see how that
may affect this scenario: if only 1/4 of the maximum receive buffer
size is used, the next packet should always be able to land in the rx
queue, with the default sk_rcvbuf and every reasonable truesize.
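
For context, the deferral added by that commit is roughly the following
(a simplified paraphrase of udp_rmem_release(), not the exact upstream
code):

/* Simplified paraphrase -- not the exact upstream code.  Partial
 * releases are accumulated in a per-socket deficit and sk_rmem_alloc
 * is only decreased once the deficit grows beyond a fraction of the
 * receive buffer size. */
static void udp_rmem_release(struct sock *sk, int size, int partial)
{
	struct udp_sock *up = udp_sk(sk);

	if (likely(partial)) {
		up->forward_deficit += size;
		size = up->forward_deficit;
		if (size < (sk->sk_rcvbuf >> 2))
			return;		/* defer the reclaim */
	} else {
		size += up->forward_deficit;
	}
	up->forward_deficit = 0;

	/* ... forward allocated memory is released here as well ... */
	atomic_sub(size, &sk->sk_rmem_alloc);
}

So the deferred amount is bounded by sk_rcvbuf/4, which is why the next
packet should still fit in the rx queue.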
> Further, there doesn't seem to be any point in having the socket buffer
> added to the UDP socket's rx queue, since rxrpc's data_ready handler takes
> it straight back out again (more or less; there seem to be occasional
> hold-ups there).
I would really try to avoid adding another indirect call in the
data-path, unless strictly needed (to avoid more RETPOLINE overhead for
all the other use-cases). If skipping the enqueuing altogether makes
sense (I guess so, mostly for performance reasons), I *think* you can
use the already existing encap_rcv hook, initializing it to the rxrpc
input function and updating such function to pull the udp header and
possibly initialize the pktinfo, if needed. Please see e.g. the l2tp
usage.
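
To make this more concrete, here is a rough sketch of what I mean, along
the lines of what l2tp does (completely untested; rxrpc_encap_rcv(),
rxrpc_hook_socket() and UDP_ENCAP_RXRPC are made-up names used only for
illustration):

#include <linux/udp.h>
#include <net/udp.h>
#include <net/udp_tunnel.h>

/* Illustrative sketch only: packets are handed to the service from
 * softirq context via encap_rcv, so they are never charged to
 * sk_rmem_alloc nor enqueued on the UDP socket rx queue. */
static int rxrpc_encap_rcv(struct sock *sk, struct sk_buff *skb)
{
	/* The UDP header is still present here; pull it before handing
	 * the packet to the service's own input path. */
	__skb_pull(skb, sizeof(struct udphdr));

	/* ... feed skb into the rxrpc input machinery ... */
	kfree_skb(skb);		/* placeholder: the real handler owns the skb */

	return 0;		/* 0: consumed, >0: fall back to normal UDP rx */
}

static void rxrpc_hook_socket(struct socket *sock, void *user_data)
{
	struct udp_tunnel_sock_cfg cfg = {
		.sk_user_data	= user_data,
		.encap_type	= UDP_ENCAP_RXRPC,	/* new value to be defined */
		.encap_rcv	= rxrpc_encap_rcv,
	};

	setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg);
}

With something like that in place the data_ready handler and the whole
rx queue handling on the UDP socket should not be needed anymore for
this use case.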
> Putting in some tracepoints shows a significant delay occurring between
> packets coming in and thence being delivered to rxrpc:
>
> <idle>-0 [001] ..s2 67.631844: net_rtl8169_napi_rx: enp3s0 skb=07db0a32
> ...
> <idle>-0 [001] ..s4 68.292778: rxrpc_rx_packet: d5ce8d37:bdb93c60:00000002:09c7 00000006 00000000 02 20 ACK 660967981 skb=07db0a32
>
> The "660967981" is the time difference in nanoseconds between the sk_buff
> timestamp and the current time. It seems to match the time elapsed between
> the two trace lines reasonably well. I've seen anything up to about 4s.
Can you please provide more data? Specifically, can you please add:
* a perf probe in rxrpc_data_ready(), just after skb_recv_udp(),
  reporting sk->sk_rmem_alloc and skb->truesize
* a perf probe in __udp_enqueue_schedule_skb(), just before the 'if
  (rmem > sk->sk_rcvbuf)' test, again reporting sk->sk_rmem_alloc,
  skb->truesize and sk->sk_rcvbuf
and then provide the 'perf record -g -e ...' / 'perf script' output?
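
Something like the following should do; the probe definitions are only
an illustration, the exact probe lines and available variable names
depend on your kernel and debuginfo (<line> is a placeholder to be
picked from the 'perf probe -L' output):

# list probe-able lines, pick the one right after skb_recv_udp()
perf probe -L rxrpc_data_ready
# check which variables are visible there, adjust the names below
perf probe -V rxrpc_data_ready:<line>
perf probe -a 'rxrpc_data_ready:<line> rmem=sk->sk_rmem_alloc.counter truesize=skb->truesize'
# function entry is close enough to the sk_rcvbuf test
perf probe -a '__udp_enqueue_schedule_skb rmem=sk->sk_rmem_alloc.counter truesize=skb->truesize rcvbuf=sk->sk_rcvbuf'
perf record -g -e 'probe:*' -a -- sleep 30
perf script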
Thanks,
Paolo