lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 6 Nov 2023 15:48:12 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Jong eon Park <jongeon.park@...sung.com>, Paolo Abeni
 <pabeni@...hat.com>
Cc: "David S. Miller" <davem@...emloft.net>, Eric Dumazet
 <edumazet@...gle.com>, netdev@...r.kernel.org,
 linux-kernel@...r.kernel.org, Dong ha Kang <dongha7.kang@...sung.com>
Subject: Re: [PATCH] netlink: introduce netlink poll to resolve fast return
 issue

On Fri,  3 Nov 2023 16:22:09 +0900 Jong eon Park wrote:
> In very rare cases, there was an issue where a user's poll function
> waiting for a uevent would continuously return very quickly, causing
> excessive CPU usage due to the following scenario.
> 
> Once sk_rcvbuf becomes full netlink_broadcast_deliver returns an error and
> netlink_overrun is called. However, if netlink_overrun was called in a
> context just before a another context returns from the poll and recv is
> invoked, emptying the rcvbuf, sk->sk_err = ENOBUF is written to the
> netlink socket belatedly and it enters the NETLINK_S_CONGESTED state.
> If the user does not check for POLLERR, they cannot consume and clean
> sk_err and repeatedly enter the situation where they call poll again but
> return immediately.
> 
> To address this issue, I would like to introduce the following netlink
> poll.
> 
> After calling the datagram_poll, netlink poll checks the
> NETLINK_S_CONGESTED status and rcv queue, and this make the user to be
> readable once more even if the user has already emptied rcv queue. This
> allows the user to be able to consume sk->sk_err value through
> netlink_recvmsg, thus the situation described above can be avoided

The explanation makes sense, but I'm not able to make the jump in
understanding how this is a netlink problem. datagram_poll() returns
EPOLLERR because sk_err is set, what makes netlink special?
The fact that we can have an sk_err with nothing in the recv queue?

Paolo understands this better, maybe he can weigh in tomorrow...

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ