linux-kernel - RE: [PATCH] netlink: introduce netlink poll to resolve fast return issue


Message-ID: <25c501da111e$d527b010$7f771030$@samsung.com>
Date:   Tue, 7 Nov 2023 11:05:08 +0900
From:   "Jong eon Park" <jongeon.park@...sung.com>
To:     "'Jakub Kicinski'" <kuba@...nel.org>,
        "'Paolo Abeni'" <pabeni@...hat.com>
Cc:     "'David S. Miller'" <davem@...emloft.net>,
        "'Eric Dumazet'" <edumazet@...gle.com>, <netdev@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>,
        "'Dong ha Kang'" <dongha7.kang@...sung.com>
Subject: RE: [PATCH] netlink: introduce netlink poll to resolve fast return
 issue



> -----Original Message-----
> From: Jakub Kicinski <kuba@...nel.org>
> Sent: Tuesday, November 7, 2023 8:48 AM
> To: Jong eon Park <jongeon.park@...sung.com>; Paolo Abeni
> <pabeni@...hat.com>
> Cc: David S. Miller <davem@...emloft.net>; Eric Dumazet
> <edumazet@...gle.com>; netdev@...r.kernel.org; linux-
> kernel@...r.kernel.org; Dong ha Kang <dongha7.kang@...sung.com>
> Subject: Re: [PATCH] netlink: introduce netlink poll to resolve fast
> return issue
> 
> On Fri,  3 Nov 2023 16:22:09 +0900 Jong eon Park wrote:
> > In very rare cases, there was an issue where a user's poll function
> > waiting for a uevent would continuously return very quickly, causing
> > excessive CPU usage due to the following scenario.
> >
> > Once sk_rcvbuf becomes full netlink_broadcast_deliver returns an error
> > and netlink_overrun is called. However, if netlink_overrun was called
> > in a context just before a another context returns from the poll and
> > recv is invoked, emptying the rcvbuf, sk->sk_err = ENOBUF is written
> > to the netlink socket belatedly and it enters the NETLINK_S_CONGESTED
> > state.
> > If the user does not check for POLLERR, they cannot consume and clean
> > sk_err and repeatedly enter the situation where they call poll again
> > but return immediately.
> >
> > To address this issue, I would like to introduce the following netlink
> > poll.
> >
> > After calling the datagram_poll, netlink poll checks the
> > NETLINK_S_CONGESTED status and rcv queue, and this make the user to be
> > readable once more even if the user has already emptied rcv queue.
> > This allows the user to be able to consume sk->sk_err value through
> > netlink_recvmsg, thus the situation described above can be avoided
> 
> The explanation makes sense, but I'm not able to make the jump in
> understanding how this is a netlink problem. datagram_poll() returns
> EPOLLERR because sk_err is set, what makes netlink special?
> The fact that we can have an sk_err with nothing in the recv queue?
> 
> Paolo understands this better, maybe he can weigh in tomorrow...

Perhaps my explanation was not comprehensive enough.

The issue is that once this happens, a user that does not handle EPOLLERR
can never escape the busy loop, because every poll call returns immediately.
It seems quite harsh that inadequate EPOLLERR handling in one user imposes
a heavy burden on the entire system.

The reason for a separate netlink poll is related to the netlink state.
When a socket enters the NETLINK_S_CONGESTED state, sk can no longer receive
or deliver skbs, and the receive queue must be completely emptied to clear
the state. However, I found that the NETLINK_S_CONGESTED state was still
set even when the receive queue was empty, which is incorrect, and that is
why I implemented the handling in poll.
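
For illustration only, here is a minimal user-space sketch (not part of the
patch) of a poll loop step that does not get stuck in this situation. The
helper name poll_and_recv is hypothetical. It assumes that calling recv on
the socket when only POLLERR is reported consumes the pending sk_err
(for a netlink overrun, recv fails with errno set to ENOBUFS), so that the
next poll can block again.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: wait for one datagram on fd.
 *
 * Returns the number of bytes received, 0 on timeout, or -1 with errno set.
 * (A zero-length datagram also yields 0; a real caller may want to
 * distinguish the two cases.)
 */
static ssize_t poll_and_recv(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);

    if (n <= 0)
        return n; /* 0: timeout, -1: poll() itself failed */

    /*
     * POLLERR is reported even though it is not requested in .events.
     * Calling recv() in that case, even with an empty queue, lets the
     * kernel hand back and clear the pending socket error, so the next
     * poll() does not return immediately again.
     */
    if (pfd.revents & (POLLIN | POLLERR))
        return recv(fd, buf, len, MSG_DONTWAIT);

    /* POLLHUP or POLLNVAL: nothing sensible to read. */
    errno = EIO;
    return -1;
}
```

A caller that sees -1 with errno equal to ENOBUFS would treat it as "events
were lost" and resynchronize its state, then continue polling.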

I don't consider this approach to be the best one, so I would appreciate
any recommendations for a better solution.

Regards.
JE Park.