linux-kernel - Re: [PATCH net-next v2] netlink: introduce netlink poll to resolve fast return issue


Message-ID: <d599922fd89b3e61c7cf531a03ea8b81cbcb003e.camel@redhat.com>
Date:   Wed, 15 Nov 2023 15:49:16 +0100
From:   Paolo Abeni <pabeni@...hat.com>
To:     Jong eon Park <jongeon.park@...sung.com>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>
Cc:     netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        dongha7.kang@...sung.com
Subject: Re: [PATCH net-next v2] netlink: introduce netlink poll to resolve
 fast return issue

Hi,

I'm sorry for the delayed feedback.

On Tue, 2023-11-14 at 18:07 +0900, Jong eon Park wrote:
> In very rare cases, there was an issue where a user's 'poll' function
> waiting for a uevent would continuously return very quickly, causing
> excessive CPU usage due to the following scenario.
> 
> When sk_rmem_alloc exceeds sk_rcvbuf, netlink_broadcast_deliver returns an
> error and netlink_overrun is called. However, if netlink_overrun is
> called in one context just before another context returns from 'poll'
> and invokes 'recv', emptying the rcv queue, sk->sk_err = ENOBUFS is
> written to the netlink socket belatedly and it enters the
> NETLINK_S_CONGESTED state. If the user does not check for POLLERR, they
> cannot consume and clean sk_err and repeatedly enter the situation where
> they call 'poll' again but return immediately. Moreover, in this
> situation, rcv queue is already empty and NETLINK_S_CONGESTED flag
> prevents any more incoming packets. This makes it impossible for the user
> to call 'recv'.
> 
> This "congested" situation is a bit ambiguous. The queue is empty, yet
> 'congested' remains. This means the kernel can no longer deliver uevents
> despite the empty queue, and this leads to a persistent 'congested' status.
> 
> ------------CPU1 (kernel)----------  --------------CPU2 (app)--------------
> ...
> a driver delivers uevent.            poll is waiting to be scheduled.
> a driver delivers uevent.
> a driver delivers uevent.
> ...
> 1) netlink_broadcast_deliver fails.
> (sk_rmem_alloc > sk_rcvbuf)
>                                       gets scheduled and poll returns,
>                                       and the app calls recv.
>                                       (rcv queue is emptied)
>                                       2)
> 
> netlink_overrun is called.
> (NETLINK_S_CONGESTED flag is set,
> ENOBUFS is written to sk_err, and
> poll is woken up.)
>                                       finishes its job and calls poll.
>                                       poll returns POLLERR.
> 
>                                       (the app has no POLLERR handler)
>                                       it calls poll, but gets POLLERR.
>                                       it calls poll, but gets POLLERR.
>                                       it calls poll, but gets POLLERR.
>                                       ...
> 
> To address this issue, I would like to introduce the following netlink
> poll.

IMHO the above is an application bug, and should not be addressed in
the kernel.

If you want to limit the amount of CPU time your application can use,
you have to resort to process scheduler settings and/or container
limits: nothing can prevent a [buggy?] application from doing:

# in shell script
while true; do :; done

The reported condition is IMHO not very different from the loop above:
the application is requesting POLLERR events and not processing them.

To be more accurate, it is like looping on poll(), getting read events,
and never reading any data. This is nothing we should address in the kernel.
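
For illustration only, here is a minimal sketch of the kind of handling the
application could do instead. It assumes that a receive call on the socket
reports the pending error (ENOBUFS after an overrun) and thereby consumes
it, so the next poll() can block again. The helper names service_socket and
poll_loop_once are hypothetical, not taken from the patch or from any
existing uevent listener, and the sketch is written against a generic
datagram socket rather than a real netlink one.

```python
import errno
import select
import socket


def service_socket(sock, revents, bufsize=8192):
    """Handle one poll() result for sock (hypothetical helper).

    Returns ("data", payload), ("overrun", None), ("error", errno_value)
    or ("idle", None).
    """
    if not revents & (select.POLLIN | select.POLLERR):
        return ("idle", None)
    try:
        # Treat POLLERR like POLLIN: a non-blocking receive either returns
        # a queued message or raises the pending socket error.
        payload = sock.recv(bufsize, socket.MSG_DONTWAIT)
    except BlockingIOError:
        # Nothing queued and no error pending.
        return ("idle", None)
    except OSError as exc:
        if exc.errno == errno.ENOBUFS:
            # Messages were dropped; the caller should resynchronize its
            # view of the system instead of spinning on poll().
            return ("overrun", None)
        return ("error", exc.errno)
    return ("data", payload)


def poll_loop_once(sock, timeout_ms=1000):
    """Run a single poll() iteration and service every reported event."""
    poller = select.poll()
    # Only POLLIN is requested; poll() may still report POLLERR.
    poller.register(sock, select.POLLIN)
    return [service_socket(sock, revents)
            for _fd, revents in poller.poll(timeout_ms)]
```

An application would call poll_loop_once() in its main loop and react to
an "overrun" result, for example by rescanning device state, rather than
ignoring POLLERR and calling poll() again.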

Cheers,

Paolo