netdev - Re: [PATCH] selinux,xfrm: fix dangling refcount on deferred skb free

Message-ID: <CANn89iKag19EPvnQRthsG98pfjriRwtS+YND0359xFijGAoEYg@mail.gmail.com>
Date: Wed, 6 Nov 2024 17:13:14 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Ondrej Mosnacek <omosnace@...hat.com>
Cc: Paul Moore <paul@...l-moore.com>, Steffen Klassert <steffen.klassert@...unet.com>, 
	Herbert Xu <herbert@...dor.apana.org.au>, "David S. Miller" <davem@...emloft.net>, 
	Stephen Smalley <stephen.smalley.work@...il.com>, selinux@...r.kernel.org, 
	linux-security-module@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH] selinux,xfrm: fix dangling refcount on deferred skb free

On Wed, Nov 6, 2024 at 4:55 PM Ondrej Mosnacek <omosnace@...hat.com> wrote:
>
> SELinux tracks the number of allocated xfrm_state/xfrm_policy objects
> (via the selinux_xfrm_refcount variable) as an input in deciding if peer
> labeling should be used.
>
> However, as a result of commits f35f821935d8 ("tcp: defer skb freeing
> after socket lock is released") and 68822bdf76f1 ("net: generalize skb
> freeing deferral to per-cpu lists"), freeing of a sk_buff object, which
> may hold a reference to an xfrm_state object, can be deferred for
> processing on another CPU core, so even after xfrm_state is deleted from
> the configuration by userspace, the refcount isn't decremented until the
> deferred freeing of relevant sk_buffs happens. On a system with many
> cores this can take a very long time (even minutes or more if the system
> is not very active), leading to peer labeling being enabled for much
> longer than expected.
>
> Fix this by moving the selinux_xfrm_refcount decrementing to just after
> the actual deletion of the xfrm objects rather than waiting for the
> freeing to happen. For xfrm_policy it currently doesn't seem to be
> necessary, but let's do the same there for consistency and
> future-proofing.
>
> We hit this issue on a specific aarch64 256-core system, where the
> sequence of unix_socket/test and inet_socket/tcp/test from
> selinux-testsuite [1] would quite reliably trigger this scenario, and a
> subsequent sctp/test run would then stumble because the policy for that
> test misses some rules that would make it work under peer labeling
> enabled (namely it was getting the netif::egress permission denied in
> some of the test cases).
>
> [1] https://github.com/SELinuxProject/selinux-testsuite/
>
> Fixes: f35f821935d8 ("tcp: defer skb freeing after socket lock is released")
> Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists")
> Signed-off-by: Ondrej Mosnacek <omosnace@...hat.com>
> ---

Can we explain why TCP packets sitting in TCP receive queues would
need to keep xfrm_state around?

With thousands of TCP sockets, I would imagine that a similar issue
would be hit regardless of commits f35f821935d8 ("tcp: defer skb
freeing after socket lock is released") and 68822bdf76f1 ("net:
generalize skb freeing deferral to per-cpu lists").

We remove the dst from these incoming packets (see skb_dst_drop() in
tcp_data_queue() and tcp_add_backlog()), so I do not see how XFRM
state could be kept.