netdev - Re: [PATCH] selinux,xfrm: fix dangling refcount on deferred skb free

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAFqZXNumyhpRvrZ6mAK9OVxbte=_3MG195i_+Z1j79PsE=6k_g@mail.gmail.com>
Date: Wed, 6 Nov 2024 17:54:43 +0100
From: Ondrej Mosnacek <omosnace@...hat.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Paul Moore <paul@...l-moore.com>, Steffen Klassert <steffen.klassert@...unet.com>, 
	Herbert Xu <herbert@...dor.apana.org.au>, "David S. Miller" <davem@...emloft.net>, 
	Stephen Smalley <stephen.smalley.work@...il.com>, selinux@...r.kernel.org, 
	linux-security-module@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH] selinux,xfrm: fix dangling refcount on deferred skb free

On Wed, Nov 6, 2024 at 5:13 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Wed, Nov 6, 2024 at 4:55 PM Ondrej Mosnacek <omosnace@...hat.com> wrote:
> >
> > SELinux tracks the number of allocated xfrm_state/xfrm_policy objects
> > (via the selinux_xfrm_refcount variable) as an input in deciding if peer
> > labeling should be used.
> >
> > However, as a result of commits f35f821935d8 ("tcp: defer skb freeing
> > after socket lock is released") and 68822bdf76f1 ("net: generalize skb
> > freeing deferral to per-cpu lists"), freeing of a sk_buff object, which
> > may hold a reference to an xfrm_state object, can be deferred for
> > processing on another CPU core, so even after xfrm_state is deleted from
> > the configuration by userspace, the refcount isn't decremented until the
> > deferred freeing of relevant sk_buffs happens. On a system with many
> > cores this can take a very long time (even minutes or more if the system
> > is not very active), leading to peer labeling being enabled for much
> > longer than expected.
> >
> > Fix this by moving the selinux_xfrm_refcount decrementing to just after
> > the actual deletion of the xfrm objects rather than waiting for the
> > freeing to happen. For xfrm_policy it currently doesn't seem to be
> > necessary, but let's do the same there for consistency and
> > future-proofing.
> >
> > We hit this issue on a specific aarch64 256-core system, where the
> > sequence of unix_socket/test and inet_socket/tcp/test from
> > selinux-testsuite [1] would quite reliably trigger this scenario, and a
> > subsequent sctp/test run would then stumble because the policy for that
> > test misses some rules that would make it work under peer labeling
> > enabled (namely it was getting the netif::egress permission denied in
> > some of the test cases).
> >
> > [1] https://github.com/SELinuxProject/selinux-testsuite/
> >
> > Fixes: f35f821935d8 ("tcp: defer skb freeing after socket lock is released")
> > Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists")
> > Signed-off-by: Ondrej Mosnacek <omosnace@...hat.com>
> > ---
>
> Can we explain why TCP packets sitting in TCP receive queues would
> need to keep xfrm_state around ?
>
> With thousands of TCP sockets. I would imagine that a similar issue
> would be hit,
> regardless of f35f821935d8 ("tcp: defer skb freeing after socket lock
> is released") and 68822bdf76f1 ("net: generalize skb freeing deferral
> to per-cpu lists")
>
> We remove the dst from these incoming packets (see skb_dst_drop() in
> tcp_data_queue() and tcp_add_backlog()),
> I do not see how XFRM state could be kept ?

The problem is not with the xfrm_state reference via dst_entry, but
the one in skb_ext (skb->extensions) -> sec_path
(skb_ext_get_ptr(skb->extensions, SKB_EXT_SEC_PATH)) -> the xvec
array. But you have a point that I should say that in the commit
message...

And I think you're right that even without those commits a delay in
processing the RCU free callbacks could cause a similar issue, it just
wouldn't be as easy to trigger. That made me look deeper into history
which commit actually added the decrement on free and it turns out it
was done intentionally as a bugfix - see commit e4e8536f65b5
("selinux: fix the labeled xfrm/IPsec reference count handling").
Before that commit the logic was similar to what my patch is doing, so
I could be re-introducing another bug here :-/ The commit message is
not very helpful there - Paul, do you happen to remember what the
issue was that prompted it? I guess there can be some alloc's that
won't have a matching delete in the right circumstances? Or something
involving the selinux_xfrm_policy_clone() case?

-- 
Ondrej Mosnacek
Senior Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.