lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CANn89iJj2sQX3rZvmbL0zQjX7K-OFwyautgAXqxNvk2M17++bw@mail.gmail.com>
Date: Wed, 6 Nov 2024 18:00:14 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Ondrej Mosnacek <omosnace@...hat.com>
Cc: Paul Moore <paul@...l-moore.com>, Steffen Klassert <steffen.klassert@...unet.com>, 
	Herbert Xu <herbert@...dor.apana.org.au>, "David S. Miller" <davem@...emloft.net>, 
	Stephen Smalley <stephen.smalley.work@...il.com>, selinux@...r.kernel.org, 
	linux-security-module@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH] selinux,xfrm: fix dangling refcount on deferred skb free

On Wed, Nov 6, 2024 at 5:54 PM Ondrej Mosnacek <omosnace@...hat.com> wrote:
>
> On Wed, Nov 6, 2024 at 5:13 PM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > On Wed, Nov 6, 2024 at 4:55 PM Ondrej Mosnacek <omosnace@...hat.com> wrote:
> > >
> > > SELinux tracks the number of allocated xfrm_state/xfrm_policy objects
> > > (via the selinux_xfrm_refcount variable) as an input in deciding if peer
> > > labeling should be used.
> > >
> > > However, as a result of commits f35f821935d8 ("tcp: defer skb freeing
> > > after socket lock is released") and 68822bdf76f1 ("net: generalize skb
> > > freeing deferral to per-cpu lists"), freeing of a sk_buff object, which
> > > may hold a reference to an xfrm_state object, can be deferred for
> > > processing on another CPU core, so even after xfrm_state is deleted from
> > > the configuration by userspace, the refcount isn't decremented until the
> > > deferred freeing of relevant sk_buffs happens. On a system with many
> > > cores this can take a very long time (even minutes or more if the system
> > > is not very active), leading to peer labeling being enabled for much
> > > longer than expected.
> > >
> > > Fix this by moving the selinux_xfrm_refcount decrementing to just after
> > > the actual deletion of the xfrm objects rather than waiting for the
> > > freeing to happen. For xfrm_policy it currently doesn't seem to be
> > > necessary, but let's do the same there for consistency and
> > > future-proofing.
> > >
> > > We hit this issue on a specific aarch64 256-core system, where the
> > > sequence of unix_socket/test and inet_socket/tcp/test from
> > > selinux-testsuite [1] would quite reliably trigger this scenario, and a
> > > subsequent sctp/test run would then stumble because the policy for that
> > > test misses some rules that would make it work under peer labeling
> > > enabled (namely it was getting the netif::egress permission denied in
> > > some of the test cases).
> > >
> > > [1] https://github.com/SELinuxProject/selinux-testsuite/
> > >
> > > Fixes: f35f821935d8 ("tcp: defer skb freeing after socket lock is released")
> > > Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists")
> > > Signed-off-by: Ondrej Mosnacek <omosnace@...hat.com>
> > > ---
> >
> > Can we explain why TCP packets sitting in TCP receive queues would
> > need to keep xfrm_state around ?
> >
> > With thousands of TCP sockets. I would imagine that a similar issue
> > would be hit,
> > regardless of f35f821935d8 ("tcp: defer skb freeing after socket lock
> > is released") and 68822bdf76f1 ("net: generalize skb freeing deferral
> > to per-cpu lists")
> >
> > We remove the dst from these incoming packets (see skb_dst_drop() in
> > tcp_data_queue() and tcp_add_backlog()),
> > I do not see how XFRM state could be kept ?
>
> The problem is not with the xfrm_state reference via dst_entry, but
> the one in skb_ext (skb->extensions) -> sec_path
> (skb_ext_get_ptr(skb->extensions, SKB_EXT_SEC_PATH)) -> the xvec
> array. But you have a point that I should say that in the commit
> message...
>

So some secpath_reset() calls should be added in various protocol handlers
before packets are possibly queued for hours in a socket queue  ?

I see one in l2tp_eth_dev_recv().

> And I think you're right that even without those commits a delay in
> processing the RCU free callbacks could cause a similar issue, it just
> wouldn't be as easy to trigger. That made me look deeper into history
> which commit actually added the decrement on free and it turns out it
> was done intentionally as a bugfix - see commit e4e8536f65b5
> ("selinux: fix the labeled xfrm/IPsec reference count handling").
> Before that commit the logic was similar to what my patch is doing, so
> I could be re-introducing another bug here :-/ The commit message is
> not very helpful there - Paul, do you happen to remember what the
> issue was that prompted it? I guess there can be some alloc's that
> won't have a matching delete in the right circumstances? Or something
> involving the selinux_xfrm_policy_clone() case?
>
> --
> Ondrej Mosnacek
> Senior Software Engineer, Linux Security - SELinux kernel
> Red Hat, Inc.
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ