[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250512075302.GA3644468@google.com>
Date: Mon, 12 May 2025 08:53:02 +0100
From: Lee Jones <lee@...nel.org>
To: Greg KH <gregkh@...uxfoundation.org>
Cc: Kuniyuki Iwashima <kuniyu@...zon.com>, Paolo Abeni <pabeni@...hat.com>,
kuni1840@...il.com, llfamsec@...il.com, netdev@...r.kernel.org,
sashal@...nel.org, stable@...r.kernel.org
Subject: Re: [PATCH stable 5.15/6.1/6.6] af_unix: Clear oob_skb in
scan_inflight().
On Wed, 05 Mar 2025, Greg KH wrote:
> On Wed, Mar 05, 2025 at 10:10:41AM -0800, Kuniyuki Iwashima wrote:
> > +Paolo
> >
> > From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
> > Date: Wed, 5 Mar 2025 15:08:26 +0100
> > > On Mon, Mar 03, 2025 at 07:01:49PM -0800, Kuniyuki Iwashima wrote:
> > > > Embryo socket is not queued in gc_candidates, so we can't drop
> > > > a reference held by its oob_skb.
> > > >
> > > > Let's say we create listener and embryo sockets, send the
> > > > listener's fd to the embryo as OOB data, and close() them
> > > > without recv()ing the OOB data.
> > > >
> > > > There is a self-reference cycle like
> > > >
> > > > listener -> embryo.oob_skb -> listener
> > > >
> > > > , so this must be cleaned up by GC. Otherwise, the listener's
> > > > refcnt is not released and sockets are leaked:
> > > >
> > > > # unshare -n
> > > > # cat /proc/net/protocols | grep UNIX-STREAM
> > > > UNIX-STREAM 1024 0 -1 NI 0 yes kernel ...
> > > >
> > > > # python3
> > > > >>> from array import array
> > > > >>> from socket import *
> > > > >>>
> > > > >>> s = socket(AF_UNIX, SOCK_STREAM)
> > > > >>> s.bind('\0test\0')
> > > > >>> s.listen()
> > > > >>>
> > > > >>> c = socket(AF_UNIX, SOCK_STREAM)
> > > > >>> c.connect(s.getsockname())
> > > > >>> c.sendmsg([b'x'], [(SOL_SOCKET, SCM_RIGHTS, array('i', [s.fileno()]))], MSG_OOB)
> > > > 1
> > > > >>> quit()
> > > >
> > > > # cat /proc/net/protocols | grep UNIX-STREAM
> > > > UNIX-STREAM 1024 3 -1 NI 0 yes kernel ...
> > > > ^^^
> > > > 3 sockets still in use after FDs are close()d
> > > >
> > > > Let's drop the embryo socket's oob_skb ref in scan_inflight().
> > > >
> > > > This also fixes a racy access to oob_skb that commit 9841991a446c
> > > > ("af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue
> > > > lock.") fixed for the new Tarjan's algo-based GC.
> > > >
> > > > Fixes: 314001f0bf92 ("af_unix: Add OOB support")
> > > > Reported-by: Lei Lu <llfamsec@...il.com>
> > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@...zon.com>
> > > > ---
> > > > This has no upstream commit because I replaced the entire GC in
> > > > 6.10 and the new GC does not have this bug, and this fix is only
> > > > applicable to the old GC (<= 6.9), thus for 5.15/6.1/6.6.
> > >
> > > You need to get the networking maintainers to review and agree that this
> > > is ok for us to take, as we really don't want to take "custom" stuff
> > > like thi s at all.
> >
> > Paolo, could you take a look at this patch ?
> > https://lore.kernel.org/netdev/20250304030149.82265-1-kuniyu@amazon.com/
> >
> >
> > > Why not just take the commits that are in newer
> > > kernels instead?
> >
> > That will be about 20 patches that rewrite the most lines of
> > net/unix/garbage.c and cannot be applied cleanly.
> >
> > I think backporting these commits is overkill to fix a small
> > bug that can be fixed with a much smaller diff.
> >
> > 927fa5b3e4f5 af_unix: Fix uninit-value in __unix_walk_scc()
> > 041933a1ec7b af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS
> > 7172dc93d621 af_unix: Add dead flag to struct scm_fp_list.
> > 1af2dface5d2 af_unix: Don't access successor in unix_del_edges() during GC.
> > fd86344823b5 af_unix: Try not to hold unix_gc_lock during accept().
> > 118f457da9ed af_unix: Remove lock dance in unix_peek_fds().
> > 4090fa373f0e af_unix: Replace garbage collection algorithm.
> > a15702d8b3aa af_unix: Detect dead SCC.
> > bfdb01283ee8 af_unix: Assign a unique index to SCC.
> > ad081928a8b0 af_unix: Avoid Tarjan's algorithm if unnecessary.
> > 77e5593aebba af_unix: Skip GC if no cycle exists.
> > ba31b4a4e101 af_unix: Save O(n) setup of Tarjan's algo.
> > dcf70df2048d af_unix: Fix up unix_edge.successor for embryo socket.
> > 3484f063172d af_unix: Detect Strongly Connected Components.
> > 6ba76fd2848e af_unix: Iterate all vertices by DFS.
> > 22c3c0c52d32 af_unix: Bulk update unix_tot_inflight/unix_inflight when queuing skb.
> > 42f298c06b30 af_unix: Link struct unix_edge when queuing skb.
> > 29b64e354029 af_unix: Allocate struct unix_edge for each inflight AF_UNIX fd.
> > 1fbfdfaa5902 af_unix: Allocate struct unix_vertex for each inflight AF_UNIX fd.
>
> Sure, but now all fixes made upstream after these changes will not apply
> to older kernels at all, making supporting this old one-off change
> harder and harder over time.
>
> But I'll defer to the maintainers here as to what they want. Taking 20+
> patches in a stable tree is trivial for us, not a problem at all.
Since the general expectation is that branches will be maintained for a
long time, typically around 4 years, it could be seen as shortsighted to
apply a drive-by fix which is highly likely to cause merge conflicts
going forward. In order to ensure ease of future maintenance it would
be nicer to apply all of the updates above, which as Greg says, would be
trivial from the perspective of Stable.
Kuniyuki, has the suggested stack above been fully tested?
--
Lee Jones [李琼斯]
Powered by blists - more mailing lists