[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <04c02c854f030d3cb0591cd420f28c57994a0aaa.camel@redhat.com>
Date: Mon, 04 Mar 2024 17:18:32 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Kuniyuki Iwashima <kuniyu@...zon.com>, "David S. Miller"
<davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski
<kuba@...nel.org>
Cc: Kuniyuki Iwashima <kuni1840@...il.com>, netdev@...r.kernel.org
Subject: Re: [PATCH v4 net-next 00/15] af_unix: Rework GC.
On Thu, 2024-02-29 at 18:22 -0800, Kuniyuki Iwashima wrote:
> When we pass a file descriptor to an AF_UNIX socket via SCM_RIGTHS,
> the underlying struct file of the inflight fd gets its refcount bumped.
> If the fd is of an AF_UNIX socket, we need to track it in case it forms
> cyclic references.
>
> Let's say we send a fd of AF_UNIX socket A to B and vice versa and
> close() both sockets.
>
> When created, each socket's struct file initially has one reference.
> After the fd exchange, both refcounts are bumped up to 2. Then, close()
> decreases both to 1. From this point on, no one can touch the file/socket.
>
> However, the struct file has one refcount and thus never calls the
> release() function of the AF_UNIX socket.
>
> That's why we need to track all inflight AF_UNIX sockets and run garbage
> collection.
>
> This series replaces the current GC implementation that locks each inflight
> socket's receive queue and requires trickiness in other places.
>
> The new GC does not lock each socket's queue to minimise its effect and
> tries to be lightweight if there is no cyclic reference or no update in
> the shape of the inflight fd graph.
>
> The new implementation is based on Tarjan's Strongly Connected Components
> algorithm, and we will consider each inflight AF_UNIX socket as a vertex
> and its file descriptor as an edge in a directed graph.
>
> For the details, please see each patch.
>
> patch 1 - 3 : Add struct to express inflight socket graphs
> patch 4 : Optimse inflight fd counting
> patch 5 - 6 : Group SCC possibly forming a cycle
> patch 7 - 8 : Support embryo socket
> patch 9 - 11 : Make GC lightweight
> patch 12 - 13 : Detect dead cycle references
> patch 14 : Replace GC algorithm
> patch 15 : selftest
>
> After this series is applied, we can remove the two ugly tricks for race,
> scm_fp_dup() in unix_attach_fds() and spin_lock dance in unix_peek_fds()
> as done in patch 14/15 of v1.
I plan to have a better look tomorrow.
Generally speaking we have a timing problem. This looks great, but also
quite complex, and thus the potential for untrivial regressions.
We are very late in this cycle and likely there will not be rc8, would
you be ok to eventually postpone this to the next cycle?
Thanks,
Paolo
Powered by blists - more mailing lists