[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEoi9W4vRzeASj=5XWqL-BrkD5wbh2XFGJcUXUiQcCr+7Ai3Lw@mail.gmail.com>
Date: Thu, 23 May 2024 16:39:27 -0400
From: Dan Cross <crossd@...il.com>
To: Dan Carpenter <dan.carpenter@...aro.org>
Cc: lars@...bit.com, Duoming Zhou <duoming@....edu.cn>, linux-hams@...r.kernel.org,
netdev@...r.kernel.org
Subject: Re: [PATCH v4] ax25: Fix refcount imbalance on inbound connections
On Thu, May 23, 2024 at 2:23 PM Dan Carpenter <dan.carpenter@...aro.org> wrote:
> On Thu, May 23, 2024 at 11:22:43AM -0400, Dan Cross wrote:
> > On Thu, May 23, 2024 at 11:05 AM Dan Carpenter <dan.carpenter@...aro.org> wrote:
> > > > [snip]
> > >
> > > I've already said that I don't think the patch is correct and offered
> > > an alternative which takes a reference in accept() but also adds a
> > > matching put()... But I can't really test my patch so if we're going to
> > > do something that we know is wrong, I'd prefer to just revert Duoming's
> > > patch.
> >
> > Dan, may I ask how you determined that Lars's patch is incorrect?
>
> The problem is that accept() and ax25_release() are not mirrored pairs.
I'm having a hard time understanding why. Here's my reasoning; please
correct me if I'm wrong?
Taking a step back, the semantics of `accept` are that, on successful
completion, it creates a new socket associated with the accepted
connection. It makes sense that such a new socket would take a
reference on the underlying device, since the socket is inherently
tied to that device; this is what Lars's patch does. Indeed, consider
the case that a connection was accepted, and then the bound listening
socket was immediately closed, thus dropping the reference on the
device: it seems that adding a reference onto the device in the
`accept` path is necessary.
So how does `ax25_release` get called? That ends up getting invoked
from `close`; I traced this through the kernel from `sys_close` until
the invocation of the `.release` function from the `proto_ops` vector.
The call graph looks something like this:
sys_close (fs/open.c)
-> file_close_fd (fs/file.c)
-> file_closed_fd_locked(same)
<- returns struct file to file_close_fd
<- returns struct file to sys_close
-> filp_flush (fs/open.c)
-> ops vec `.flush` (nop for socket)
-> __fput_sync (fs/file_table.c): decref(f_count) => __fput
-> __fput (fs/file_table.c)
-> fsnotify_close(file) (irrelevant)
-> f_op->release
-> sock_close (net/socket.c)
-> __sock_release (net/socket.c)
-> proto_ops vec `.release`
-> ax25_release (net/ax25/af_ax25.c)
There may be other ways it's invoked, but that's likely the main one.
It seems clear that this will happen for sockets that have a ref on
the device either via `bind` or via `accept`.
> We're just taking the reference and never dropping it. Which fixes the
> use after free but introduces a leak.
I'm not sure that's true. It looks to me like the ref is dropped when
the accepted socket is eventually closed. What am I missing?
> > Testing so far indicates that it works as expected. On the other hand,
> > Lars tested your patch and found that it did not address the
> > underlying issue
> > (https://marc.info/?l=linux-hams&m=171646940902757&w=2).
>
> Yeah. I've said a couple times that my patch wasn't complete. I keep
> hoping that Duoming will chime in here...
+1!
> > If I may suggest a path forward, given that observed results show that
> > Lars's patch works as expected, perhaps we can commit that and then
> > work to incorporate a more robust ref counting strategy a la your
> > patch?
>
> The argument for this patch is that it works in testing even though we
> think it's not totally correct. That's not really a good argument.
> Like we can revert patches that clearly don't work so we could revert
> Duoming's patch, but when we're adding code then that should work.
I agree that absence of evidence is not evidence of absence, but I'm
not sure I follow the reasoning behind their being a leak. It's not
clear to me that Lars's patch is obviously wrong. I _do_ think this
code hasn't been shown much love in a long time, and I am totally
prepared to admit that I'm wrong, but right now, I don't see it?
- Dan C.
Powered by blists - more mailing lists