netdev - Re: [PATCH v4] ax25: Fix refcount imbalance on inbound connections

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAEoi9W4vRzeASj=5XWqL-BrkD5wbh2XFGJcUXUiQcCr+7Ai3Lw@mail.gmail.com>
Date: Thu, 23 May 2024 16:39:27 -0400
From: Dan Cross <crossd@...il.com>
To: Dan Carpenter <dan.carpenter@...aro.org>
Cc: lars@...bit.com, Duoming Zhou <duoming@....edu.cn>, linux-hams@...r.kernel.org, 
	netdev@...r.kernel.org
Subject: Re: [PATCH v4] ax25: Fix refcount imbalance on inbound connections

On Thu, May 23, 2024 at 2:23 PM Dan Carpenter <dan.carpenter@...aro.org> wrote:
> On Thu, May 23, 2024 at 11:22:43AM -0400, Dan Cross wrote:
> > On Thu, May 23, 2024 at 11:05 AM Dan Carpenter <dan.carpenter@...aro.org> wrote:
> > > > [snip]
> > >
> > > I've already said that I don't think the patch is correct and offered
> > > an alternative which takes a reference in accept() but also adds a
> > > matching put()...  But I can't really test my patch so if we're going to
> > > do something that we know is wrong, I'd prefer to just revert Duoming's
> > > patch.
> >
> > Dan, may I ask how you determined that Lars's patch is incorrect?
>
> The problem is that accept() and ax25_release() are not mirrored pairs.

I'm having a hard time understanding why. Here's my reasoning; please
correct me if I'm wrong?

Taking a step back, the semantics of `accept` are that, on successful
completion, it creates a new socket associated with the accepted
connection. It makes sense that such a new socket would take a
reference on the underlying device, since the socket is inherently
tied to that device; this is what Lars's patch does. Indeed, consider
the case that a connection was accepted, and then the bound listening
socket was immediately closed, thus dropping the reference on the
device: it seems that adding a reference onto the device in the
`accept` path is necessary.

So how does `ax25_release` get called? That ends up getting invoked
from `close`; I traced this through the kernel from `sys_close` until
the invocation of the `.release` function from the `proto_ops` vector.
The call graph looks something like this:

sys_close (fs/open.c)
 -> file_close_fd (fs/file.c)
  -> file_closed_fd_locked(same)
  <- returns struct file to file_close_fd
 <- returns struct file to sys_close
 -> filp_flush (fs/open.c)
  -> ops vec `.flush` (nop for socket)
 -> __fput_sync (fs/file_table.c):  decref(f_count) => __fput
  -> __fput (fs/file_table.c)
   -> fsnotify_close(file) (irrelevant)
   -> f_op->release
    -> sock_close (net/socket.c)
     -> __sock_release (net/socket.c)
      -> proto_ops vec `.release`
       -> ax25_release (net/ax25/af_ax25.c)

There may be other ways it's invoked, but that's likely the main one.
It seems clear that this will happen for sockets that have a ref on
the device either via `bind` or via `accept`.

> We're just taking the reference and never dropping it. Which fixes the
> use after free but introduces a leak.

I'm not sure that's true. It looks to me like the ref is dropped when
the accepted socket is eventually closed. What am I missing?

> > Testing so far indicates that it works as expected. On the other hand,
> > Lars tested your patch and found that it did not address the
> > underlying issue
> > (https://marc.info/?l=linux-hams&m=171646940902757&w=2).
>
> Yeah.  I've said a couple times that my patch wasn't complete.  I keep
> hoping that Duoming will chime in here...

+1!

> > If I may suggest a path forward, given that observed results show that
> > Lars's patch works as expected, perhaps we can commit that and then
> > work to incorporate a more robust ref counting strategy a la your
> > patch?
>
> The argument for this patch is that it works in testing even though we
> think it's not totally correct.  That's not really a good argument.
> Like we can revert patches that clearly don't work so we could revert
> Duoming's patch, but when we're adding code then that should work.

I agree that absence of evidence is not evidence of absence, but I'm
not sure I follow the reasoning behind their being a leak. It's not
clear to me that Lars's patch is obviously wrong. I _do_ think this
code hasn't been shown much love in a long time, and I am totally
prepared to admit that I'm wrong, but right now, I don't see it?

        - Dan C.