Message-ID: <20230113022555.2467724-1-kamatam@amazon.com>
Date:   Thu, 12 Jan 2023 18:25:55 -0800
From:   Munehisa Kamata <kamatam@...zon.com>
To:     <surenb@...gle.com>
CC:     <ebiggers@...nel.org>, <hannes@...xchg.org>, <hdanton@...a.com>,
        <kamatam@...zon.com>, <linux-kernel@...r.kernel.org>,
        <linux-mm@...ck.org>, <mengcc@...zon.com>
Subject: Re: another use-after-free in ep_remove_wait_queue()

On Thu, 2023-01-12 22:01:24 +0000, Suren Baghdasaryan <surenb@...gle.com> wrote:
>
> On Mon, Jan 9, 2023 at 7:06 PM Suren Baghdasaryan <surenb@...gle.com> wrote:
> >
> > On Mon, Jan 9, 2023 at 5:33 PM Suren Baghdasaryan <surenb@...gle.com> wrote:
> > >
> > > On Sun, Jan 8, 2023 at 3:49 PM Hillf Danton <hdanton@...a.com> wrote:
> > > >
> > > > On 8 Jan 2023 14:25:48 -0800 PM Munehisa Kamata <kamatam@...zon.com> wrote:
> > > > >
> > > > > That patch survived the repro in my original post, however, the waker
> > > > > (rmdir) was getting stuck until a file descriptor of the epoll instance or
> > > > > the pressure file got closed. So, if the following modified repro runs
> > > > > with the patch, the waker never returns (unless the sleeper gets killed)
> > > > > while holding cgroup_mutex. This doesn't seem to be what you expected to
> > > > > see with the patch, does it? Even wake_up_all() does not appear to empty
> > > > > the queue, but wake_up_pollfree() does.
> > > >
> > > > Thanks for your testing. And the debugging completes.
> > > >
> > > > Mind sending a patch with wake_up_pollfree() folded?
> > >
> > > I finally had some time to look into this issue. I don't think
> > > delaying destruction in psi_trigger_destroy() because there are still
> > > users of the trigger as Hillf suggested is a good way to go. Before
> > > [1] correct trigger destruction was handled using a
> > > psi_trigger.refcount. For some reason I thought it's not needed
> > > anymore when we placed one-trigger-per-file restriction in that patch,
> > > so I removed it. Obviously that was a wrong move, so I think the
> > > cleanest way would be to bring back the refcounting. That way the last
> > > user of the trigger (either psi_trigger_poll() or psi_fop_release())
> > > will free the trigger.
> > > I'll check once more to make sure I did not miss anything and if there
> > > are no objections, will post a fix.
> >
> > Uh, I recalled now why refcounting was not helpful here. I'm making
> > the same mistake of thinking that poll_wait() blocks until the call to
> > wake_up() which is not the case. Let me think if there is anything
> > better than wake_up_pollfree() for this case.
> 
> Hi Munehisa,
> Sorry for the delay. I was trying to reproduce the issue but even
> after adding a delay before ep_remove_wait_queue() it did not happen.

Hi Suren,

Thank you for your help here.

Just in case, do you have KASAN enabled in your config? If not, the bug may
just silently corrupt some memory location without any immediate message or
noticeable event such as an oops.

> One thing about wake_up_pollfree() solution that does not seem right
> to me is this comment at
> https://elixir.bootlin.com/linux/latest/source/include/linux/wait.h#L253:
> 
> `In the very rare cases where a ->poll() implementation uses a
> waitqueue whose lifetime is tied to a task rather than to the 'struct
> file' being polled, this function must be called before the waitqueue
> is freed...`
> 
> In our case we free the waitqueue from cgroup_pressure_release(),
> which is the handler for the `release` operation on cgroup psi files. The
> other place calling psi_trigger_destroy() is psi_fop_release(), which
> is also tied to the lifetime of the psi files.  Therefore the lifetime
> of the trigger's waitqueue is tied to the lifetime of the files and
> IIUC, we should not be required to use wake_up_pollfree().
> Could you please post your .config file? I might be missing some
> configuration which prevents the issue from happening on my side.

Sure, here is my config.

 https://gist.github.com/kamatam9/a078bdd9f695e7a0767b061c60e48d50

I confirmed that it's reliably reproducible with v6.2-rc3 as shown below.

 https://gist.github.com/kamatam9/096a79cf59d8ed8785c4267e917b8675

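For illustration only, the point quoted above (poll_wait() merely registers
a waiter and returns, so the waiter can outlive the waitqueue head) can be
sketched as a small user-space model. This is not kernel code: every name
below (model_poll_wait, model_pollfree, model_remove_wait_queue, run_model)
is hypothetical and only loosely mirrors poll_wait(), wake_up_pollfree() and
ep_remove_wait_queue(). It assumes a single waiter and no concurrency.

```c
#include <assert.h>
#include <stdlib.h>

/* User-space toy model; all names are made up for illustration. */
struct wq_head;

struct waiter {                 /* stands in for an epoll wait entry */
    struct waiter *next;
    struct wq_head *head;       /* queue this waiter is linked on, or NULL */
};

struct wq_head {                /* stands in for the trigger's waitqueue */
    struct waiter *first;
};

/* Like poll_wait() in spirit: links the waiter and returns immediately. */
static void model_poll_wait(struct wq_head *h, struct waiter *w)
{
    assert(w->head == NULL);    /* a waiter sits on at most one queue */
    w->head = h;
    w->next = h->first;
    h->first = w;
}

/* Like wake_up_pollfree() in spirit: detach every waiter so that none
 * keeps a pointer to the head. Returns the number of waiters detached. */
static int model_pollfree(struct wq_head *h)
{
    int n = 0;

    while (h->first) {
        struct waiter *w = h->first;

        h->first = w->next;
        w->next = NULL;
        w->head = NULL;
        n++;
    }
    return n;
}

/* Like ep_remove_wait_queue() in spirit: touch the head only if the waiter
 * is still linked. Returns 1 if the head was dereferenced, 0 otherwise. */
static int model_remove_wait_queue(struct waiter *w)
{
    struct wq_head *h = w->head;
    struct waiter **pp;

    if (!h)
        return 0;
    for (pp = &h->first; *pp && *pp != w; pp = &(*pp)->next)
        ;
    if (*pp)
        *pp = w->next;
    w->head = NULL;
    w->next = NULL;
    return 1;
}

/* Returns 0 when the safe ordering behaves as described. */
int run_model(void)
{
    struct wq_head *h = calloc(1, sizeof(*h));
    struct waiter w = {0};
    int still_linked, detached, touched;

    if (!h)
        return -1;
    model_poll_wait(h, &w);             /* returns at once */
    still_linked = (w.head == h);       /* waiter remains on the queue */
    detached = model_pollfree(h);       /* detach before freeing the head */
    free(h);
    /* Without model_pollfree() above, this call would read freed memory. */
    touched = model_remove_wait_queue(&w);
    return (still_linked && detached == 1 && touched == 0) ? 0 : 1;
}
```

Calling run_model() from a main() should return 0. Dropping the
model_pollfree() call before free(h) makes the later removal dereference
the freed head, which is the shape of the use-after-free discussed here.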

Regards,
Munehisa


> Thanks,
> Suren.
> 
> >
> >
> > >
> > > [1] https://lore.kernel.org/lkml/20220111232309.1786347-1-surenb@google.com/
> > >
> > > Thanks,
> > > Suren.
> > >
> > > >
> > > > Hillf
> 
> 
