linux-ext4 - Re: possible deadlock in start_this

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YCVeLF8aZGfRVY3C@dhcp22.suse.cz>
Date:   Thu, 11 Feb 2021 17:41:16 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Jan Kara <jack@...e.cz>, Dmitry Vyukov <dvyukov@...gle.com>,
        syzbot <syzbot+bfdded10ab7dcd7507ae@...kaller.appspotmail.com>,
        Jan Kara <jack@...e.com>, linux-ext4@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Theodore Ts'o <tytso@....edu>, Linux-MM <linux-mm@...ck.org>
Subject: Re: possible deadlock in start_this_handle (2)

On Thu 11-02-21 14:26:30, Matthew Wilcox wrote:
> On Thu, Feb 11, 2021 at 03:20:41PM +0100, Michal Hocko wrote:
> > On Thu 11-02-21 13:25:33, Matthew Wilcox wrote:
> > > On Thu, Feb 11, 2021 at 02:07:03PM +0100, Michal Hocko wrote:
> > > > On Thu 11-02-21 12:57:17, Matthew Wilcox wrote:
> > > > > > current->flags should be always manipulated from the user context. But
> > > > > > who knows maybe there is a bug and some interrupt handler is calling it.
> > > > > > This should be easy to catch no?
> > > > > 
> > > > > Why would it matter if it were?
> > > > 
> > > > I was thinking about a clobbered state because updates to ->flags are
> > > > not atomic because this shouldn't ever be updated concurrently. So maybe
> > > > a racing interrupt could corrupt the flags state?
> > > 
> > > I don't think that's possible.  Same-CPU races between interrupt and
> > > process context are simpler because the CPU always observes its own writes
> > > in order and the interrupt handler completes "between" two instructions.
> > 
> > I have to confess I haven't really thought the scenario through. My idea
> > was to simply add a simple check for an irq context into ->flags setting
> > routine because this should never be done in the first place. Not only
> > for scope gfp flags but any other PF_ flags IIRC.
> 
> That's not automatically clear to me.  There are plenty of places
> where an interrupt borrows the context of the task that it happens to
> have interrupted.  Specifically, interrupts should be using GFP_ATOMIC
> anyway, so this doesn't really make a lot of sense, but I don't think
> it's necessarily wrong for an interrupt to call a function that says
> "Definitely don't make GFP_FS allocations between these two points".

Not sure I got your point. IRQ context never does reclaim so anything
outside of NOWAIT/ATOMIC is pointless. But you might be refering to a
future code where GFP_FS might have a meaning outside of the reclaim
context?

Anyway if we are to allow modifying PF_ flags from an interrupt contenxt
then I believe we should make that code IRQ aware at least. I do not
feel really comfortable about async modifications when this is stated to
be safe doing in a non atomic way.

But I suspect we have drifted away from the original issue. I thought
that a simple check would help us narrow down this particular case and
somebody messing up from the IRQ context didn't sound like a completely
off.

-- 
Michal Hocko
SUSE Labs