linux-ext4 - Re: possible deadlock in start_this

Message-ID: <20210211132533.GI308988@casper.infradead.org>
Date:   Thu, 11 Feb 2021 13:25:33 +0000
From:   Matthew Wilcox <willy@...radead.org>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Jan Kara <jack@...e.cz>, Dmitry Vyukov <dvyukov@...gle.com>,
        syzbot <syzbot+bfdded10ab7dcd7507ae@...kaller.appspotmail.com>,
        Jan Kara <jack@...e.com>, linux-ext4@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Theodore Ts'o <tytso@....edu>, Linux-MM <linux-mm@...ck.org>
Subject: Re: possible deadlock in start_this_handle (2)

On Thu, Feb 11, 2021 at 02:07:03PM +0100, Michal Hocko wrote:
> On Thu 11-02-21 12:57:17, Matthew Wilcox wrote:
> > > current->flags should be always manipulated from the user context. But
> > > who knows maybe there is a bug and some interrupt handler is calling it.
> > > This should be easy to catch no?
> > 
> > Why would it matter if it were?
> 
> I was thinking about a clobbered state because updates to ->flags are
> not atomic because this shouldn't ever be updated concurrently. So maybe
> a racing interrupt could corrupt the flags state?

I don't think that's possible.  Same-CPU races between interrupt and
process context are simpler because the CPU always observes its own writes
in order and the interrupt handler completes "between" two instructions.

E.g. to set bit 8 at address A, a load-store CPU will do:

load 0 from address A
or 8 with result
store 8 to A

Two CPUs can do:

CPU 0			CPU 1
load 0 from A
			load 0 from A
or 8 with 0
			or 4 with 0
store 8 to A
			store 4 to A

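and the store of 8 is lost.  On a single CPU, an interrupt that arrives
between the process's load and its store would look like this: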

process			interrupt
load 0 from A
			load 0 from A
			or 4 with 0
			store 4 to A
or 8 with 0
store 8 to A

so the store of 4 would be lost.

But we expect the interrupt handler to restore the flag before it
returns, so we actually have this:

process			interrupt
load 0 from A
			load 0 from A
			or 4 with 0
			store 4 to A
			load 4 from A
			clear 4 from 4
			store 0 to A
or 8 with 0
store 8 to A
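
The three traces above can be replayed deterministically in plain C.  This
is only a sketch: local variables stand in for the registers of a
load-store CPU, the function names are made up for illustration, and
nothing here is kernel code.  It shows that in the last trace A ends up
as 8, which is what the process would have written had the interrupt
never run.

```c
#include <assert.h>

/* Two CPUs: both load the old value before either one stores. */
unsigned int two_cpus(void)
{
	unsigned int A = 0;
	unsigned int cpu0, cpu1;

	cpu0 = A;	/* CPU 0: load 0 from A */
	cpu1 = A;	/* CPU 1: load 0 from A */
	cpu0 |= 8;	/* CPU 0: or 8 with 0 */
	cpu1 |= 4;	/* CPU 1: or 4 with 0 */
	A = cpu0;	/* CPU 0: store 8 to A */
	A = cpu1;	/* CPU 1: store 4 to A */
	return A;	/* 4: the store of 8 is lost */
}

/* Same CPU: the interrupt sets its bit and never clears it. */
unsigned int irq_without_restore(void)
{
	unsigned int A = 0;
	unsigned int proc, irq;

	proc = A;	/* process:   load 0 from A */
	irq = A;	/* interrupt: load 0 from A */
	irq |= 4;	/* interrupt: or 4 with 0 */
	A = irq;	/* interrupt: store 4 to A */
	proc |= 8;	/* process:   or 8 with 0 */
	A = proc;	/* process:   store 8 to A */
	return A;	/* 8: the store of 4 is lost */
}

/* Same CPU: the interrupt clears its bit again before returning. */
unsigned int irq_with_restore(void)
{
	unsigned int A = 0;
	unsigned int proc, irq;

	proc = A;	/* process:   load 0 from A */
	irq = A;	/* interrupt: load 0 from A */
	irq |= 4;	/* interrupt: or 4 with 0 */
	A = irq;	/* interrupt: store 4 to A */
	irq = A;	/* interrupt: load 4 from A */
	irq &= ~4u;	/* interrupt: clear 4 from 4 */
	A = irq;	/* interrupt: store 0 to A */
	proc |= 8;	/* process:   or 8 with 0 */
	A = proc;	/* process:   store 8 to A */
	return A;	/* 8: same as if the interrupt had not run */
}

/* Self-check of the three outcomes described in the traces. */
void check_interleavings(void)
{
	assert(two_cpus() == 4);
	assert(irq_without_restore() == 8);
	assert(irq_with_restore() == 8);
}
```

Calling check_interleavings() from a test program passes silently.  The
only case that corrupts state on a single CPU is the middle one, where the
handler leaves its bit set.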


If we have a leak where someone sets NOFS (e.g. with memalloc_nofs_save())
and forgets to restore it, that might cause this.  We could check that
the allocation mask bits are clear at syscall exit (scheduling with these
flags set is obviously OK).
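
A rough userspace model of that check, purely as a sketch.  The struct,
the bit value and every function name below are hypothetical stand-ins;
the save/restore pair only mimics the shape of memalloc_nofs_save() and
memalloc_nofs_restore() (save returns the previous state of the bit,
restore puts it back), and the "syscall exit" check is just a function
call here, not a real kernel hook.

```c
#include <assert.h>

#define MODEL_NOFS 0x1u		/* hypothetical bit, not the kernel's value */

struct task_model {
	unsigned int flags;	/* stands in for current->flags */
};

/* Set the NOFS bit and return its previous state. */
unsigned int model_nofs_save(struct task_model *t)
{
	unsigned int old = t->flags & MODEL_NOFS;

	t->flags |= MODEL_NOFS;
	return old;
}

/* Put the NOFS bit back to the state returned by the matching save. */
void model_nofs_restore(struct task_model *t, unsigned int old)
{
	t->flags = (t->flags & ~MODEL_NOFS) | old;
}

/* The proposed check: nonzero if the bit is still set at "syscall exit". */
int model_leaked_at_exit(const struct task_model *t)
{
	return (t->flags & MODEL_NOFS) != 0;
}

/* A code path that pairs save with restore. */
int balanced_path(void)
{
	struct task_model t = { 0 };
	unsigned int old = model_nofs_save(&t);

	model_nofs_restore(&t, old);
	return model_leaked_at_exit(&t);
}

/* A code path that forgets the restore. */
int leaky_path(void)
{
	struct task_model t = { 0 };

	(void)model_nofs_save(&t);
	return model_leaked_at_exit(&t);
}

/* Self-check: only the unbalanced path trips the exit check. */
void check_paths(void)
{
	assert(balanced_path() == 0);
	assert(leaky_path() == 1);
}
```

In the kernel the equivalent would presumably be a one-line warning on the
task's flags in the exit path, but where exactly to put it is an open
question.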