linux-kernel - Re: [RFC PATCH 3/3] fs: detect that the i_rwsem has already been taken exclusively

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-Id: <1506902079.5691.256.camel@linux.vnet.ibm.com>
Date:   Sun, 01 Oct 2017 19:54:39 -0400
From:   Mimi Zohar <zohar@...ux.vnet.ibm.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>,
        "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     Dave Chinner <david@...morbit.com>,
        LSM List <linux-security-module@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Christoph Hellwig <hch@...radead.org>,
        "Theodore Ts'o" <tytso@....edu>, Jan Kara <jack@...e.cz>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-integrity@...r.kernel.org
Subject: Re: [RFC PATCH 3/3] fs: detect that the i_rwsem has already been
 taken exclusively

On Sun, 2017-10-01 at 15:20 -0700, Linus Torvalds wrote:
> On Sun, Oct 1, 2017 at 3:06 PM, Eric W. Biederman <ebiederm@...ssion.com> wrote:
> >
> > Unless I misread something it was being pointed out there are some vfs
> > operations today on which ima writes an ima xattr as a side effect.  And
> > those operations hold the i_sem.  So perhaps I am misunderstanding
> > things or writing the ima xattr needs to happen at some point.  Which
> > implies something like queued work.
> 
> So the issue is indeed the inode semaphore, as it is used by IMA. But
> all these IMA patches to work around the issue are just horribly ugly.
> One adds a VFS-layer filesystem method that most filesystems end up
> not really needing (it's the same as the regular read), and other
> filesystems end up then having hacks with ("oh, I don't need to take
> this lock because it was already taken by the caller").
> 
> The second patch attempt avoided the need for a new filesystem method,
> but added a flag in an annoying place (for the same basic logic). The
> advantage is that now most filesystems don't actually need to care any
> more (and the filesystems that used to care now check that flag).
> 
> There was discussion about moving the flag to a mode convenient spot,
> which would have made it a lot less intrusive.
> 
> But the basic issue is that almost always when you see lock
> inversions, the problem can just be fixed by doing the locking
> differently instead.

This is what I've been missing.  Thank you for taking the time to
understand the problem and explain how!

> And that's what I was/am pushing for.

> There really are two totally different issues:
> 
>  - the integrity _measurement_.
> 
>    This one wants to be serialized, so that you don't have multiple
> concurrent measurements, and the serialization fundamentally has to be
> around all the IO, so this lock pretty much has to be outside the
> i_sem.
> 
>  - the integrity invalidation on certain operations.
> 
>    This one fundamentally had to be inside the i_sem, since some of
> the operations that cause this end up already holding the i_sem at a
> VFS layer.
> 
> so you had these two different requirements (inside _and_ outside),
> and the IMA approach was basically to avoid the problem by making
> i_sem *the* lock, and then making the IO routines aware of it already
> being held. That does solve the inside/outside issue.
> 
> But the simpler way to fix it is to simply use two locks that nest
> inside each other, with i_sem nesting in the middle.  That just avoids
> the problem entirely, and doesn't require anybody to ever care about
> i_sem semantic changes, because i_sem semantics simply didn't change
> at all.
> 
> So that's the approach I'm pushing. I admittedly haven't actually
> looked at the IMA details, but from a high-level standpoint you can
> basically describe it (as above) without having to care too much about
> exactly what IMA even wants.
> 
> The two-lock approach does require that the operations that invalidate
> the integrity measurements always only invalidate it, and don't try to
> re-compute it. But I suspect that would be entirely insane anyway
> (imagine a world where "setxattr" would have to read the whole file
> contents in order to revalidate the integrity measurement - even if
> there is nobody who even *cares*).

Right, the setxattr, chmod, chown syscalls just resets the cached
flags, which indicate whether the file needs to be re-measured, re-
validated, or re-audited.  The file hash is not re-calculated at this
point.  That happens on the next access (in policy).

Mimi