linux-ext4 - Re: Do we need to unrevert "fs: do not prefault sys

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=wjLM_W6W-gk7EJ69Yaoq54x_zj+BJs3Xxt5QRPSDaKCKg@mail.gmail.com>
Date:   Tue, 22 Jun 2021 08:53:05 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     David Howells <dhowells@...hat.com>
Cc:     "Ted Ts'o" <tytso@....edu>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Linux-MM <linux-mm@...ck.org>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Do we need to unrevert "fs: do not prefault sys_write() user
 buffer pages"?

On Tue, Jun 22, 2021 at 8:32 AM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> But yes, it could get unmapped again before the actual copy happens
> with the lock held. But that's why the copy is using that atomic
> version, so if that happens, we'll end up repeating.

Side note: search for "iov_iter_fault_in_writeable()" on lkml for a
gfs2 patch-series that is buggy, exactly because it does *not* use the
atomic user space accesses, and just tries to do the fault-in to hide
the real bug.

So you are correct that the fault-in is something people need to be
very wary of. Without the atomic side of the access, it's pure voodoo
programming.

You have two choices:

 - don't hold any filesystem locks (*) over a user space access

 - do the user space access with the atomic versions and repeat (with
pre-faulting to make the repeat work)

There's one special case of that "no filesystem locks" case that I put
that (*) for: you could do a read-recursive lock if the filesystem
page fault path can only ever take read locks. But none of our regular
locks are read-recursive apart from the very special case of the
spinning rwlock in interrupts (see comment in
queued_read_lock_slowpath()).

That special read-recursive model "works", but I would seriously
caution against it, simply because such locks can get very unfair very
quickly. So it's a DoS magnet. It's part of why none of the normal
locking models really have that (any more - rwlocks used to all be
that way).

                     Linus