lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHc6FU5XMfTOBW1a0OnMaPSmgWECvO-KYAo8HeVNsCyd-HyV3g@mail.gmail.com>
Date:   Fri, 2 Jul 2021 02:20:20 +0200
From:   Andreas Gruenbacher <agruenba@...hat.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Alexander Viro <viro@...iv.linux.org.uk>,
        cluster-devel <cluster-devel@...hat.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Jan Kara <jack@...e.cz>, Matthew Wilcox <willy@...radead.org>
Subject: Re: [PATCH] gfs2: Fix mmap + page fault deadlocks

On Thu, Jul 1, 2021 at 11:41 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
> On Thu, Jul 1, 2021 at 1:43 PM Andreas Gruenbacher <agruenba@...hat.com> wrote:
> > here's another attempt at fixing the mmap + page fault deadlocks we're
> > seeing on gfs2.  Still not ideal because get_user_pages_fast ignores the
> > current->pagefault_disabled flag
>
> Of course get_user_pages_fast() ignores the pagefault_disabled flag,
> because it doesn't do any page faults.
>
> If you don't want to fall back to the "maybe do IO" case, you should
> use the FOLL_FAST_ONLY flag - or get_user_pages_fast_only(), which
> does that itself.
>
> > For getting get_user_pages_fast changed to fix this properly, I'd need
> > help from the memory management folks.
>
> I really don't think you need anything at all from the mm people,
> because we already support that whole "fast only" case.

Yes, fair enough.

> Also, I have to say that I think the direct-IO code is fundamentally
> mis-designed. Why it is doing the page lookup _during_ the IO is a
> complete mystery to me. Why wasn't that done ahead of time before the
> filesystem took the locks it needed?

That would be inconvenient for reads, when the number of bytes read is
much smaller than the buffer size and we won't need to page in the
entire buffer.

> So what the direct-IO code _should_ do is to turn an ITER_IOVEC into a
> ITER_KVEC by doing the page lookup ahead of time, and none of these
> issues should even exist, and then the whole pagefault_disabled and/or
> FOLL_FAST_ONLY would be a complete non-issue.
>
> Is there any reason why that isn't what it does (other than historical baggage)?

It turns out that there's an even deeper issue with keeping references
to user-space pages. Those references will essentially pin the glock
of the associated inode to the node. Moving a glock off a node
requires truncating the inode's page cache, but the page references
would prevent that. So we'd only end up with different kinds of
potential deadlocks.

If we could get iomap_dio_rw to use "fast only" mode when requested,
we could fault in the pages without keeping references, try the IO,
and repeat when necessary.

Thanks a lot,
Adreas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ