linux-kernel - [RFC 0/9] gfs2: handle page faults during read and write

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-Id: <20210531170123.243771-1-agruenba@redhat.com>
Date:   Mon, 31 May 2021 19:01:14 +0200
From:   Andreas Gruenbacher <agruenba@...hat.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Andreas Gruenbacher <agruenba@...hat.com>,
        cluster-devel@...hat.com, linux-kernel@...r.kernel.org,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Jan Kara <jack@...e.cz>, Matthew Wilcox <willy@...radead.org>
Subject: [RFC 0/9] gfs2: handle page faults during read and write

Hi Linus,

here's a set of fixes for how gfs2 handles page faults during read and
write syscalls.  The patch queue is ready for merging except for two
open issues.  May I ask you to shed some light or ask the right person
to help us out?

Right now, the filesystem can end up trying to take a lock it's already
holding, which causes it to BUG.  We can recognize and mostly deal with
that simple case, but more complex scenarios exist which involve
multiple locks and / or incompatible locking modes.  To handle those, we
switch to trylocks where necessary.

The patches appear to be working as itended, with the following
remaining questions:

(1) Jan Kara has pointed out [*] that returning VM_FAULT_SIGBUS from the
.fault / .page_mkwrite ops will raise SIGBUS, which would be visible to
user space.  This hasn't been observed in our testing; instead, accesses
to the mmapped memory via call chains like:

  iov_iter_fault_in_readable -> fault_in_pages_readable -> __get_user

would simply fail with -EFAULT, as we're expecting.

>From looking at do_user_addr_fault and do_sigbus in arch/x86/mm/fault.c,
my impression is that VM_FAULT_SIGBUS will not cause SIGBUS to be raised
in kernel mode, and that we can rely on the -EFAULT behavior for
triggering the retries at the filesystem level.

(2) The patch queue introduces the same kind of trylock behavior for
both .fault (gfs2_fault) and .page_mkwrite (gfs2_page_mkwrite).  I'm not
aware of any situation in which we can actually end up in .page_mkwrite
during a .read_iter or .write_iter operation, so the trylock behavior in
.page_mkwrite might be harmless but unnecessary.

Thank you very much,
Andreas

[*] https://listman.redhat.com/archives/cluster-devel/2021-May/msg00080.html

Previous posting of this patch queue:

https://listman.redhat.com/archives/cluster-devel/2021-May/msg00073.html

New xfstest for mmap + page faults during read / write:

https://lore.kernel.org/fstests/20210531152604.240462-1-agruenba@redhat.com/

Andreas Gruenbacher (9):
  gfs2: Clean up the error handling in gfs2_page_mkwrite
  gfs2: Add wrapper for iomap_file_buffered_write
  gfs2: Add gfs2_holder_is_compatible helper
  gfs2: Fix mmap + page fault deadlocks (part 1)
  iov_iter: Add iov_iter_fault_in_writeable()
  gfs2: Add wrappers for accessing journal_info
  gfs2: Encode glock holding and retry flags in journal_info
  gfs2: Add LM_FLAG_OUTER glock holder flag
  gfs2: Fix mmap + page fault deadlocks (part 2)

 fs/gfs2/aops.c      |   6 +-
 fs/gfs2/bmap.c      |  31 ++++----
 fs/gfs2/file.c      | 175 +++++++++++++++++++++++++++++++++-----------
 fs/gfs2/glock.c     |  12 +++
 fs/gfs2/glock.h     |  27 ++++++-
 fs/gfs2/incore.h    |  41 +++++++++++
 fs/gfs2/inode.c     |   2 +-
 fs/gfs2/log.c       |   4 +-
 fs/gfs2/lops.c      |   2 +-
 fs/gfs2/meta_io.c   |   6 +-
 fs/gfs2/super.c     |   2 +-
 fs/gfs2/trans.c     |  16 ++--
 include/linux/uio.h |   1 +
 lib/iov_iter.c      |  20 ++++-
 14 files changed, 265 insertions(+), 80 deletions(-)

-- 
2.26.3