Date:   Tue, 23 May 2023 12:35:35 -0400
From:   Kent Overstreet <kent.overstreet@...ux.dev>
To:     Christoph Hellwig <hch@...radead.org>
Cc:     Jan Kara <jack@...e.cz>, cluster-devel@...hat.com,
        "Darrick J . Wong" <djwong@...nel.org>,
        linux-kernel@...r.kernel.org, dhowells@...hat.com,
        linux-bcachefs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        Kent Overstreet <kent.overstreet@...il.com>
Subject: Re: [Cluster-devel] [PATCH 06/32] sched: Add
 task_struct->faults_disabled_mapping

On Tue, May 23, 2023 at 09:21:56AM -0700, Christoph Hellwig wrote:
> On Tue, May 23, 2023 at 03:34:31PM +0200, Jan Kara wrote:
> > I've checked the code and AFAICT it is all indeed handled. BTW, I've now
> > remembered that GFS2 has dealt with the same deadlocks - b01b2d72da25
> > ("gfs2: Fix mmap + page fault deadlocks for direct I/O") - in a different
> > way (by prefaulting pages from the iter before grabbing the problematic
> > lock and then disabling page faults for the iomap_dio_rw() call). I guess
> > we should somehow unify these schemes so that we don't have two mechanisms
> > for avoiding exactly the same deadlock. Adding GFS2 guys to CC.
> > 
> > Also good that you've written a fstest for this, that is definitely a useful
> > addition, although I suspect GFS2 guys added a test for this not so long
> > ago when testing their stuff. Maybe they have a pointer handy?
> 
> generic/708 is the btrfs version of this.
> 
> But I think all of the file systems that have this deadlock are actually
> fundamentally broken because they have a messed up locking hierarchy
> where page faults take the same lock that is held over the direct I/O
> operation.  And the right thing is to fix this.  I have work in progress
> for btrfs, and something similar should apply to gfs2, with the added
> complication that it probably means a revision to their network
> protocol.

No, this is fundamental: userspace controls the ordering of locking,
because the buffer passed to dio can point into any address space. You
can't solve this by changing the locking hierarchy.
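
To make the ordering argument concrete, here is a toy model, not kernel
code. It assumes each file has one lock that is held across a dio call and
is also taken when a page fault hits a mapping of that file, and it treats
every lock as exclusive. The names locks_taken and has_lock_cycle are made
up for this illustration.

```python
# Toy model only: one hypothetical per-file lock, held across direct I/O
# and also taken by a page fault on a mapping of that file.

def locks_taken(dio_file, buffer_file):
    """Order in which a single dio call acquires per-file locks.

    dio_file:    file the direct I/O targets
    buffer_file: file whose mmap backs the user buffer, or None for
                 anonymous memory
    """
    order = [dio_file]              # held for the duration of the dio
    if buffer_file is not None:
        order.append(buffer_file)   # taken when the buffer page faults
    return order


def has_lock_cycle(calls):
    """True if the given dio calls, run concurrently, form a lock cycle."""
    graph = {}
    for dio_file, buffer_file in calls:
        order = locks_taken(dio_file, buffer_file)
        if len(order) == 2:
            graph.setdefault(order[0], set()).add(order[1])

    def reachable(start, target, seen):
        # Depth-first search along "held, then acquired" edges.
        for nxt in graph.get(start, ()):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reachable(nxt, target, seen):
                    return True
        return False

    return any(reachable(node, node, set()) for node in graph)


print(has_lock_cycle([("A", "B")]))              # False: one call alone
print(has_lock_cycle([("A", "B"), ("B", "A")]))  # True: crossed buffers
print(has_lock_cycle([("A", "A")]))              # True: buffer maps the target
```

In this model the second lock in each pair is picked by whoever supplies
the buffer, so any fixed ordering the filesystem chooses can be inverted
from userspace.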

If you want to be able to have locking around adding things to the
pagecache so that things that bypass the pagecache can prevent
inconsistencies (and we do, the big one is fcollapse), and if you want
dio to be able to use that same locking (because otherwise dio will also
cause page cache inconsistency), this is the way to do it.
