lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 25 May 2023 10:47:31 +0200
From:   Jan Kara <jack@...e.cz>
To:     Kent Overstreet <kent.overstreet@...ux.dev>
Cc:     Jan Kara <jack@...e.cz>, linux-kernel@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-bcachefs@...r.kernel.org,
        Kent Overstreet <kent.overstreet@...il.com>,
        "Darrick J . Wong" <djwong@...nel.org>, dhowells@...hat.com,
        Andreas Gruenbacher <agruenba@...hat.com>,
        cluster-devel@...hat.com, Bob Peterson <rpeterso@...hat.com>
Subject: Re: [PATCH 06/32] sched: Add task_struct->faults_disabled_mapping

On Tue 23-05-23 12:49:06, Kent Overstreet wrote:
> > > No, that's definitely handled (and you can see it in the code I linked),
> > > and I wrote a torture test for fstests as well.
> > 
> > I've checked the code and AFAICT it is all indeed handled. BTW, I've now
> > remembered that GFS2 has dealt with the same deadlocks - b01b2d72da25
> > ("gfs2: Fix mmap + page fault deadlocks for direct I/O") - in a different
> > way (by prefaulting pages from the iter before grabbing the problematic
> > lock and then disabling page faults for the iomap_dio_rw() call). I guess
> > we should somehow unify these schemes so that we don't have two mechanisms
> > for avoiding exactly the same deadlock. Adding GFS2 guys to CC.
> 
> Oof, that sounds a bit sketchy. What happens if the dio call passes in
> an address from the same address space?

If we submit direct IO that uses mapped file F at offset O as a buffer for
direct IO from file F, offset O, it will currently livelock in an
indefinite retry loop. It should rather return error or fall back to
buffered IO. But that should be fixable. Andreas?

But if the buffer and direct IO range does not overlap, it will just
happily work - iomap_dio_rw() invalidates only the range direct IO is done
to.

> What happens if we race with the pages we faulted in being evicted?

We fault them in again and retry.

> > Also good that you've written a fstest for this, that is definitely a useful
> > addition, although I suspect GFS2 guys added a test for this not so long
> > ago when testing their stuff. Maybe they have a pointer handy?
> 
> More tests more good.
> 
> So if we want to lift this scheme to the VFS layer, we'd start by
> replacing the lock you added (grepping for it, the name escapes me) with
> a different type of lock - two_state_shared_lock in my code, it's like a
> rw lock except writers don't exclude other writers. That way the DIO
> path can use it without singlethreading writes to a single file.

Yes, I've noticed that you are introducing in bcachefs a lock with very
similar semantics to mapping->invalidate_lock, just with this special lock
type. What I'm kind of worried about with two_state_shared_lock as
implemented in bcachefs is the fairness. AFAICS so far if someone is e.g.
heavily faulting pages on a file, direct IO to that file can be starved
indefinitely. That is IMHO not a good thing and I would not like to use
this type of lock in VFS until this problem is resolved. But it should be
fixable e.g. by introducing some kind of deadline for a waiter after which
it will block acquisitions of the other lock state.

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ