lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2jrcw4mtwcophanqmi2y74ffgf247m6ap44u3gedpylsjl3bz6@yueuwkmcwm66>
Date: Thu, 3 Apr 2025 15:43:12 +0200
From: Jan Kara <jack@...e.cz>
To: Luis Chamberlain <mcgrof@...nel.org>
Cc: Matthew Wilcox <willy@...radead.org>, Jan Kara <jack@...e.cz>, 
	brauner@...nel.org, tytso@....edu, adilger.kernel@...ger.ca, 
	linux-ext4@...r.kernel.org, riel@...riel.com, hannes@...xchg.org, oliver.sang@...el.com, 
	david@...hat.com, axboe@...nel.dk, hare@...e.de, david@...morbit.com, 
	djwong@...nel.org, ritesh.list@...il.com, linux-fsdevel@...r.kernel.org, 
	linux-block@...r.kernel.org, linux-mm@...ck.org, gost.dev@...sung.com, p.raghav@...sung.com, 
	da.gomez@...sung.com
Subject: Re: [PATCH 2/3] fs/buffer: avoid races with folio migrations on
 __find_get_block_slow()

On Wed 02-04-25 19:04:23, Luis Chamberlain wrote:
> On Wed, Apr 02, 2025 at 02:58:28AM +0100, Matthew Wilcox wrote:
> > On Tue, Apr 01, 2025 at 02:49:51PM -0700, Davidlohr Bueso wrote:
> > > So the below could be tucked in for norefs only (because this is about the addr
> > > space i_private_lock), but this also shortens the hold time; if that matters
> > > at all, of course, vs changing the migration semantics.
> > 
> > I like this approach a lot better.  One wrinkle is that it doesn't seem
> > that we need to set the BH_Migrate bit on every buffer; we could define
> > that it's only set on the head BH, right?
> 
> Yes, we are also only doing this for block devices, and for migration
> purposes. Even though a bit from one buffer may be desirable it makes
> no sense to allow for that in case migration is taking place.  So indeed
> we have no need to add the flag for all buffers.
> 
> I think the remaining question is what users of __find_get_block_slow()
> can really block, and well I've started trying to determine that with
> coccinelle [0], its gonna take some more time.
> 
> Perhaps its easier to ask, why would a block device mapping want to
> allow __find_get_block_slow() to not block?

So I've audited all callers of __find_get_block_slow() (there aren't that
many) and most of them are actually fine with sleeping these days.

Analysis:
__find_get_block_slow() is only used from __find_get_block().

__find_get_block() is used from:
  write_boundary_block() - locks the buffer so can sleep
  bdev_getblk() - allocates buffers with 'gfp' mask. We use GFP_NOWAIT mask
     from some places (generally doing readahead). For callers where
     !gfpflags_allow_blocking() we should bail rather than block on
     migration. Callers are currently fine with this, we should probably
     document that bdev_getblk() with restrictive gfp mask may fail even if
     bh is present - or perhaps make this even more explicit in the API by
     providing bdev_try_getblk() and make bdev_getblk() assert gfp mask
     allows sleeping.
  __getblk_slow() - only called from bdev_getblk(). Probably should fold
     there.
  ocfs2_force_read_journal() - allows sleeping as it does IO
  jbd2_journal_revoke() - can sleep (has might_sleep() in the beginning)
  jbd2_journal_cancel_revoke() - only used from do_get_write_access() and
    do_get_create_access() which do sleep. So can sleep.
  jbd2_clear_buffer_revoked_flags() - only called from journal commit code
    which sleeps. So can sleep.

The last user is sb_find_get_block() which is used from:
  hpfs_prefetch_sectors() - prefers bail rather than blocking
  fat_dir_readahead() - prefers bail rather than blocking
  exfat_dir_readahead() - prefers bail rather than blocking
  ext4_free_blocks() - can sleep
  ext4_getblk() - depending on EXT4_GET_BLOCKS_CACHED_NOWAIT flag either can
    sleep or must bail (and is fine with it) rather than sleeping
  fs/ext4/ialloc.c:recently_deleted() - this one is the most problematic
    place. It must bail rather than sleeping (called under a spinlock) but
    it depends on the fact that if bh is not returned, then the data has been
    written out and evicted from memory. Luckily, the usage of
    recently_deleted() is mostly an optimization to reduce damage in case
    of crash so rare false failure should be OK. Ted, what is your opinion?

And this is actually all. So it seems that if we give possibility to
callers to tell whether they want to bail or wait for migration, things
should work out fine.

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ