[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2jrcw4mtwcophanqmi2y74ffgf247m6ap44u3gedpylsjl3bz6@yueuwkmcwm66>
Date: Thu, 3 Apr 2025 15:43:12 +0200
From: Jan Kara <jack@...e.cz>
To: Luis Chamberlain <mcgrof@...nel.org>
Cc: Matthew Wilcox <willy@...radead.org>, Jan Kara <jack@...e.cz>,
brauner@...nel.org, tytso@....edu, adilger.kernel@...ger.ca,
linux-ext4@...r.kernel.org, riel@...riel.com, hannes@...xchg.org, oliver.sang@...el.com,
david@...hat.com, axboe@...nel.dk, hare@...e.de, david@...morbit.com,
djwong@...nel.org, ritesh.list@...il.com, linux-fsdevel@...r.kernel.org,
linux-block@...r.kernel.org, linux-mm@...ck.org, gost.dev@...sung.com, p.raghav@...sung.com,
da.gomez@...sung.com
Subject: Re: [PATCH 2/3] fs/buffer: avoid races with folio migrations on
__find_get_block_slow()
On Wed 02-04-25 19:04:23, Luis Chamberlain wrote:
> On Wed, Apr 02, 2025 at 02:58:28AM +0100, Matthew Wilcox wrote:
> > On Tue, Apr 01, 2025 at 02:49:51PM -0700, Davidlohr Bueso wrote:
> > > So the below could be tucked in for norefs only (because this is about the addr
> > > space i_private_lock), but this also shortens the hold time; if that matters
> > > at all, of course, vs changing the migration semantics.
> >
> > I like this approach a lot better. One wrinkle is that it doesn't seem
> > that we need to set the BH_Migrate bit on every buffer; we could define
> > that it's only set on the head BH, right?
>
> Yes, we are also only doing this for block devices, and for migration
> purposes. Even though a bit from one buffer may be desirable it makes
> no sense to allow for that in case migration is taking place. So indeed
> we have no need to add the flag for all buffers.
>
> I think the remaining question is what users of __find_get_block_slow()
> can really block, and well I've started trying to determine that with
> coccinelle [0], its gonna take some more time.
>
> Perhaps its easier to ask, why would a block device mapping want to
> allow __find_get_block_slow() to not block?
So I've audited all callers of __find_get_block_slow() (there aren't that
many) and most of them are actually fine with sleeping these days.
Analysis:
__find_get_block_slow() is only used from __find_get_block().
__find_get_block() is used from:
write_boundary_block() - locks the buffer so can sleep
bdev_getblk() - allocates buffers with 'gfp' mask. We use GFP_NOWAIT mask
from some places (generally doing readahead). For callers where
!gfpflags_allow_blocking() we should bail rather than block on
migration. Callers are currently fine with this, we should probably
document that bdev_getblk() with restrictive gfp mask may fail even if
bh is present - or perhaps make this even more explicit in the API by
providing bdev_try_getblk() and make bdev_getblk() assert gfp mask
allows sleeping.
__getblk_slow() - only called from bdev_getblk(). Probably should fold
there.
ocfs2_force_read_journal() - allows sleeping as it does IO
jbd2_journal_revoke() - can sleep (has might_sleep() in the beginning)
jbd2_journal_cancel_revoke() - only used from do_get_write_access() and
do_get_create_access() which do sleep. So can sleep.
jbd2_clear_buffer_revoked_flags() - only called from journal commit code
which sleeps. So can sleep.
The last user is sb_find_get_block() which is used from:
hpfs_prefetch_sectors() - prefers bail rather than blocking
fat_dir_readahead() - prefers bail rather than blocking
exfat_dir_readahead() - prefers bail rather than blocking
ext4_free_blocks() - can sleep
ext4_getblk() - depending on EXT4_GET_BLOCKS_CACHED_NOWAIT flag either can
sleep or must bail (and is fine with it) rather than sleeping
fs/ext4/ialloc.c:recently_deleted() - this one is the most problematic
place. It must bail rather than sleeping (called under a spinlock) but
it depends on the fact that if bh is not returned, then the data has been
written out and evicted from memory. Luckily, the usage of
recently_deleted() is mostly an optimization to reduce damage in case
of crash so rare false failure should be OK. Ted, what is your opinion?
And this is actually all. So it seems that if we give possibility to
callers to tell whether they want to bail or wait for migration, things
should work out fine.
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists