Message-ID: <02209ec3-62b4-595f-b84e-2cd8838ac41b@toxicpanda.com>
Date: Fri, 20 Mar 2020 10:23:43 -0400
From: Josef Bacik <josef@...icpanda.com>
To: Christoph Hellwig <hch@...radead.org>,
Goldwyn Rodrigues <rgoldwyn@...e.de>
Cc: linux-fsdevel@...r.kernel.org, riteshh@...ux.ibm.com,
linux-ext4@...r.kernel.org, darrick.wong@...cle.com,
willy@...radead.org, linux-btrfs@...r.kernel.org
Subject: Re: [PATCH v2] iomap: return partial I/O count on error in iomap_dio_bio_actor
On 3/20/20 10:05 AM, Christoph Hellwig wrote:
> I spent a fair amount of time looking over this change, and I am
> starting to feel very bad about it. iomap_apply() has pretty clear
> semantics of either return an error, or return the bytes processed,
> and in general these semantics work just fine.
>
> The thing that breaks this concept is the btrfs submit_bio hook,
> which allows the file system to keep state for each bio actually
> submitted. But I think you can simply keep the length internally
> in btrfs - use the space in iomap->private as a counter of how
> much was allocated, pass the iomap to the submit_io hook, and
> update it there, and then deal with the rest in ->iomap_end.
>
> That assumes ->iomap_end actually is the right place - can someone
> explain what the expected call site for __endio_write_update_ordered
> is? It kinda sorta looks to me like something that would want to
> be called after I/O completion, not after I/O submission, but maybe
> I misunderstand the code.
>
I'm not sure what you're looking at specifically wrt error handling, but I can
explain __endio_write_update_ordered.

Btrfs uses ordered extents to keep track of an extent that currently has IO in
flight. That IO generally takes multiple bios, so we track the outstanding size
of the IO, and each bio subtracts its size from the pending size when it
completes. If any one of those bios has an error, we need to make sure we
discard the whole ordered extent, as part of it won't be valid.
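
To make the accounting concrete, here is a rough userspace sketch in C. It is
not the btrfs code: struct ordered_extent_model, oe_init(), oe_bio_done() and
oe_is_valid() are names made up for illustration, and the real ordered extent
carries far more state and locking. It only shows the idea of a pending byte
count that each bio completion reduces, with a sticky error flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an ordered extent; not the btrfs structure. */
struct ordered_extent_model {
	size_t bytes_left;	/* bytes still covered by in-flight bios */
	bool io_error;		/* set if any bio failed */
	bool finished;		/* set once nothing is outstanding */
};

static void oe_init(struct ordered_extent_model *oe, size_t len)
{
	oe->bytes_left = len;
	oe->io_error = false;
	oe->finished = (len == 0);
}

/*
 * Called once per bio completion. Removes the bio's size from the
 * pending size and records any error. Returns true if this call
 * finished the extent.
 */
static bool oe_bio_done(struct ordered_extent_model *oe, size_t len,
			bool failed)
{
	assert(len <= oe->bytes_left);
	if (failed)
		oe->io_error = true;
	oe->bytes_left -= len;
	if (oe->bytes_left == 0 && !oe->finished) {
		oe->finished = true;
		return true;
	}
	return false;
}

/* One failed bio invalidates the whole extent, not just its own range. */
static bool oe_is_valid(const struct ordered_extent_model *oe)
{
	return oe->finished && !oe->io_error;
}
```

In this model the extent still has to reach zero pending bytes after a failure,
otherwise it would never be marked finished.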

From just a cursory look at the current code, I assume what's confusing you is
that we call this when we hit an error in the O_DIRECT code. That is just so we
get the proper cleanup for the ordered extent. People will wait on the ordered
extent to be completed, so if we've started an ordered extent and aren't able to
complete the range, we need to call __endio_write_update_ordered() so that the
ordered extent is finished and we wake up any waiters.
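
As a rough illustration of why the error path has to do that accounting, here
is a small single-threaded C sketch. Everything in it is hypothetical
(struct oe_cleanup_model, oem_account(), oem_submit_range(), the fail_at
parameter), and the waiters are just counters rather than real sleeping tasks.
The call on the failure branch stands in for the role
__endio_write_update_ordered() plays as described above; it is not a copy of
that function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; waiters are counted, not actually blocked. */
struct oe_cleanup_model {
	size_t bytes_left;	/* bytes not yet accounted for */
	bool io_error;		/* set if any part of the range failed */
	int waiters;		/* tasks waiting for the extent to finish */
	int woken;		/* tasks woken so far */
};

/* Account for len bytes; wake everyone once nothing is outstanding. */
static void oem_account(struct oe_cleanup_model *oe, size_t len, bool failed)
{
	assert(len <= oe->bytes_left);
	if (failed)
		oe->io_error = true;
	oe->bytes_left -= len;
	if (oe->bytes_left == 0) {
		oe->woken += oe->waiters;
		oe->waiters = 0;
	}
}

/*
 * Submit `total` bytes in `chunk`-sized pieces. fail_at is the index of
 * the chunk whose submission fails (-1 for none). Returns the number of
 * bytes actually submitted; those still complete later via oem_account().
 */
static size_t oem_submit_range(struct oe_cleanup_model *oe, size_t total,
			       size_t chunk, int fail_at)
{
	size_t submitted = 0;

	for (int i = 0; submitted < total; i++) {
		size_t len = total - submitted < chunk ?
			     total - submitted : chunk;

		if (i == fail_at) {
			/*
			 * No bio will ever complete for the remainder, so
			 * account for it here or waiters would hang forever.
			 */
			oem_account(oe, total - submitted, true);
			return submitted;
		}
		submitted += len;
	}
	return submitted;
}
```

Without the call on the failure branch, bytes_left in this model could never
reach zero, so the waiters would never be woken.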

Does this help? If I need to, I can context switch into whatever you're looking
at, but I'm going to avoid looking and hope I can just shout useful information
in your direction ;). Thanks,

Josef