[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <045e06ce-ac19-6293-a62b-8ac937f753ac@suse.de>
Date: Fri, 17 Mar 2017 07:23:24 -0500
From: Goldwyn Rodrigues <rgoldwyn@...e.de>
To: Dave Chinner <david@...morbit.com>,
Goldwyn Rodrigues <rgoldwyn@...e.com>
Cc: linux-fsdevel@...r.kernel.org, jack@...e.com, hch@...radead.org,
linux-block@...r.kernel.org, linux-btrfs@...r.kernel.org,
linux-ext4@...r.kernel.org, linux-xfs@...r.kernel.org,
sagi@...mberg.me, avi@...lladb.com, axboe@...nel.dk,
linux-api@...r.kernel.org, willy@...radead.org
Subject: Re: [PATCH 5/8] nowait aio: return on congested block device
On 03/16/2017 04:31 PM, Dave Chinner wrote:
> On Wed, Mar 15, 2017 at 04:51:04PM -0500, Goldwyn Rodrigues wrote:
>> From: Goldwyn Rodrigues <rgoldwyn@...e.com>
>>
>> A new flag BIO_NOWAIT is introduced to identify bio's
>> orignating from iocb with IOCB_NOWAIT. This flag indicates
>> to return immediately if a request cannot be made instead
>> of retrying.
>
> So this makes a congested block device run the bio IO completion
> callback with an -EAGAIN error present? Are all the filesystem
> direct IO submission and completion routines OK with that? i.e. does
> such a congestion case cause filesystems to temporarily expose stale
> data to unprivileged users when the IO is requeued in this way?
>
> e.g. filesystem does allocation without blocking, submits bio,
> device is congested, runs IO completion with error, so nothing
> written to allocated blocks, write gets queued, so other read
> comes in while the write is queued, reads data from uninitialised
> blocks that were allocated during the write....
>
> Seems kinda problematic to me to have a undocumented design
> constraint (i.e a landmine) where we submit the AIO only to have it
> error out and then expect the filesystem to do something special and
> different /without blocking/ on EAGAIN.
If the filesystems has to perform block allocation, we would return
-EAGAIN early enough. However, I agree there is a problem, since not all
filesystems know this. I worked on only three of them.
>
> Why isn't the congestion check at a higher layer like we do for page
> cache readahead? i.e. using the bdi*congested() API at the time we
> are doing all the other filesystem blocking checks.
>
Yes, that may work better. We will have to call bdi_read_congested() on
a write path. (will have to comment that part of the code). Would it
encompass all possible waits in the block layer?
--
Goldwyn
Powered by blists - more mailing lists