linux-ext4 - Re: [PATCH 5/8] nowait aio: return on congested block device

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <045e06ce-ac19-6293-a62b-8ac937f753ac@suse.de>
Date:   Fri, 17 Mar 2017 07:23:24 -0500
From:   Goldwyn Rodrigues <rgoldwyn@...e.de>
To:     Dave Chinner <david@...morbit.com>,
        Goldwyn Rodrigues <rgoldwyn@...e.com>
Cc:     linux-fsdevel@...r.kernel.org, jack@...e.com, hch@...radead.org,
        linux-block@...r.kernel.org, linux-btrfs@...r.kernel.org,
        linux-ext4@...r.kernel.org, linux-xfs@...r.kernel.org,
        sagi@...mberg.me, avi@...lladb.com, axboe@...nel.dk,
        linux-api@...r.kernel.org, willy@...radead.org
Subject: Re: [PATCH 5/8] nowait aio: return on congested block device



On 03/16/2017 04:31 PM, Dave Chinner wrote:
> On Wed, Mar 15, 2017 at 04:51:04PM -0500, Goldwyn Rodrigues wrote:
>> From: Goldwyn Rodrigues <rgoldwyn@...e.com>
>>
>> A new flag BIO_NOWAIT is introduced to identify bio's
>> orignating from iocb with IOCB_NOWAIT. This flag indicates
>> to return immediately if a request cannot be made instead
>> of retrying.
> 
> So this makes a congested block device run the bio IO completion
> callback with an -EAGAIN error present? Are all the filesystem
> direct IO submission and completion routines OK with that? i.e. does
> such a congestion case cause filesystems to temporarily expose stale
> data to unprivileged users when the IO is requeued in this way?
> 
> e.g. filesystem does allocation without blocking, submits bio,
> device is congested, runs IO completion with error, so nothing
> written to allocated blocks, write gets queued, so other read
> comes in while the write is queued, reads data from uninitialised
> blocks that were allocated during the write....
> 
> Seems kinda problematic to me to have a undocumented design
> constraint (i.e a landmine) where we submit the AIO only to have it
> error out and then expect the filesystem to do something special and
> different /without blocking/ on EAGAIN.


If the filesystems has to perform block allocation, we would return
-EAGAIN early enough. However, I agree there is a problem, since not all
filesystems know this. I worked on only three of them.

> 
> Why isn't the congestion check at a higher layer like we do for page
> cache readahead? i.e. using the bdi*congested() API at the time we
> are doing all the other filesystem blocking checks.
> 

Yes, that may work better. We will have to call bdi_read_congested() on
a write path. (will have to comment that part of the code). Would it
encompass all possible waits in the block layer?


-- 
Goldwyn