linux-ext4 - Re: [PATCH 1/9] Remove inode

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-id: <164340576289.5493.5784848964540459557@noble.neil.brown.name>
Date:   Sat, 29 Jan 2022 08:36:02 +1100
From:   "NeilBrown" <neilb@...e.de>
To:     "Miklos Szeredi" <miklos@...redi.hu>
Cc:     "Andrew Morton" <akpm@...ux-foundation.org>,
        "Jaegeuk Kim" <jaegeuk@...nel.org>, "Chao Yu" <chao@...nel.org>,
        "Jeff Layton" <jlayton@...nel.org>,
        "Ilya Dryomov" <idryomov@...il.com>,
        "Trond Myklebust" <trond.myklebust@...merspace.com>,
        "Anna Schumaker" <anna.schumaker@...app.com>,
        "Ryusuke Konishi" <konishi.ryusuke@...il.com>,
        "Darrick J. Wong" <djwong@...nel.org>,
        "Philipp Reisner" <philipp.reisner@...bit.com>,
        "Lars Ellenberg" <lars.ellenberg@...bit.com>,
        "Paolo Valente" <paolo.valente@...aro.org>,
        "Jens Axboe" <axboe@...nel.dk>, "linux-mm" <linux-mm@...ck.org>,
        linux-nilfs@...r.kernel.org,
        "Linux NFS list" <linux-nfs@...r.kernel.org>,
        linux-fsdevel@...r.kernel.org,
        linux-f2fs-devel@...ts.sourceforge.net,
        "Ext4" <linux-ext4@...r.kernel.org>, ceph-devel@...r.kernel.org,
        drbd-dev@...ts.linbit.com, linux-kernel@...r.kernel.org,
        linux-block@...r.kernel.org
Subject: Re: [PATCH 1/9] Remove inode_congested()

On Fri, 28 Jan 2022, Miklos Szeredi wrote:
> On Thu, 27 Jan 2022 at 03:47, NeilBrown <neilb@...e.de> wrote:
> >
> > inode_congested() reports if the backing-device for the inode is
> > congested.  Few bdi report congestion any more, only ceph, fuse, and
> > nfs.  Having support just for those is unlikely to be useful.
> >
> > The places which test inode_congested() or it variants like
> > inode_write_congested(), avoid initiating IO if congestion is present.
> > We now have to rely on other places in the stack to back off, or abort
> > requests - we already do for everything except these 3 filesystems.
> >
> > So remove inode_congested() and related functions, and remove the call
> > sites, assuming that inode_congested() always returns 'false'.
> 
> Looks to me this is going to "break" fuse; e.g. readahead path will go
> ahead and try to submit more requests, even if the queue is getting
> congested.   In this case the readahead submission will eventually
> block, which is counterproductive.
> 
> I think we should *first* make sure all call sites are substituted
> with appropriate mechanisms in the affected filesystems and as a last
> step remove the superfluous bdi congestion mechanism.
> 
> You are saying that all fs except these three already have such
> mechanisms in place, right?  Can you elaborate on that?

Not much.  I haven't looked into how other filesystems cope, I just know
that they must because no other filesystem ever has a congested bdi
(with one or two minor exceptions, like filesystems over drbd).

Surely read-ahead should never block.  If it hits congestion, the
read-ahead request should simply fail.  block-based filesystems seem to
set REQ_RAHEAD which might get mapped to REQ_FAILFAST_MASK, though I
don't know how that is ultimately used.

Maybe fuse and others should continue to track 'congestion' and reject
read-ahead requests when congested.
Maybe also skip WB_SYNC_NONE writes..

Or maybe this doesn't really matter in practice...  I wonder if we can
measure the usefulness of congestion.

Thanks,
NeilBrown