linux-kernel - Re: [PATCH v6 0/7] fs/dcache: Track & limit # of negative dentries

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20180716164032.94e13f765c5f33c6022eca38@linux-foundation.org>
Date:   Mon, 16 Jul 2018 16:40:32 -0700
From:   Andrew Morton <akpm@...ux-foundation.org>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Michal Hocko <mhocko@...nel.org>,
        Dave Chinner <david@...morbit.com>,
        James Bottomley <James.Bottomley@...senPartnership.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Waiman Long <longman@...hat.com>,
        Al Viro <viro@...iv.linux.org.uk>,
        Jonathan Corbet <corbet@....net>,
        "Luis R. Rodriguez" <mcgrof@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
        Jan Kara <jack@...e.cz>,
        Paul McKenney <paulmck@...ux.vnet.ibm.com>,
        Ingo Molnar <mingo@...nel.org>,
        Miklos Szeredi <mszeredi@...hat.com>,
        Larry Woodman <lwoodman@...hat.com>,
        "Wangkai (Kevin,C)" <wangkai86@...wei.com>
Subject: Re: [PATCH v6 0/7] fs/dcache: Track & limit # of negative dentries

On Mon, 16 Jul 2018 05:41:15 -0700 Matthew Wilcox <willy@...radead.org> wrote:

> On Mon, Jul 16, 2018 at 11:09:01AM +0200, Michal Hocko wrote:
> > On Fri 13-07-18 10:36:14, Dave Chinner wrote:
> > [...]
> > > By limiting the number of negative dentries in this case, internal
> > > slab fragmentation is reduced such that reclaim cost never gets out
> > > of control. While it appears to "fix" the symptoms, it doesn't
> > > address the underlying problem. It is a partial solution at best but
> > > at worst it's another opaque knob that nobody knows how or when to
> > > tune.
> > 
> > Would it help to put all the negative dentries into its own slab cache?
> 
> Maybe the dcache should be more sensitive to its own needs.  In __d_alloc,
> it could check whether there are a high proportion of negative dentries
> and start recycling some existing negative dentries.

Well, yes.

The proposed patchset adds all this background reclaiming.  Problem is
a) that background reclaiming sometimes can't keep up so a synchronous
direct-reclaim was added on top and b) reclaiming dentries in the
background will cause non-dentry-allocating tasks to suffer because of
activity from the dentry-allocating tasks, which is inappropriate.

I expect a better design is something like

__d_alloc()
{
	...
	while (too many dentries)
		call the dcache shrinker
	...
}

and that's it.  This way we have a hard upper limit and only the tasks
which are creating dentries suffer the cost.

Regarding the slab page fragmentation issue: I'm wondering if the whole
idea of balancing the slab scan rates against the page scan rates isn't
really working out.  Maybe shrink_slab() should be sitting there
hammering the caches until they have freed up a particular number of
pages.  Quite a big change, conceptually and implementationally.

Aside: about a billion years ago we were having issues with processes
getting stuck in direct reclaim because other processes were coming in
and stealing away the pages which the direct-reclaimer had just freed. 
One possible solution to that was to make direct-reclaiming tasks
release the freed pages into a list on the task_struct.  So those pages
were invisible to other allocating tasks and were available to the
direct-reclaimer when it returned from the reclaim effort.  I forget
what happened to this.

It's quite a small code change and would provide a mechanism for
implementing the hammer-cache-until-youve-freed-enough design above.

Aside 2: if we *do* do something like the above __d_alloc() pseudo code
then perhaps it could be cast in terms of pages, not dentries.  ie,

__d_alloc()
{
	...
	while (too many pages in dentry_cache)
		call the dcache shrinker
	...
}

and, apart from the external name thing (grr), that should address
these fragmentation issues, no?  I assume it's easy to ask slab how
many pages are presently in use for a particular cache.