linux-kernel - Re: [PATCH] vfs: keep inodes with page cache off the inode shrinker LRU


Message-ID: <20200212164235.GB180867@cmpxchg.org>
Date:   Wed, 12 Feb 2020 11:42:35 -0500
From:   Johannes Weiner <hannes@...xchg.org>
To:     Yafang Shao <laoar.shao@...il.com>
Cc:     linux-fsdevel@...r.kernel.org, Linux MM <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Dave Chinner <david@...morbit.com>,
        Michal Hocko <mhocko@...e.com>, Roman Gushchin <guro@...com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Kernel Team <kernel-team@...com>
Subject: Re: [PATCH] vfs: keep inodes with page cache off the inode shrinker
 LRU

On Wed, Feb 12, 2020 at 08:25:45PM +0800, Yafang Shao wrote:
> On Wed, Feb 12, 2020 at 1:55 AM Johannes Weiner <hannes@...xchg.org> wrote:
> > Another variant of this problem was recently observed, where the
> > kernel violates cgroups' memory.low protection settings and reclaims
> > page cache way beyond the configured thresholds. It was followed by a
> > proposal of a modified form of the reverted commit above, that
> > implements memory.low-sensitive shrinker skipping over populated
> > inodes on the LRU [1]. However, this proposal continues to run the
> > risk of attracting disproportionate reclaim pressure to a pool of
> > still-used inodes,
> 
> Hi Johannes,
> 
> If you really think that is a risk, what about the additional patch
> below to address it?
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 80dddbc..61862d9 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -760,7 +760,7 @@ static bool memcg_can_reclaim_inode(struct inode *inode,
>                 goto out;
> 
>         cgroup_size = mem_cgroup_size(memcg);
> -       if (inode->i_data.nrpages + protection >= cgroup_size)
> +       if (inode->i_data.nrpages)
>                 reclaimable = false;
> 
>  out:
> 
> With this additional patch, we skip all inodes in this memcg until all
> its page cache pages are reclaimed.

Well that's something we've tried and had to revert because it caused
issues in slab reclaim. See the History part of my changelog.
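
To make the difference between the two conditions in the quoted hunk
concrete, here is a minimal userspace sketch. It models only the single
comparison the diff changes, not the rest of memcg_can_reclaim_inode().
The struct and function names (model_inode, skip_original,
skip_proposed) are hypothetical, and all quantities are assumed to be
in the same unit (pages).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the one inode field the hunk reads.
 * This is not the kernel's struct inode. */
struct model_inode {
	unsigned long nrpages;	/* page cache pages attached to the inode */
};

/*
 * Condition removed by the quoted hunk: the inode is skipped only when
 * its page cache plus the protected amount reaches the cgroup's size.
 * Assumes protection and cgroup_size use the same unit as nrpages.
 */
static bool skip_original(const struct model_inode *inode,
			  unsigned long protection,
			  unsigned long cgroup_size)
{
	return inode->nrpages + protection >= cgroup_size;
}

/*
 * Condition added by the quoted hunk: the inode is skipped whenever it
 * still has any page cache at all.
 */
static bool skip_proposed(const struct model_inode *inode)
{
	return inode->nrpages != 0;
}
```

Under this model, an inode with 10 cached pages in a cgroup of size 100
with 50 pages of protection is reclaimable under the original check but
skipped under the proposed one. Only an inode with no page cache left
passes the proposed check, which is the "skip populated inodes"
behavior referred to above.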

> > while not addressing the more generic reclaim
> > inversion problem outside of a very specific cgroup application.
> >
> 
> But I have a different understanding.  This method works like a
> knob. If you really care about your workingset (data), you should
> turn it on (i.e. by using memcg protection to protect them), while
> if you don't care about your workingset (data) then you'd better
> turn it off. That would be more flexible.  Regarding your case in the
> commit log, why not protect your linux git tree with memcg
> protection?

I can't imagine a scenario where I *wouldn't* care about my
workingset, though. Why should it be opt-in, not the default?