linux-kernel - Re: [PATCH v8 03/10] mm/lru: replace pgdat lru

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200416152830.GA195132@cmpxchg.org>
Date:   Thu, 16 Apr 2020 11:28:30 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     Alex Shi <alex.shi@...ux.alibaba.com>
Cc:     akpm@...ux-foundation.org, cgroups@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        mgorman@...hsingularity.net, tj@...nel.org, hughd@...gle.com,
        khlebnikov@...dex-team.ru, daniel.m.jordan@...cle.com,
        yang.shi@...ux.alibaba.com, willy@...radead.org,
        shakeelb@...gle.com, Michal Hocko <mhocko@...nel.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Roman Gushchin <guro@...com>,
        Chris Down <chris@...isdown.name>,
        Thomas Gleixner <tglx@...utronix.de>,
        Vlastimil Babka <vbabka@...e.cz>, Qian Cai <cai@....pw>,
        Andrey Ryabinin <aryabinin@...tuozzo.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Jérôme Glisse <jglisse@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        David Rientjes <rientjes@...gle.com>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
        swkhack <swkhack@...il.com>,
        "Potyra, Stefan" <Stefan.Potyra@...ktrobit.com>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        Stephen Rothwell <sfr@...b.auug.org.au>,
        Colin Ian King <colin.king@...onical.com>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Mauro Carvalho Chehab <mchehab+samsung@...nel.org>,
        Peng Fan <peng.fan@....com>,
        Nikolay Borisov <nborisov@...e.com>,
        Ira Weiny <ira.weiny@...el.com>,
        Kirill Tkhai <ktkhai@...tuozzo.com>,
        Yafang Shao <laoar.shao@...il.com>,
        Wei Yang <richard.weiyang@...il.com>
Subject: Re: [PATCH v8 03/10] mm/lru: replace pgdat lru_lock with lruvec lock

Hi Alex,

On Thu, Apr 16, 2020 at 04:01:20PM +0800, Alex Shi wrote:
> 
> 
> 在 2020/4/15 下午9:42, Alex Shi 写道:
> > Hi Johannes,
> > 
> > Thanks a lot for point out!
> > 
> > Charging in __read_swap_cache_async would ask for 3 layers function arguments
> > pass, that would be a bit ugly. Compare to this, could we move out the
> > lru_cache add after commit_charge, like ksm copied pages?
> > 
> > That give a bit extra non lru list time, but the page just only be used only
> > after add_anon_rmap setting. Could it cause troubles?
> 
> Hi Johannes & Andrew,
> 
> Doing lru_cache_add_anon during swapin_readahead can give a very short timing 
> for possible page reclaiming for these few pages.
> 
> If we delay these few pages lru adding till after the vm_fault target page 
> get memcg charging(mem_cgroup_commit_charge) and activate, we could skip the 
> mem_cgroup_try_charge/commit_charge/cancel_charge process in __read_swap_cache_async().
> But the cost is maximum SWAP_RA_ORDER_CEILING number pages on each cpu miss
> page reclaiming in a short time. On the other hand, save the target vm_fault
> page from reclaiming before activate it during that time.

The readahead pages surrounding the faulting page might never get
accessed and pile up to large amounts. Users can also trigger
non-faulting readahead with MADV_WILLNEED.

So unfortunately, I don't see a way to keep these pages off the
LRU. They do need to be reclaimable, or they become a DoS vector.

I'm currently preparing a small patch series to make swap ownership
tracking an integral part of memcg and change the swapin charging
sequence, then you don't have to worry about it. This will also
unblock Joonsoo's "workingset protection/detection on the anonymous
LRU list" patch series, since he is blocked on the same problem - he
needs the correct LRU available at swapin time to process refaults
correctly. Both of your patch series are already pretty large, they
shouldn't need to also deal with that.