linux-kernel - Re: [PATCH v3 6/9] mm/workingset: handle the page without memcg

Message-ID: <CAAmzW4MBQP56=RNxUfKYJ10WAEp-m2bKhC-WVVF8Nt6PZ+JuJw@mail.gmail.com>
Date:   Thu, 19 Mar 2020 17:31:44 +0900
From:   Joonsoo Kim <js1304@...il.com>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Michal Hocko <mhocko@...nel.org>,
        Hugh Dickins <hughd@...gle.com>,
        Minchan Kim <minchan@...nel.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Mel Gorman <mgorman@...hsingularity.net>, kernel-team@....com,
        Joonsoo Kim <iamjoonsoo.kim@....com>
Subject: Re: [PATCH v3 6/9] mm/workingset: handle the page without memcg

On Thu, Mar 19, 2020 at 4:59 AM, Johannes Weiner <hannes@...xchg.org> wrote:
>
> On Tue, Mar 17, 2020 at 02:41:54PM +0900, js1304@...il.com wrote:
> > From: Joonsoo Kim <iamjoonsoo.kim@....com>
> >
> > When implementing workingset detection for anonymous pages, I found
> > some swapcache pages with a NULL memcg. From reading the code, I found
> > two reasons.
> >
> > One is swap-in readahead. The other is a corner case related to the
> > shmem cache. These two problems should be fixed, but the fix is not
> > straightforward. For example, on swapoff, all swapped-out pages are
> > read into the swapcache. In this case, who is the owner of the
> > swapcache page?
> >
> > Since this problem doesn't look trivial, I decided to leave the issue
> > alone and handle this corner case at the place where the error occurs.
> >
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@....com>
>
> It wouldn't be hard to find out who owns this page. The code in
> mem_cgroup_try_charge() is only a few lines:

lookup_swap_cgroup_id() uses additional memory and is only
usable if CONFIG_MEMCG_SWAP is on.

>                         swp_entry_t ent = { .val = page_private(page), };
>                         unsigned short id = lookup_swap_cgroup_id(ent);
>
>                         rcu_read_lock();
>                         memcg = mem_cgroup_from_id(id);
>                         if (memcg && !css_tryget_online(&memcg->css))
>                                 memcg = NULL;
>                         rcu_read_unlock();
>
> THAT BEING SAID, I don't think we actually *want* to know the original
> cgroup for readahead pages. Because before they are accessed and
> charged to the original owner, the pages are sitting on the root
> cgroup LRU list and follow the root group's aging speed and LRU order.

Okay. Sounds reasonable.

> Eviction and refault tracking is about the LRU that hosts the pages.
>
> So IMO your patch is much less of a hack than you might think.

Good!

> > diff --git a/mm/workingset.c b/mm/workingset.c
> > index a9f474a..8d2e83a 100644
> > --- a/mm/workingset.c
> > +++ b/mm/workingset.c
> > @@ -257,6 +257,10 @@ void *workingset_eviction(struct page *page, struct mem_cgroup *target_memcg)
> >       VM_BUG_ON_PAGE(page_count(page), page);
> >       VM_BUG_ON_PAGE(!PageLocked(page), page);
> >
> > +     /* page_memcg() can be NULL if swap-in readahead happens */
> > +     if (!page_memcg(page))
> > +             return NULL;
> > +
> >       advance_inactive_age(page_memcg(page), pgdat, is_file);
> >
> >       lruvec = mem_cgroup_lruvec(target_memcg, pgdat);
>
> This means a readahead page that hasn't been accessed will actively
> not be tracked as an eviction and later as a refault.
>
> I think that's the right thing to do, but I would expand the comment:

Okay. I will add the following comment.

Thanks.

> /*
>  * A page can be without a cgroup here when it was brought in by swap
>  * readahead and nobody has touched it since.
>  *
>  * The idea behind the workingset code is to tell at page fault time
>  * whether pages have been previously used or not. Since this page
>  * hasn't been used, don't store a shadow entry for it; when it later
>  * faults back in, we treat it as the new page that it is.
>  */
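
For readers following the thread, here is a minimal userspace sketch of the
behaviour discussed above: an eviction of a page without a memcg stores no
shadow entry, so a later fault treats it as a new page. Everything prefixed
with toy_ is hypothetical and only stands in for the kernel structures; it
is not the code in mm/workingset.c, and the age counter is simplified to a
single per-memcg number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup; not the kernel definition. */
struct toy_memcg {
	unsigned short id;
	unsigned long inactive_age;	/* simplified eviction counter */
};

/* Hypothetical stand-in for struct page. */
struct toy_page {
	struct toy_memcg *memcg;	/* NULL: readahead page, never charged */
};

/* Simplified shadow entry; valid == false models "no shadow stored". */
struct toy_shadow {
	bool valid;
	unsigned short memcg_id;
	unsigned long eviction_age;
};

/*
 * Model of the early return added by the patch: a page without a memcg
 * is not recorded as an eviction, so no shadow entry is produced.
 */
static struct toy_shadow toy_workingset_eviction(struct toy_page *page)
{
	struct toy_shadow shadow = { .valid = false };

	assert(page != NULL);

	if (!page->memcg)
		return shadow;

	/* Charged page: advance the age and remember when it was evicted. */
	page->memcg->inactive_age++;
	shadow.valid = true;
	shadow.memcg_id = page->memcg->id;
	shadow.eviction_age = page->memcg->inactive_age;
	return shadow;
}

/* At fault time, only a stored shadow entry marks the page as a refault. */
static bool toy_is_refault(const struct toy_shadow *shadow)
{
	return shadow->valid;
}
```

With this model, evicting a charged page yields a valid shadow and bumps the
age, while evicting an untouched readahead page leaves the age alone and
toy_is_refault() reports false for it.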