linux-kernel - Re: [PATCH 1/2 v4] mm: vmscan: do not pass reclaimed slab to vmpressure

Date:   Tue, 7 Feb 2017 13:17:45 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     vinayak menon <vinayakm.list@...il.com>
Cc:     Vinayak Menon <vinmenon@...eaurora.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        mgorman@...hsingularity.net, vbabka@...e.cz,
        Rik van Riel <riel@...hat.com>, vdavydov.dev@...il.com,
        anton.vorontsov@...aro.org, Minchan Kim <minchan@...nel.org>,
        shashim@...eaurora.org, "linux-mm@...ck.org" <linux-mm@...ck.org>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2 v4] mm: vmscan: do not pass reclaimed slab to
 vmpressure

On Tue 07-02-17 16:39:15, vinayak menon wrote:
> On Tue, Feb 7, 2017 at 1:40 PM, Michal Hocko <mhocko@...nel.org> wrote:
> > On Mon 06-02-17 20:40:10, vinayak menon wrote:
> >> On Mon, Feb 6, 2017 at 6:22 PM, Michal Hocko <mhocko@...nel.org> wrote:
[...]
> >> > It would be also more than useful to say how much the slab reclaim
> >> > really contributed.
> >>
> >> The 70% fewer events are caused by slab reclaim being added to
> >> vmpressure, which is confirmed by running the test with and without
> >> the fix.  But it is hard to say that the effect on reclaim stats is
> >> caused by this problem, because the lowmemorykiller can be written
> >> with different heuristics to make the reclaim look better.
> >
> > Exactly! And this is why I am still not happy with the current
> > justification of this patch. It seems to be tuning for a particular
> > consumer of vmpressure events. Others might depend on less pessimistic
> > events because we are making some progress after all. Being more
> > pessimistic can lead to premature OOM or other performance-related
> > decisions, and that is why I am not happy about that.
> >
> > Btw. could you be more specific about your particular test? What is
> > desired/acceptable result?
>
> The test opens multiple applications on Android in sequence and
> then repeats this N times. The time taken to launch each application
> is measured. A deviation in launch latencies is seen between runs
> with and without the patch. The launch latency difference is caused
> by the smaller number of kills (because of the vmpressure difference).

So this is basically an lmk throughput test. Is this representative enough
to make any decisions?

> >> The issue we see
> >> in the above reclaim stats is entirely because of task kills being
> >> delayed. That is the reason why I did not include the vmstat stats in
> >> the changelog in the earlier versions.
> >>
> >> >
> >> >> This is a regression introduced by commit 6b4f7799c6a5 ("mm: vmscan:
> >> >> invoke slab shrinkers from shrink_zone()").
> >> >
> >> > I am not really sure this is a regression, though. Maybe your heuristic
> >> > which consumes events is just too fragile?
> >> >
> >> Yes, it could be. A different kind of lowmemorykiller may not show
> >> this issue at all. In my opinion the regression here is the difference
> >> in vmpressure values, and thus in vmpressure events, caused by passing
> >> slab-reclaimed pages to vmpressure without considering the scanned
> >> pages and cost model.
> >> So would it be better to drop the vmstat data from the changelog?
> >
> > No! The main question is whether being more pessimistic and reporting
> > higher reclaim levels really makes sense even when there is slab
> > reclaim progress. This hasn't been explained, and I _really_ do not like
> > a patch which optimizes for a particular consumer of events.
> >
> > I understand that the change of the behavior is unexpected and that
> > might be a reason to revert to the original one. But if this is the only
> > reasonable way to go, I would at least like to understand what is going
> > on here. Why can't your lowmemorykiller cope with the workload? Why
> > does starting to kill sooner (at the time when the slab still reclaims
> > enough pages to report events below critical) help to pass your test?
> > Maybe it is the implementation of the lmk which needs to be changed
> > because it has some false expectations? Or does the memory reclaim just
> > behave in an unpredictable manner?
>
> Say 4.4 had actually implemented a page-based shrinking model for
> slab and passed the correct scanned and reclaimed counts to
> vmpressure, taking the cost model into account. Then it would all be
> fine, and any behavior difference shown by a vmpressure client would
> need to be fixed. But as I understand it, the case here is different.

> vmpressure was implemented to work with scanned and reclaimed pages
> from the LRU, and it works well for at least some use cases.

Userspace shouldn't care about the specific implementation at all. We
should actually be able to change the implementation without anybody
noticing.

> As you pointed out earlier, there could be problems with the way
> vmpressure works, since it does not consider many other costs. But
> it shows an estimate of the pressure on the LRUs. I think adding just
> the slab-reclaimed pages to nr_reclaimed without considering the cost
> is arbitrary, and it disturbs the LRU pressure which vmpressure
> reports properly.
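
For reference, vmpressure derives its level from how many of the scanned pages in a window were actually reclaimed. A simplified form, assuming the default medium and critical thresholds of 60 and 95 percent used by mm/vmpressure.c in kernels of this period, with $s$ the pages scanned and $r$ the pages reclaimed:

```latex
P = \left(1 - \frac{r}{s}\right) \times 100, \qquad
\text{level} =
\begin{cases}
  \text{critical} & P \ge 95 \\
  \text{medium}   & 60 \le P < 95 \\
  \text{low}      & P < 60
\end{cases}
```

Pages added to $r$ without a matching increase in $s$, as with slab-reclaimed pages, lower $P$ without any corresponding scanning effort being recorded.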

Well it is not completely arbitrary. Slabs are scanned proportionally to
the LRU scanning.

> So shouldn't we account for slab-reclaimed pages in vmpressure only
> when we have a proper way to do it? By adding slab-reclaimed pages, we
> are telling vmpressure that X pages were reclaimed with zero effort.
> With this patch, vmpressure will show an estimate of the pressure on
> the LRU, and the original behavior of vmpressure is restored. If we add
> the slab cost in the future, vmpressure can become more accurate. But
> just adding slab-reclaimed pages is arbitrary, right? Consider a case
> where we start to account reclaimed pages from other shrinkers which
> are not reporting their reclaimed values right now, such as zsmalloc,
> the Android lowmemorykiller, etc. Then the nr_reclaimed sent to
> vmpressure will just be bloated and will make vmpressure useless,
> right? Most of the time vmpressure will receive reclaimed greater than
> scanned and won't report any critical events. The problem we are
> encountering now with slab-reclaimed pages is a subset of the case
> above, right?

The main point here is whether we really _should_ emit critical events
when we actually _reclaim_ pages. This is something I haven't heard an
answer to.

> Starting to kill at the right time helps in recovering memory at a
> faster rate than waiting for the reclaim to complete. Yes, we may
> be able to modify lowmemorykiller to cope with this problem. But
> the actual problem this patch tried to fix was the vmpressure event
> regression.

I am not happy about the regression but you should try to understand
that we might end up with another report a month later for a different
consumer of events.

I believe that vmpressure needs some serious rethinking and should come
up with a more realistic and stable metric.
-- 
Michal Hocko
SUSE Labs