linux-kernel - Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200827133523.GC3090@techsingularity.net>
Date:   Thu, 27 Aug 2020 14:35:23 +0100
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     Joonsoo Kim <js1304@...il.com>
Cc:     Vlastimil Babka <vbabka@...e.cz>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Michal Hocko <mhocko@...nel.org>,
        "Aneesh Kumar K . V" <aneesh.kumar@...ux.ibm.com>,
        kernel-team@....com, Joonsoo Kim <iamjoonsoo.kim@....com>
Subject: Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for
 memalloc_nocma_{save/restore} APIs

On Wed, Aug 26, 2020 at 02:12:44PM +0900, Joonsoo Kim wrote:
> > > And, it requires to break current code
> > > layering that order-0 page is always handled by the pcplist. I'd prefer
> > > to avoid it so this patch uses different way to skip CMA page allocation
> > > from the pcplist.
> >
> > Well it would be much simpler and won't affect most of allocations. Better than
> > flushing pcplists IMHO.
> 
> Hmm...Still, I'd prefer my approach.

I prefer the pcp bypass approach. It's simpler and it does not incur a
pcp drain/refill penalty. 

> There are two reasons. First,
> layering problem
> mentioned above. In rmqueue(), there is a code for MIGRATE_HIGHATOMIC.
> As the name shows, it's for high order atomic allocation. But, after
> skipping pcplist
> allocation as you suggested, we could get there with order 0 request.

I guess your concern is that under some circumstances that a request that
passes a watermark check could fail due to a highatomic reserve and to
an extent this is true. However, in that case the system is already low
on memory depending on the allocation context, the pcp lists may get
flushed anyway.

> We can also
> change this code, but, I'd hope to maintain current layering. Second,
> a performance
> reason. After the flag for nocma is up, a burst of nocma allocation
> could come. After
> flushing the pcplist one times, we can use the free page on the
> pcplist as usual until
> the context is changed.

It's not guaranteed because CMA pages could be freed between the nocma save
and restore triggering further drains due to a reschedule.  Similarly,
a CMA allocation in parallel could refill with CMA pages on the per-cpu
list. While both cases are unlikely, it's more unpredictable than a
straight-forward pcp bypass.

I don't really see it as a layering violation of the API because all
order-0 pages go through the PCP lists. The fact that order-0 is serviced
from the pcp list is an internal implementation detail, the API doesn't
care.

-- 
Mel Gorman
SUSE Labs