linux-kernel - Re: [PATCH] mm: page_alloc: unreserve highatomic page blocks before oom


Message-ID: <37c58833-1953-42c3-98c6-ee0ac75508ce@quicinc.com>
Date:   Wed, 1 Nov 2023 12:16:21 +0530
From:   Pavan Kondeti <quic_pkondeti@...cinc.com>
To:     Charan Teja Kalla <quic_charante@...cinc.com>
CC:     Michal Hocko <mhocko@...e.com>, <akpm@...ux-foundation.org>,
        <mgorman@...hsingularity.net>, <david@...hat.com>,
        <vbabka@...e.cz>, <linux-mm@...ck.org>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm: page_alloc: unreserve highatomic page blocks before
 oom

On Tue, Oct 31, 2023 at 06:43:55PM +0530, Charan Teja Kalla wrote:
> >> (early)OOM is encountered on a machine in the below state(excerpt from
> >> the oom kill logs):
> >> [  295.998653] Normal free:7728kB boost:0kB min:804kB low:1004kB
> >> high:1204kB reserved_highatomic:8192KB active_anon:4kB inactive_anon:0kB
> >> active_file:24kB inactive_file:24kB unevictable:1220kB writepending:0kB
> >> present:70732kB managed:49224kB mlocked:0kB bounce:0kB free_pcp:688kB
> >> local_pcp:492kB free_cma:0kB
> >> [  295.998656] lowmem_reserve[]: 0 32
> >> [  295.998659] Normal: 508*4kB (UMEH) 241*8kB (UMEH) 143*16kB (UMEH)
> >> 33*32kB (UH) 7*64kB (UH) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB
> >> 0*4096kB = 7752kB
> > 
> > OK, this is quite interesting as well. The system is really tiny and 8MB
> > of reserved memory is indeed really high. How come those reservations
> > have grown that high?
> 
> Actually it is a VM running on the Linux kernel.
> 
> Regarding the reservations, I think it is because of the 'max_managed'
> calculation below:
> static void reserve_highatomic_pageblock(struct page *page, ....) {
>     ....
>   /*
>    * Limit the number reserved to 1 pageblock or roughly 1% of a zone.
>    * Check is race-prone but harmless.
>    */
>     max_managed = (zone_managed_pages(zone) / 100) + pageblock_nr_pages;
> 
>     if (zone->nr_reserved_highatomic >= max_managed)
>             goto out;
> 
>     zone->nr_reserved_highatomic += pageblock_nr_pages;
>     set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
>     move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
> out:
> }
> 
> Since we always add pageblock_nr_pages to 1% of the zone's managed page
> count, the limit effectively becomes a minimum of 2 pageblocks, because
> 'nr_reserved_highatomic' is incremented/decremented in pageblock-size
> granules.
> 
> And in my case, 8M out of ~50M turns out to be 16%, which is high.
> 
> If the below looks fine to you, I can raise this as a separate change:
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2a2536d..41441ced 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1886,7 +1886,9 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone)
>          * Limit the number reserved to 1 pageblock or roughly 1% of a zone.
>          * Check is race-prone but harmless.
>          */
> -       max_managed = (zone_managed_pages(zone) / 100) + pageblock_nr_pages;
> +       max_managed = max_t(unsigned long,
> +                       ALIGN(zone_managed_pages(zone) / 100, pageblock_nr_pages),
> +                       pageblock_nr_pages);
>         if (zone->nr_reserved_highatomic >= max_managed)
>                 return;
> 

ALIGN() rounds the value up, so max_t() is not needed here. If you had
used ALIGN_DOWN(), then max_t() could be used to keep at least
pageblock_nr_pages pages.
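
To make the arithmetic concrete, here is a rough userspace sketch of the
three limit calculations discussed above. It is not kernel code:
align_up(), align_down() and max_ul() are hypothetical stand-ins for the
kernel's ALIGN(), ALIGN_DOWN() and max_t(), and reserved_at_limit() only
models the "nr_reserved_highatomic >= max_managed" check growing from
zero one pageblock at a time. The numbers in the usage note assume 4kB
pages and a 4MB pageblock (1024 pages), which is inferred from the
8192kB reservation being two pageblocks.

```c
#include <assert.h>

/* Stand-in for the kernel's ALIGN(); 'a' must be a power of two. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Stand-in for the kernel's ALIGN_DOWN(); 'a' must be a power of two. */
static unsigned long align_down(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return x & ~(a - 1);
}

/* Stand-in for max_t(unsigned long, a, b). */
static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Existing limit: 1% of the zone plus one pageblock. */
static unsigned long limit_current(unsigned long managed, unsigned long pb)
{
	return managed / 100 + pb;
}

/* Proposed limit: 1% of the zone rounded up to a pageblock multiple. */
static unsigned long limit_align(unsigned long managed, unsigned long pb)
{
	return align_up(managed / 100, pb);
}

/* Alternative: round down, then keep at least one pageblock. */
static unsigned long limit_align_down(unsigned long managed, unsigned long pb)
{
	return max_ul(align_down(managed / 100, pb), pb);
}

/*
 * Model of the reservation check: starting from zero, a pageblock is
 * added each time nr_reserved is still below the limit. Returns the
 * number of pages reserved once the check stops further growth.
 */
static unsigned long reserved_at_limit(unsigned long limit, unsigned long pb)
{
	unsigned long nr_reserved = 0;

	while (nr_reserved < limit)
		nr_reserved += pb;
	return nr_reserved;
}
```

With managed = 12306 pages (49224kB / 4kB) and pb = 1024, limit_current()
gives 1147, so the model stops at 2048 pages (8192kB), matching the log.
limit_align() and limit_align_down() both give 1024, i.e. one pageblock.
The two only differ when managed / 100 is 0 (a zone with fewer than 100
managed pages): align_up(0, pb) is 0, while the max_ul() variant still
yields one pageblock.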


> >>
> >> Per the above log, the ~7MB of free memory sitting in the high atomic
> >> reserves is not freed up before falling back to the oom kill path.
> >>
> >> This fix includes unreserving these atomic reserves in the OOM path
> >> before going for a kill. The side effect of unreserving in oom kill path
> >> is that these free pages are checked against the high wmark. If
> >> unreserved from should_reclaim_retry()/__alloc_pages_direct_reclaim(),
> >> they are checked against the min wmark levels.
> > 
> > I do not like the fix much TBH. I think the logic should live in
> 
> Yeah, this code looks much cleaner to me. Let me know if I can raise
> V2 with the below, with a Suggested-by tag for you.
> 

Also, add the Fixes tag below if it makes sense.

Fixes: 04c8716f7b00 ("mm: try to exhaust highatomic reserve before the OOM")

Thanks,
Pavan