Message-ID: <0f038010-ed83-55bb-70a5-24f5c6d68666@gmail.com>
Date:   Wed, 12 Oct 2022 16:57:53 -0700
From:   Doug Berger <opendmb@...il.com>
To:     David Hildenbrand <david@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Cc:     Jonathan Corbet <corbet@....net>, Mike Rapoport <rppt@...nel.org>,
        Borislav Petkov <bp@...e.de>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Neeraj Upadhyay <quic_neeraju@...cinc.com>,
        Randy Dunlap <rdunlap@...radead.org>,
        Damien Le Moal <damien.lemoal@...nsource.wdc.com>,
        Muchun Song <songmuchun@...edance.com>,
        KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
        Mel Gorman <mgorman@...e.de>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        Oscar Salvador <osalvador@...e.de>,
        Michal Hocko <mhocko@...e.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: [PATCH v2 2/9] mm/vmstat: show start_pfn when zone spans pages

On 10/5/2022 11:09 AM, David Hildenbrand wrote:
> On 01.10.22 03:28, Doug Berger wrote:
>> On 9/29/2022 1:15 AM, David Hildenbrand wrote:
>>> On 29.09.22 00:32, Doug Berger wrote:
>>>> A zone that overlaps with another zone may span a range of pages
>>>> that are not present. In this case, displaying the start_pfn of
>>>> the zone allows the zone page range to be identified.
>>>>
>>>
>>> I don't understand the intention here.
>>>
>>> "/* If unpopulated, no other information is useful */"
>>>
>>> Why would the start pfn be of any use here?
>>>
>>> What is the user visible impact without that change?
>> Yes, this is very subtle. I only caught it while testing some
>> pathological cases.
>>
>> If you take the example system:
>> The 7278 device has four ARMv8 CPU cores in an SMP cluster and two
>> memory controllers (MEMCs). Each MEMC is capable of controlling up to
>> 8GB of DRAM. An example 7278 system might have 1GB on each controller,
>> so an arm64 kernel might see 1GB on MEMC0 at 0x40000000-0x7FFFFFFF and
>> 1GB on MEMC1 at 0x300000000-0x33FFFFFFF.
>>
> 
> Okay, thanks. You should make it clearer in the patch description -- 
> especially how this relates to DMB. Having that said, I still have to 
> digest your examples:
> 
>> Placing a DMB on MEMC0 with 'movablecore=256M@...0000000' will lead to
>> the ZONE_MOVABLE zone spanning from 0x70000000-0x33fffffff and the
>> ZONE_NORMAL zone spanning from 0x300000000-0x33fffffff.
> 
> Why is ZONE_MOVABLE spanning more than 256M? It should span
> 
> 0x70000000-0x80000000
> 
> Or what am I missing?
I was working from the notion that the classic 'movablecore' 
implementation keeps ZONE_MOVABLE as the last zone in System RAM, 
so it always spans the last page on the node (i.e. 0x33ffff000). My 
implementation moves the start of ZONE_MOVABLE to the lowest page of 
any DMB defined on the node.

I see that memory hotplug does not behave this way, and its behavior is 
probably more intuitive (though less consistent with the classic zone 
layout). I could attempt to change this in a v3 if desired.
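
To make the span rule above concrete, here is a rough sketch of the 
arithmetic in Python. It is not kernel code: the function names are 
made up for illustration, 4 KiB pages are assumed, and the only inputs 
are the example addresses quoted in this thread (node end 0x33fffffff, 
DMB at 0x70000000, classic movable zone start 0x330000000).

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def last_page_addr(node_last_byte):
    """Address of the last page on the node (hypothetical helper)."""
    return node_last_byte & ~(PAGE_SIZE - 1)


def zone_movable_span(node_last_byte, classic_start=None, dmb_bases=()):
    """Return the (first_byte, last_byte) span of ZONE_MOVABLE.

    Models the behavior described above: the zone always ends at the
    end of the node, and its start is the lowest of the classic
    movable zone start (if any) and the base of any DMB on the node.
    Returns None when there is neither a classic zone nor a DMB.
    """
    candidates = list(dmb_bases)
    if classic_start is not None:
        candidates.append(classic_start)
    if not candidates:
        return None
    return min(candidates), node_last_byte


NODE_END = 0x33FFFFFFF

# One 256M DMB on MEMC0, no classic movable zone left over.
only_dmb = zone_movable_span(NODE_END, dmb_bases=[0x70000000])

# Same DMB plus a 256M classic movable zone at the top of MEMC1.
dmb_and_classic = zone_movable_span(
    NODE_END, classic_start=0x330000000, dmb_bases=[0x70000000]
)

print([hex(x) for x in only_dmb])
print([hex(x) for x in dmb_and_classic])
print(hex(last_page_addr(NODE_END)))
```

Both cases give the same span, which is why the ZONE_MOVABLE span in 
the examples does not change when a classic movable zone is added.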

> 
>>
>> If instead you specified 'movablecore=256M@...0000000,512M' you would
>> get the same ZONE_MOVABLE span, but the ZONE_NORMAL would now span
>> 0x300000000-0x32fffffff. The requested 512M of movablecore would be
>> divided into a 256MB DMB at 0x70000000 and a 256MB "classic" movable
>> zone start would be displayed in the bootlog as:
>> [    0.000000] Movable zone start for each node
>> [    0.000000]   Node 0: 0x000000330000000
> 
> 
> Okay, so that's the movable zone range excluding DMB.
> 
>>
>> Finally, if you specified the pathological
>> 'movablecore=256M@...0000000,1G@...' you would still have the same
>> ZONE_MOVABLE span, and the ZONE_NORMAL span would go back to
>> 0x300000000-0x33fffffff. However, because the second DMB (1G@12G)
>> completely overlaps the ZONE_NORMAL there would be no pages present in
>> ZONE_NORMAL and /proc/zoneinfo would report ZONE_NORMAL 'spanned
>> 262144', but not where those pages are. This commit adds the 'start_pfn'
>> back to the /proc/zoneinfo for ZONE_NORMAL so the span has context.
> 
> ... but why? If there are no pages present, there is no ZONE_NORMAL we 
> care about. The zone span should be 0. Does this maybe rather indicate 
> that there is a zone span processing issue in your DMB implementation?
My implementation uses the zones created by the classic 'movablecore' 
behavior and relocates the pages within DMBs. In this case 
ZONE_NORMAL still has a span, which gets output, but no present pages, 
so without this patch the output didn't show where the zone was. This 
is a convenience to avoid adding zone resizing and destruction logic 
outside of memory hotplug support, but I could attempt to add that code 
in a v3 if desired.
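
For the pathological case, the numbers can be checked with a small 
sketch. Again this is illustrative Python, not the kernel's accounting: 
zone_counts() is a hypothetical helper, 4 KiB pages are assumed, the 
zone is assumed to have no holes, and the DMBs are assumed not to 
overlap each other.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def pfn(addr):
    """Convert a byte address to a page frame number."""
    return addr >> PAGE_SHIFT


def zone_counts(zone_first_byte, zone_last_byte, dmbs):
    """Return (start_pfn, spanned, present) for a zone.

    dmbs is a list of (base, size) byte ranges whose pages are
    relocated out of this zone. Pages of the zone covered by a DMB
    stay in the span but are no longer counted as present.
    """
    start_pfn = pfn(zone_first_byte)
    end_pfn = pfn(zone_last_byte) + 1  # exclusive end
    spanned = end_pfn - start_pfn
    relocated = 0
    for base, size in dmbs:
        lo = max(start_pfn, pfn(base))
        hi = min(end_pfn, pfn(base + size))
        relocated += max(0, hi - lo)
    return start_pfn, spanned, spanned - relocated


MiB = 1 << 20
GiB = 1 << 30

# ZONE_NORMAL span 0x300000000-0x33fffffff with DMBs of 256M at
# 0x70000000 and 1G at 12G (0x300000000).
start_pfn, spanned, present = zone_counts(
    0x300000000, 0x33FFFFFFF, [(0x70000000, 256 * MiB), (12 * GiB, 1 * GiB)]
)
print(hex(start_pfn), spanned, present)  # 0x300000 262144 0
```

The span of 262144 pages is still reported while present is 0, so the 
start pfn is the only thing that says where that span sits.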

> 
> Special-casing zones based on DMBs feels wrong. But most probably I am 
> missing something important :)
> 

Thanks for making me aware of your confusion so I can attempt to make it 
clearer.
-Doug