[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y7wXm/3POivfwSHE@kernel.org>
Date: Mon, 9 Jan 2023 15:33:15 +0200
From: Mike Rapoport <rppt@...nel.org>
To: Lorenzo Stoakes <lstoakes@...il.com>
Cc: Jonathan Corbet <corbet@....net>,
Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...hat.com>,
Johannes Weiner <hannes@...xchg.org>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
Mel Gorman <mgorman@...e.de>, Michal Hocko <mhocko@...nel.org>,
Vlastimil Babka <vbabka@...e.cz>, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH 2/2] docs/mm: Physical Memory: add structure,
introduction and nodes description
On Fri, Jan 06, 2023 at 10:32:46PM +0000, Lorenzo Stoakes wrote:
> On Sun, Jan 01, 2023 at 11:45:23AM +0200, Mike Rapoport wrote:
> > From: "Mike Rapoport (IBM)" <rppt@...nel.org>
> >
> > Signed-off-by: Mike Rapoport (IBM) <rppt@...nel.org>
> > ---
> > Documentation/mm/physical_memory.rst | 322 +++++++++++++++++++++++++++
> > 1 file changed, 322 insertions(+)
> >
> > diff --git a/Documentation/mm/physical_memory.rst b/Documentation/mm/physical_memory.rst
> > index 2ab7b8c1c863..fcf52f1db16b 100644
> > --- a/Documentation/mm/physical_memory.rst
> > +++ b/Documentation/mm/physical_memory.rst
> > @@ -3,3 +3,325 @@
> > ===============
> > Physical Memory
> > ===============
> > +
> > +Linux is available for a wide range of architectures so there is a need for an
> > +architecture-independent abstraction to represent the physical memory. This
> > +chapter describes the structures used to manage physical memory in a running
> > +system.
> > +
> > +The first principal concept prevalent in the memory management is
> > +`Non-Uniform Memory Access (NUMA)
> > +<https://en.wikipedia.org/wiki/Non-uniform_memory_access>`_.
> > +With multi-core and multi-socket machines, memory may be arranged into banks
> > +that incur a different cost to access depending on the “distance” from the
> > +processor. For example, there might be a bank of memory assigned to each CPU or
> > +a bank of memory very suitable for DMA near peripheral devices.
>
> Absolutely wonderfully written.
Thanks to Mel :)
> Perhaps put a sub-heading for NUMA here?
I consider all this text as an high level overview and I'd prefer to keep
it as a single piece.
> An aside, but I think it'd be a good idea to mention base pages, folios and
> folio order pretty early on as they get touched as concepts all over the place
> in physical memory (but perhaps can wait for other contribs!)
The plan is to have "Pages" section Really Soon :)
> > +
> > +Each bank is called a node and the concept is represented under Linux by a
> > +``struct pglist_data`` even if the architecture is UMA. This structure is
> > +always referenced to by it's typedef ``pg_data_t``. A pg_data_t structure
> > +for a particular node can be referenced by ``NODE_DATA(nid)`` macro where
> > +``nid`` is the ID of that node.
> > +
> > +For NUMA architectures, the node structures are allocated by the architecture
> > +specific code early during boot. Usually, these structures are allocated
> > +locally on the memory bank they represent. For UMA architectures, only one
> > +static pg_data_t structure called ``contig_page_data`` is used. Nodes will
> > +be discussed further in Section :ref:`Nodes <nodes>`
> > +
> > +Each node may be divided up into a number of blocks called zones which
> > +represent ranges within memory. These ranges are usually determined by
> > +architectural constraints for accessing the physical memory. A zone is
> > +described by a ``struct zone_struct``, typedeffed to ``zone_t`` and each zone
> > +has one of the types described below.
>
> I don't think it's quite right to say 'may' be divided up into zones, as they
> absolutely will be so (and the entire phsyical memory allocator hinges on being
> zoned, even if trivially in UMA/single zone cases).
Not necessarily. ZONE_DMA or ZONE_NORMAL may span the entire memory.
> Also it's struct zone right, not zone_struct/zone_t?
Right, thanks.
> I think it's important to clarify that a given zone does not map to a single
> struct zone, rather that a struct zone (contained within a pg_data_t object's
> array node_zones[]) represents only the portion of the zone that resides in this
> node.
>
> It's fiddly because when I talk about a zone like this I am referring to one of
> the 'classifications' of zones you mention below, e.g. ZONE_DMA, ZONE_DMA32,
> etc. but you might also want to refer to a zone as being equivalent to a struct
> zone object.
>
> I think the clearest thing however is to use the term zone to refer to each of
> the ZONE_xxx types, e.g. 'this memory is located in ZONE_NORMAL' and to clarify
> that one zone can span different individual struct zones (and thus nodes).
>
> I know it's tricky because you and others have rightly pointed out that my own
> explanation of this is confusing, and it is something I intend to rejig a bit
> myself!
The term 'zone' is indeed somewhat ambiguous, I'll try to come up with more
clear version.
> > +
> > +`ZONE_DMA` and `ZONE_DMA32`
> > + represent memory suitable for DMA by peripheral devices that cannot
> > + access all of the addressable memory. Depending on the architecture,
> > + either of these zone types or even they both can be disabled at build
> > + time using ``CONFIG_ZONE_DMA`` and ``CONFIG_ZONE_DMA32`` configuration
> > + options. Some 64-bit platforms may need both zones as they support
> > + peripherals with different DMA addressing limitations.
>
> It might be worth pointing out ZONE_DMA spans an incredibly little range that
> probably won't matter for any peripherals this side of the cretaceous period,
On RPi4 ZONE_DMA spans 1G, which is quite some part of the memory ;-)
> > +
> > +`ZONE_NORMAL`
> > + is for normal memory that can be accessed by the kernel all the time. DMA
> > + operations can be performed on pages in this zone if the DMA devices support
> > + transfers to all addressable memory. ZONE_NORMAL is always enabled.
> > +
>
> Might be worth saying 'this is where memory ends up if not otherwise in another
> zone'.
This may not be the case on !x86.
> > +`ZONE_HIGHMEM`
> > + is the part of the physical memory that is not covered by a permanent mapping
> > + in the kernel page tables. The memory in this zone is only accessible to the
> > + kernel using temporary mappings. This zone is available only some 32-bit
> > + architectures and is enabled with ``CONFIG_HIGHMEM``.
> > +
>
> I comment here only to say 'wow I am so glad I chose to only focus on 64-bit so
> I could side-step all the awkward discussion of high pages' :)
>
> > +The relation between node and zone extents is determined by the physical memory
> > +map reported by the firmware, architectural constraints for memory addressing
> > +and certain parameters in the kernel command line.
>
> Perhaps worth mentioning device tree here? Though perhaps encapsulated in the
> 'firmware' reference.
It is :)
> > +Node structure
> > +--------------
> > +
> > +The struct pglist_data is declared in `include/linux/mmzone.h
> > +<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/mmzone.h>`_.
> > +Here we briefly describe fields of this structure:
>
> Perhaps worth saying 'The node structure' just to reiterate.
Not sure I follow, can you phrase the entire sentence?
> > +
> > +General
> > +~~~~~~~
> > +
> > +`node_zones`
> > + The zones for this node. Not all of the zones may be populated, but it is
> > + the full list. It is referenced by this node's node_zonelists as well as
> > + other node's node_zonelists.
>
> Perhaps worth describing what zonelists (and equally zonerefs) are here or
> above, and that this is the canonical place where zones reside. Maybe reference
> populated_zone() and for_each_populated_zone() in reference to the fact that not
> all here may be populated?
I'd prefer to start simple and than add more content on top.
> > +
> > +`node_zonelists` The list of all zones in all nodes. This list defines the
> > + order of zones that allocations are preferred from. The `node_zonelists` is
> > + set up by build_zonelists() in mm/page_alloc.c during the initialization of
> > + core memory management structures.
> > +
> > +`nr_zones`
> > + Number of populated zones in this node.
> > +
> > +`node_mem_map`
> > + For UMA systems that use FLATMEM memory model the 0's node (and the only)
> > + `node_mem_map` is array of struct pages representing each physical frame.
> > +
> > +`node_page_ext`
> > + For UMA systems that use FLATMEM memory model the 0's (and the only) node
> > + `node_mem_map` is array of extensions of struct pages. Available only in the
> > + kernels built with ``CONFIG_PAGE_EXTENTION`` enabled.
> > +
> > +`node_start_pfn`
> > + The page frame number of the starting page frame in this node.
> > +
> > +`node_present_pages`
> > + Total number of physical pages present in this node.
> > +
> > +`node_spanned_pages`
> > + Total size of physical page range, including holes.
> > +
>
> I think it'd be useful to discuss briefly the meaning of managed, spanned and
> present pages in the context of zones.
This will be a part of the Zones section.
> Cheers, Lorenzo
--
Sincerely yours,
Mike.
Powered by blists - more mailing lists