Message-ID: <4838064.GXAFRqVoOG@suse>
Date: Sun, 23 Jul 2023 15:14:18 +0200
From: "Fabio M. De Francesco" <fmdefrancesco@...il.com>
To: Jonathan Corbet <corbet@....net>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>,
Linus Walleij <linus.walleij@...aro.org>,
Mike Rapoport <rppt@...nel.org>, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Matthew Wilcox <willy@...radead.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Bagas Sanjaya <bagasdotme@...il.com>,
Randy Dunlap <rdunlap@...radead.org>
Subject: Re: [RFC PATCH] Documentation/page_tables: MMU, TLB, and Page Faults
On Saturday, 22 July 2023 02:43:13 CEST Fabio M. De Francesco wrote:
> Extend page_tables.rst by adding a small introductory section about
> the role of the MMU and TLB in translating between virtual addresses and
> physical page frames. Furthermore, explain the concepts behind
> page fault exceptions and how Linux handles them.
This is superseded by "[RFC PATCH v2] Documentation/page_tables: Add info
about MMU/TLB and Page Faults" at
https://lore.kernel.org/lkml/20230723120721.7139-1-fmdefrancesco@...il.com/.
Please refer to that second version and discard this one.
Thanks,
Fabio
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Bagas Sanjaya <bagasdotme@...il.com>
> Cc: Jonathan Cameron <Jonathan.Cameron@...wei.com>
> Cc: Jonathan Corbet <corbet@....net>
> Cc: Linus Walleij <linus.walleij@...aro.org>
> Cc: Matthew Wilcox <willy@...radead.org>
> Cc: Mike Rapoport <rppt@...nel.org>
> Cc: Randy Dunlap <rdunlap@...radead.org>
> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@...il.com>
> ---
>
> This is an RFC PATCH for two reasons:
>
> 1) I've heard that there is consensus about the need to revise and
> extend the MM documentation, but I'm not sure whether or not
> developers need this kind of introductory information.
>
> 2) While preparing this little patch I decided to take a quick look at
> the code and found out it currently is not how I thought I remembered
> it. I'm especially speaking about the x86 case. I'm not sure that I've
> been able to properly understand what I described as a difference in
> workflow compared to most of the other architectures.
>
> Therefore, for the two reasons explained above, I'd like to hear from
> people actively involved in MM. If this is not what you want, feel free
> to throw it away. Otherwise I'd be happy to write more on this and other
> MM topics. I'm looking forward to comments on this small work.
>
> Documentation/mm/page_tables.rst | 61 ++++++++++++++++++++++++++++++++
> 1 file changed, 61 insertions(+)
>
> diff --git a/Documentation/mm/page_tables.rst b/Documentation/mm/page_tables.rst
> index 7840c1891751..fa617894fda8 100644
> --- a/Documentation/mm/page_tables.rst
> +++ b/Documentation/mm/page_tables.rst
> @@ -152,3 +152,64 @@ Page table handling code that wishes to be architecture-neutral, such as the
>  virtual memory manager, will need to be written so that it traverses all of the
>  currently five levels. This style should also be preferred for
>  architecture-specific code, so as to be robust to future changes.
> +
> +
> +MMU, TLB, and Page Faults
> +=========================
> +
> +The Memory Management Unit (MMU) is a hardware component that handles virtual to
> +physical address translations. It uses a relatively small cache in hardware
> +called the Translation Lookaside Buffer (TLB) to speed up these translations.
> +When a process wants to access a memory location, the CPU provides a virtual
> +address to the MMU, which then uses the TLB to quickly find the corresponding
> +physical address.
> +
> +However, sometimes the MMU can't find a valid translation, either in the TLB
> +or in the page tables. This could be because the process is trying to access
> +a range of memory that it's not allowed to, or because the memory hasn't been
> +loaded into RAM yet. When this happens, the MMU triggers a page fault, which
> +is a type of exception that signals the CPU to pause the current process and
> +run a special function to handle the fault.
> +
> +One cause of page faults is bugs (or maliciously crafted addresses): a fault
> +happens when a process tries to access a range of memory that it doesn't have
> +permission to. This could be because the memory is reserved for the kernel or
> +for another process, or because the process is trying to write to a read-only
> +section of memory. When this happens, the kernel sends a Segmentation Fault
> +(SIGSEGV) signal to the process, which usually causes the process to terminate.
> +
> +An expected and more common cause of page faults is "lazy allocation". This is
> +a technique used by the kernel to improve memory efficiency and reduce
> +footprint. Instead of allocating physical memory to a process as soon as it's
> +requested, the kernel waits until the process actually tries to use the memory.
> +This can save a significant amount of memory in cases where a process requests
> +a large block but only uses a small portion of it.
> +
> +A related technique is "Copy-on-Write" (COW), where the kernel allows multiple
> +processes to share the same physical memory as long as they're only reading
> +from it. If a process tries to write to the shared memory, the write triggers
> +a page fault and the kernel allocates a separate copy of the memory for that
> +process. This allows the kernel to save memory and avoid unnecessary data
> +copying and, by doing so, it reduces latency.
> +
> +Now, let's see how the Linux kernel handles these page faults:
> +
> +1. For most architectures, `do_page_fault()` is the primary interrupt handler
> +   for page faults. It delegates the actual handling of the page fault to
> +   `handle_mm_fault()`. This function checks the cause of the page fault and
> +   takes the appropriate action, such as loading the required page into
> +   memory, granting the process the necessary permissions, or sending a
> +   SIGSEGV signal to the process.
> +
> +2. In the specific case of the x86 architecture, the interrupt handler is
> +   defined by the `DEFINE_IDTENTRY_RAW_ERRORCODE()` macro, which calls
> +   `handle_page_fault()`. This function then calls either
> +   `do_user_addr_fault()` or `do_kern_addr_fault()`, depending on whether
> +   the fault occurred in user space or kernel space. Both of these functions
> +   eventually lead to `handle_mm_fault()`, similar to the workflow in other
> +   architectures.
> +
> +The actual implementation of the workflow is very complex. Its design allows
> +Linux to handle page faults in a way that is tailored to the specific
> +characteristics of each architecture, while still sharing a common overall
> +structure.
> --
> 2.41.0
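
As a rough userspace illustration of the "lazy allocation" paragraph in the
quoted patch, the sketch below maps anonymous memory and compares the number of
minor page faults taken while mapping with the number taken while first
touching each page. It is not part of the patch. The helper names
(`minor_faults`, `measure_lazy_allocation`) are made up for this example, and
it assumes a Unix-like system where Python's `resource` module is available.
The exact counts depend on the kernel and its configuration, so no particular
output is claimed.

```python
import mmap
import resource

PAGE = mmap.PAGESIZE


def minor_faults():
    """Minor page faults (no disk I/O) taken so far by this process."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def measure_lazy_allocation(num_pages=1024):
    """Return (faults while mapping, faults while first touching the pages).

    Hypothetical helper for illustration only.
    """
    before_map = minor_faults()
    # Anonymous mapping: address space is reserved here, but physical pages
    # are normally not allocated until they are first accessed.
    region = mmap.mmap(-1, num_pages * PAGE)
    after_map = minor_faults()

    # Write one byte into every page; each first access to an unpopulated
    # page is expected to fault so the kernel can back it with memory.
    for offset in range(0, num_pages * PAGE, PAGE):
        region[offset] = 1
    after_touch = minor_faults()

    region.close()
    return after_map - before_map, after_touch - after_map


if __name__ == "__main__":
    on_map, on_touch = measure_lazy_allocation()
    print("minor faults while mapping: ", on_map)
    print("minor faults while touching:", on_touch)
```

On a typical Linux system one would expect the second number to be much larger
than the first, roughly one fault per touched page, although features such as
huge pages can change that.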