Message-ID: <CAJuCfpEy1QV-9uTUtrFvN-6eS5KFw-ZyQothQbLqXFyUaJ4xgQ@mail.gmail.com>
Date: Tue, 6 Aug 2024 09:13:45 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: Jann Horn <jannh@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Matthew Wilcox <willy@...radead.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: fix (harmless) type confusion in lock_vma_under_rcu()
On Mon, Aug 5, 2024 at 5:52 AM Jann Horn <jannh@...gle.com> wrote:
>
> There is a (harmless) type confusion in lock_vma_under_rcu():
> After vma_start_read(), we have taken the VMA lock but don't know yet
> whether the VMA has already been detached and scheduled for RCU freeing.
> At this point, ->vm_start and ->vm_end are accessed.
>
> vm_area_struct contains a union such that ->vm_rcu uses the same memory as
> ->vm_start and ->vm_end; so accessing ->vm_start and ->vm_end of a detached
> VMA is illegal and leads to type confusion between union members.
>
> Fix it by reordering the vma->detached check above the address checks, and
> document the rules for RCU readers accessing VMAs.
>
> This will probably change the number of observed VMA_LOCK_MISS events
> (since previously, trying to access a detached VMA whose ->vm_rcu has been
> scheduled would bail out when checking the fault address against the
> rcu_head members reinterpreted as VMA bounds).
>
> Fixes: 50ee32537206 ("mm: introduce lock_vma_under_rcu to be used from arch-specific code")
> Signed-off-by: Jann Horn <jannh@...gle.com>
Thanks for fixing this subtle issue and clearly documenting the rules!
I'm not sure whether we should CC stable; it's harmless, but it's still a bug...
Acked-by: Suren Baghdasaryan <surenb@...gle.com>
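
A minimal userspace sketch of the aliasing, in case it helps anyone reading
along (made-up names like fake_vma/fake_rcu_head, only mimicking the union at
the top of vm_area_struct; this is not kernel code):

#include <stdio.h>

/* Stand-in for struct rcu_head: two pointer-sized members. */
struct fake_rcu_head {
        void *next;
        void (*func)(void *);
};

/* Simplified layout mirroring the union at the top of vm_area_struct. */
struct fake_vma {
        union {
                struct {
                        unsigned long vm_start;
                        unsigned long vm_end;
                };
                struct fake_rcu_head vm_rcu;    /* deferred freeing */
        };
};

static void cb(void *p) { }

int main(void)
{
        struct fake_vma vma = { .vm_start = 0x1000, .vm_end = 0x2000 };

        /* "Detach": the rcu_head gets set up for deferred freeing... */
        vma.vm_rcu.next = (void *)0x1;
        vma.vm_rcu.func = cb;

        /*
         * ...so a reader that still does the bounds check at this point
         * compares the fault address against reinterpreted pointer bits.
         */
        printf("vm_start=%#lx vm_end=%#lx\n", vma.vm_start, vma.vm_end);
        return 0;
}
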
> ---
> sidenote: I'm not entirely sure why we handle detached VMAs and moved
> VMAs differently here (detached VMAs retry, moved VMAs bail out), but
> that's kinda out of scope of this patch.
Yeah, technically in both cases the address space is being modified
and we should bail out and retry with mmap_lock. I just think that if
the VMA got replaced while we were calling lock_vma_under_rcu(), it's
reasonable to retry the search and try to find the new VMA if it's
already established and unlocked.
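
To make the distinction concrete, the post-patch flow is roughly (this is
just a condensed sketch of the hunk below, not a literal quote):

        if (vma->detached) {
                /* Old VMA is gone; a replacement may already be in the tree. */
                vma_end_read(vma);
                count_vm_vma_lock_event(VMA_LOCK_MISS);
                goto retry;     /* look the address up again under RCU */
        }
        if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
                /* The VMA no longer covers the faulting address. */
                vma_end_read(vma);
                goto inval;     /* bail out; caller falls back to mmap_lock */
        }
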
> ---
> include/linux/mm_types.h | 15 +++++++++++++--
> mm/memory.c | 14 ++++++++++----
> 2 files changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 485424979254..498cdf3121d0 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -657,58 +657,69 @@ struct vma_numab_state {
>
> /*
> * This struct describes a virtual memory area. There is one of these
> * per VM-area/task. A VM area is any part of the process virtual memory
> * space that has a special rule for the page-fault handlers (ie a shared
> * library, the executable area etc).
> + *
> + * Only explicitly marked struct members may be accessed by RCU readers before
> + * getting a stable reference.
> */
> struct vm_area_struct {
> /* The first cache line has the info for VMA tree walking. */
>
> union {
> struct {
> /* VMA covers [vm_start; vm_end) addresses within mm */
> unsigned long vm_start;
> unsigned long vm_end;
> };
> #ifdef CONFIG_PER_VMA_LOCK
> struct rcu_head vm_rcu; /* Used for deferred freeing. */
> #endif
> };
>
> - struct mm_struct *vm_mm; /* The address space we belong to. */
> + /*
> + * The address space we belong to.
> + * Unstable RCU readers are allowed to read this.
> + */
> + struct mm_struct *vm_mm;
> pgprot_t vm_page_prot; /* Access permissions of this VMA. */
>
> /*
> * Flags, see mm.h.
> * To modify use vm_flags_{init|reset|set|clear|mod} functions.
> */
> union {
> const vm_flags_t vm_flags;
> vm_flags_t __private __vm_flags;
> };
>
> #ifdef CONFIG_PER_VMA_LOCK
> - /* Flag to indicate areas detached from the mm->mm_mt tree */
> + /*
> + * Flag to indicate areas detached from the mm->mm_mt tree.
> + * Unstable RCU readers are allowed to read this.
> + */
> bool detached;
>
> /*
> * Can only be written (using WRITE_ONCE()) while holding both:
> * - mmap_lock (in write mode)
> * - vm_lock->lock (in write mode)
> * Can be read reliably while holding one of:
> * - mmap_lock (in read or write mode)
> * - vm_lock->lock (in read or write mode)
> * Can be read unreliably (using READ_ONCE()) for pessimistic bailout
> * while holding nothing (except RCU to keep the VMA struct allocated).
> *
> * This sequence counter is explicitly allowed to overflow; sequence
> * counter reuse can only lead to occasional unnecessary use of the
> * slowpath.
> */
> int vm_lock_seq;
> + /* Unstable RCU readers are allowed to read this. */
> struct vma_lock *vm_lock;
> #endif
>
> /*
> * For areas with an address space and backing store,
> * linkage into the address_space->i_mmap interval tree.
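
The rule these comments encode could be restated as a hypothetical helper
(illustration only, nothing like this exists or is being proposed):

/*
 * What an "unstable" RCU reader (one that found the VMA under RCU but has
 * not locked it yet) may look at, per the comments above.
 */
static bool unstable_reader_may_use(struct vm_area_struct *vma,
                                    struct mm_struct *mm)
{
        /*
         * Allowed: ->detached, ->vm_mm, ->vm_lock, and ->vm_lock_seq
         * (the last one only via READ_ONCE(), for pessimistic bailout).
         */
        if (READ_ONCE(vma->detached) || vma->vm_mm != mm)
                return false;

        /*
         * Not allowed yet: ->vm_start/->vm_end (they alias ->vm_rcu),
         * ->vm_flags, ->vm_file, ...; wait until the VMA lock is held.
         */
        return true;
}
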
> diff --git a/mm/memory.c b/mm/memory.c
> index 34f8402d2046..3f4232b985a1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5996,23 +5996,29 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> if (!vma)
> goto inval;
>
> if (!vma_start_read(vma))
> goto inval;
>
> - /* Check since vm_start/vm_end might change before we lock the VMA */
> - if (unlikely(address < vma->vm_start || address >= vma->vm_end))
> - goto inval_end_read;
> -
> /* Check if the VMA got isolated after we found it */
> if (vma->detached) {
> vma_end_read(vma);
> count_vm_vma_lock_event(VMA_LOCK_MISS);
> /* The area was replaced with another one */
> goto retry;
> }
> + /*
> + * At this point, we have a stable reference to a VMA: The VMA is
> + * locked and we know it hasn't already been isolated.
> + * From here on, we can access the VMA without worrying about which
> + * fields are accessible for RCU readers.
> + */
> +
> + /* Check since vm_start/vm_end might change before we lock the VMA */
> + if (unlikely(address < vma->vm_start || address >= vma->vm_end))
> + goto inval_end_read;
>
> rcu_read_unlock();
> return vma;
>
> inval_end_read:
> vma_end_read(vma);
>
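
FWIW, the caller side then looks roughly like this (a heavily simplified
sketch of an arch fault handler's fast path, not copied from any particular
arch):

        vma = lock_vma_under_rcu(mm, address);
        if (!vma)
                goto lock_mmap;         /* fall back to the mmap_lock path */

        fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
        if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
                vma_end_read(vma);
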
> ---
> base-commit: de9c2c66ad8e787abec7c9d7eff4f8c3cdd28aed
> change-id: 20240805-fix-vma-lock-type-confusion-0a956d9d31ae
> --
> Jann Horn <jannh@...gle.com>
>