lists.openwall.net - Open Source and information security mailing list archives
Message-ID: <CAEf4BzYuA3ZRCwPsAxhQZDOOpjSTrphKEsgPAqgRP8Ly7+fTWw@mail.gmail.com>
Date: Tue, 22 Apr 2025 15:48:52 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Suren Baghdasaryan <surenb@...gle.com>
Cc: akpm@...ux-foundation.org, Liam.Howlett@...cle.com, 
	lorenzo.stoakes@...cle.com, david@...hat.com, vbabka@...e.cz, 
	peterx@...hat.com, jannh@...gle.com, hannes@...xchg.org, mhocko@...nel.org, 
	paulmck@...nel.org, shuah@...nel.org, adobriyan@...il.com, brauner@...nel.org, 
	josef@...icpanda.com, yebin10@...wei.com, linux@...ssschuh.net, 
	willy@...radead.org, osalvador@...e.de, andrii@...nel.org, 
	ryan.roberts@....com, christophe.leroy@...roup.eu, 
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org, 
	linux-mm@...ck.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v3 7/8] mm/maps: read proc/pid/maps under RCU

On Fri, Apr 18, 2025 at 10:50 AM Suren Baghdasaryan <surenb@...gle.com> wrote:
>
> With maple_tree supporting vma tree traversal under RCU and vma and
> its important members being RCU-safe, /proc/pid/maps can be read under
> RCU and without the need to read-lock mmap_lock. However vma content
> can change from under us, therefore we make a copy of the vma and we
> pin pointer fields used when generating the output (currently only
> vm_file and anon_name). Afterwards we check for concurrent address
> space modifications, wait for them to end and retry. While we take
> the mmap_lock for reading during such contention, we do so only
> momentarily, to record the new mm_wr_seq counter. This change is designed to reduce

This is probably a stupid question, but why do we need to take a lock
just to record this counter? uprobes get away without taking mmap_lock
even for reads, and still record this seq counter. And then detect
whether there were any modifications in between. Why does this change
need more heavy-weight mmap_read_lock to do speculative reads?

> mmap_lock contention and prevent a process reading /proc/pid/maps files
> (often a low priority task, such as monitoring/data collection services)
> from blocking address space updates.
> Note that this change has a userspace visible disadvantage: it allows
> for sub-page data tearing as opposed to the previous mechanism where
> data tearing could happen only between pages of generated output data.
> Since current userspace considers data tearing between pages to be
> acceptable, we assume it will be able to handle sub-page data tearing
> as well.
>
> Signed-off-by: Suren Baghdasaryan <surenb@...gle.com>
> ---
>  fs/proc/internal.h        |   6 ++
>  fs/proc/task_mmu.c        | 170 ++++++++++++++++++++++++++++++++++----
>  include/linux/mm_inline.h |  18 ++++
>  3 files changed, 177 insertions(+), 17 deletions(-)
>
> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> index 96122e91c645..6e1169c1f4df 100644
> --- a/fs/proc/internal.h
> +++ b/fs/proc/internal.h
> @@ -379,6 +379,12 @@ struct proc_maps_private {
>         struct task_struct *task;
>         struct mm_struct *mm;
>         struct vma_iterator iter;
> +       bool mmap_locked;
> +       loff_t last_pos;
> +#ifdef CONFIG_PER_VMA_LOCK
> +       unsigned int mm_wr_seq;
> +       struct vm_area_struct vma_copy;
> +#endif
>  #ifdef CONFIG_NUMA
>         struct mempolicy *task_mempolicy;
>  #endif
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index b9e4fbbdf6e6..f9d50a61167c 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -127,13 +127,130 @@ static void release_task_mempolicy(struct proc_maps_private *priv)
>  }
>  #endif
>
> -static struct vm_area_struct *proc_get_vma(struct proc_maps_private *priv,
> -                                               loff_t *ppos)
> +#ifdef CONFIG_PER_VMA_LOCK
> +
> +static const struct seq_operations proc_pid_maps_op;
> +
> +/*
> + * Take VMA snapshot and pin vm_file and anon_name as they are used by
> + * show_map_vma.
> + */
> +static int get_vma_snapshot(struct proc_maps_private *priv, struct vm_area_struct *vma)
> +{
> +       struct vm_area_struct *copy = &priv->vma_copy;
> +       int ret = -EAGAIN;
> +
> +       memcpy(copy, vma, sizeof(*vma));
> +       if (copy->vm_file && !get_file_rcu(&copy->vm_file))
> +               goto out;
> +
> +       if (!anon_vma_name_get_if_valid(copy))
> +               goto put_file;

Given vm_file and anon_vma_name are both RCU-protected, if we take
rcu_read_lock() at m_start/m_next before fetching VMA, we shouldn't
even need getting/putting them, no?

I feel like I'm missing some important limitations, but I don't think
they are spelled out explicitly...

> +
> +       if (!mmap_lock_speculate_retry(priv->mm, priv->mm_wr_seq))
> +               return 0;
> +
> +       /* Address space got modified, vma might be stale. Re-lock and retry. */
> +       rcu_read_unlock();

Another question I have is whether we really need to copy vma into
priv->vma_copy to have a stable snapshot? Can't we just speculate like
with uprobes under assumption that data doesn't change. And once we
are done producing text output, confirm that speculation was
successful, and if not - retry?

We'd need an API for seq_file to rollback to previous good known
location for that, but that seems straightforward enough to do, no?
Just remember buffer position before speculation, write data, check
for no mm modifications, and if something changed, rollback seq file
to last position.
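The rollback idea maps naturally onto how seq_file buffers output: the write position is just an offset into a buffer, so restoring a saved offset discards everything emitted since the checkpoint. A hypothetical sketch against a simplified buffer (none of these helper names exist in the kernel; this only illustrates the checkpoint/emit/validate/rollback flow being proposed):

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct seq_file: a buffer plus write offset. */
struct seq_buf {
	char buf[256];
	size_t count;
};

static void seq_puts_buf(struct seq_buf *m, const char *s)
{
	size_t len = strlen(s);

	memcpy(m->buf + m->count, s, len);
	m->count += len;
}

/*
 * Speculatively emit one VMA's line; roll back if the address space
 * changed underneath us (modeled here by a caller-supplied flag in
 * place of mmap_lock_speculate_retry()).
 */
static int emit_speculative(struct seq_buf *m, const char *line, int modified)
{
	size_t checkpoint = m->count;	/* last known-good position */

	seq_puts_buf(m, line);
	if (modified) {
		m->count = checkpoint;	/* discard possibly torn output */
		return -1;		/* caller retries this VMA */
	}
	return 0;
}
```

On a clean pass the output stays; on a detected modification the offset snaps back and the caller re-walks from the last completed entry, which avoids ever copying the VMA at the cost of occasionally regenerating a line of text.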


> +       ret = mmap_read_lock_killable(priv->mm);
> +       if (!ret) {
> +               /* mmap_lock_speculate_try_begin() succeeds when holding mmap_read_lock */
> +               mmap_lock_speculate_try_begin(priv->mm, &priv->mm_wr_seq);
> +               mmap_read_unlock(priv->mm);
> +               ret = -EAGAIN;
> +       }
> +
> +       rcu_read_lock();
> +
> +       anon_vma_name_put_if_valid(copy);
> +put_file:
> +       if (copy->vm_file)
> +               fput(copy->vm_file);
> +out:
> +       return ret;
> +}
> +
> +static void put_vma_snapshot(struct proc_maps_private *priv)
> +{
> +       struct vm_area_struct *vma = &priv->vma_copy;
> +
> +       anon_vma_name_put_if_valid(vma);
> +       if (vma->vm_file)
> +               fput(vma->vm_file);
> +}
> +

[...]
