Message-ID: <CAJuCfpHpujSbPcR2_jNTBu6+DTXvLBUoi2PjkYNJyTp62xaP9w@mail.gmail.com>
Date: Mon, 22 Jan 2024 22:07:01 -0800
From: Suren Baghdasaryan <surenb@...gle.com>
To: SeongJae Park <sj@...nel.org>
Cc: akpm@...ux-foundation.org, viro@...iv.linux.org.uk, brauner@...nel.org,
jack@...e.cz, dchinner@...hat.com, casey@...aufler-ca.com,
ben.wolsieffer@...ring.com, paulmck@...nel.org, david@...hat.com,
avagin@...gle.com, usama.anjum@...labora.com, peterx@...hat.com,
hughd@...gle.com, ryan.roberts@....com, wangkefeng.wang@...wei.com,
Liam.Howlett@...cle.com, yuzhao@...gle.com, axelrasmussen@...gle.com,
lstoakes@...il.com, talumbau@...gle.com, willy@...radead.org, vbabka@...e.cz,
mgorman@...hsingularity.net, jhubbard@...dia.com, vishal.moola@...il.com,
mathieu.desnoyers@...icios.com, dhowells@...hat.com, jgg@...pe.ca,
sidhartha.kumar@...cle.com, andriy.shevchenko@...ux.intel.com,
yangxingui@...wei.com, keescook@...omium.org, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-mm@...ck.org, kernel-team@...roid.com
Subject: Re: [PATCH 3/3] mm/maps: read proc/pid/maps under RCU
On Mon, Jan 22, 2024 at 9:36 PM SeongJae Park <sj@...nel.org> wrote:
>
> Hi Suren,
>
> On Sun, 21 Jan 2024 23:13:24 -0800 Suren Baghdasaryan <surenb@...gle.com> wrote:
>
> > With maple_tree supporting vma tree traversal under RCU and per-vma locks
> > making vma access RCU-safe, /proc/pid/maps can be read under RCU and
> > without the need to read-lock mmap_lock. However vma content can change
> > from under us, therefore we make a copy of the vma and we pin pointer
> > fields used when generating the output (currently only vm_file and
> > anon_name). Afterwards we check for concurrent address space
> > modifications, wait for them to end and retry. That last check is needed
> > to avoid possibility of missing a vma during concurrent maple_tree
> > node replacement, which might report a NULL when a vma is replaced
> > with another one. While we take the mmap_lock for reading during such
> > contention, we hold it only momentarily, to record the new mm_wr_seq value.
> > This change is designed to reduce mmap_lock contention and prevent a
> > process reading /proc/pid/maps files (often a low priority task, such as
> > monitoring/data collection services) from blocking address space updates.
> >
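The optimistic read-then-validate scheme described above resembles a sequence-counter (seqcount) read. Below is a minimal userspace C11 sketch, assuming a counter that is odd while a writer is active. fake_mm, fake_vma, fake_mm_update(), simulate_writer() and read_vma_snapshot() are hypothetical names, and this is not the patch's code. It only illustrates how a reader can tell that its copy may be inconsistent and then retry.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; not the kernel's vm_area_struct or mm_struct. */
struct fake_vma {
	unsigned long start;
	unsigned long end;
};

struct fake_mm {
	atomic_ulong wr_seq;	/* even: stable, odd: writer active (assumed convention) */
	struct fake_vma vma;
};

/* Writer: make the counter odd, modify the vma, make it even again. */
static void fake_mm_update(struct fake_mm *mm, unsigned long start,
			   unsigned long end)
{
	atomic_fetch_add_explicit(&mm->wr_seq, 1, memory_order_acq_rel);
	mm->vma.start = start;
	mm->vma.end = end;
	atomic_fetch_add_explicit(&mm->wr_seq, 1, memory_order_release);
}

/* Demo hook that plays the role of a writer racing with the reader. */
static void simulate_writer(struct fake_mm *mm)
{
	fake_mm_update(mm, 0x2000, 0x3000);
}

/*
 * Reader: record the counter, copy the vma, then re-check the counter.
 * A changed counter means the copy may be torn, so retry. The quoted patch
 * waits for the writer by briefly taking mmap_lock; this sketch just loops.
 * 'between' runs after the copy (single-threaded demo only).
 */
static struct fake_vma read_vma_snapshot(struct fake_mm *mm,
					 void (*between)(struct fake_mm *),
					 int *retries)
{
	struct fake_vma copy;
	unsigned long seq;

	*retries = 0;
	for (;;) {
		seq = atomic_load_explicit(&mm->wr_seq, memory_order_acquire);
		if (seq & 1) {		/* writer in progress */
			(*retries)++;
			continue;
		}
		memcpy(&copy, &mm->vma, sizeof(copy));
		if (between) {
			between(mm);
			between = NULL;	/* interfere only once */
		}
		atomic_thread_fence(memory_order_acquire);
		if (atomic_load_explicit(&mm->wr_seq, memory_order_relaxed) == seq)
			return copy;	/* no writer ran: snapshot is consistent */
		(*retries)++;
	}
}
```

Called with simulate_writer as the hook, the first copy is discarded because the counter moved from 0 to 2, and the second attempt returns the updated 0x2000-0x3000 range with one retry recorded. A genuinely concurrent version would also have to copy the fields without C11 data races (for example with atomic loads), which this single-threaded demo sidesteps.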
> > Note that this change has a userspace visible disadvantage: it allows for
> > sub-page data tearing as opposed to the previous mechanism where data
> > tearing could happen only between pages of generated output data.
> > Since current userspace considers data tearing between pages to be
> > acceptable, we assume it will be able to handle sub-page data tearing
> > as well.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@...gle.com>
> > ---
> > fs/proc/internal.h | 2 +
> > fs/proc/task_mmu.c | 114 ++++++++++++++++++++++++++++++++++++++++++---
> > 2 files changed, 109 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> > index a71ac5379584..e0247225bb68 100644
> > --- a/fs/proc/internal.h
> > +++ b/fs/proc/internal.h
> > @@ -290,6 +290,8 @@ struct proc_maps_private {
> > struct task_struct *task;
> > struct mm_struct *mm;
> > struct vma_iterator iter;
> > + unsigned long mm_wr_seq;
> > + struct vm_area_struct vma_copy;
> > #ifdef CONFIG_NUMA
> > struct mempolicy *task_mempolicy;
> > #endif
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index 3f78ebbb795f..3886d04afc01 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -126,11 +126,96 @@ static void release_task_mempolicy(struct proc_maps_private *priv)
> > }
> > #endif
> >
> > -static struct vm_area_struct *proc_get_vma(struct proc_maps_private *priv,
> > - loff_t *ppos)
> > +#ifdef CONFIG_PER_VMA_LOCK
> > +
> > +static const struct seq_operations proc_pid_maps_op;
> > +/*
> > + * Take VMA snapshot and pin vm_file and anon_name as they are used by
> > + * show_map_vma.
> > + */
> > +static int get_vma_snapshow(struct proc_maps_private *priv, struct vm_area_struct *vma)
> > {
> > + struct vm_area_struct *copy = &priv->vma_copy;
> > + int ret = -EAGAIN;
> > +
> > + memcpy(copy, vma, sizeof(*vma));
> > + if (copy->vm_file && !get_file_rcu(&copy->vm_file))
> > + goto out;
> > +
> > + if (copy->anon_name && !anon_vma_name_get_rcu(copy))
> > + goto put_file;
>
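The snapshot-then-pin step in get_vma_snapshow() depends on taking a reference only if the object has not already dropped to zero references. Otherwise the copy is abandoned with -EAGAIN. Below is a minimal C11 sketch of that pattern, assuming a plain atomic reference count. struct obj, obj_get_not_zero(), take_snapshot() and the other names are hypothetical stand-ins for get_file_rcu() and anon_vma_name_get_rcu(), whose exact kernel semantics are not reproduced here.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical refcounted object (stand-in for a struct file or anon name). */
struct obj {
	atomic_int refs;
};

/* Take a reference only while the object is still live (refs > 0). */
static bool obj_get_not_zero(struct obj *o)
{
	int old = atomic_load_explicit(&o->refs, memory_order_relaxed);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current count. */
		if (atomic_compare_exchange_weak_explicit(&o->refs, &old, old + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return true;
	}
	return false;	/* object is being torn down */
}

/* Drop a reference (a real implementation would free the object at zero). */
static void obj_put(struct obj *o)
{
	atomic_fetch_sub_explicit(&o->refs, 1, memory_order_release);
}

struct snap_vma {
	unsigned long start, end;
	struct obj *file;	/* optional, may be NULL */
	struct obj *name;	/* optional, may be NULL */
};

/* Copy the vma, then pin each pointer field, unwinding on failure. */
static int take_snapshot(struct snap_vma *copy, const struct snap_vma *vma)
{
	memcpy(copy, vma, sizeof(*copy));
	if (copy->file && !obj_get_not_zero(copy->file))
		return -EAGAIN;
	if (copy->name && !obj_get_not_zero(copy->name)) {
		if (copy->file)
			obj_put(copy->file);	/* undo the first pin */
		return -EAGAIN;
	}
	return 0;
}

/* Release the pins taken by take_snapshot(). */
static void drop_snapshot(struct snap_vma *copy)
{
	if (copy->file)
		obj_put(copy->file);
	if (copy->name)
		obj_put(copy->name);
}
```

The failure path mirrors the `goto put_file` unwinding in the quoted hunk: a reference already taken on the first field is released before returning -EAGAIN, so the caller can retry from a fresh copy.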
> With today's updated mm-unstable, which contains this patch, I'm getting the
> build error below when CONFIG_ANON_VMA_NAME is not set. It seems this patch
> needs to handle that case?
Hi SeongJae,
Thanks for reporting! I'll post an updated version fixing this config.
Suren.
>
> .../linux/fs/proc/task_mmu.c: In function ‘get_vma_snapshow’:
> .../linux/fs/proc/task_mmu.c:145:19: error: ‘struct vm_area_struct’ has no member named ‘anon_name’; did you mean ‘anon_vma’?
> 145 | if (copy->anon_name && !anon_vma_name_get_rcu(copy))
> | ^~~~~~~~~
> | anon_vma
> .../linux/fs/proc/task_mmu.c:161:19: error: ‘struct vm_area_struct’ has no member named ‘anon_name’; did you mean ‘anon_vma’?
> 161 | if (copy->anon_name)
> | ^~~~~~~~~
> | anon_vma
> .../linux/fs/proc/task_mmu.c:162:41: error: ‘struct vm_area_struct’ has no member named ‘anon_name’; did you mean ‘anon_vma’?
> 162 | anon_vma_name_put(copy->anon_name);
> | ^~~~~~~~~
> | anon_vma
> .../linux/fs/proc/task_mmu.c: In function ‘put_vma_snapshot’:
> .../linux/fs/proc/task_mmu.c:174:18: error: ‘struct vm_area_struct’ has no member named ‘anon_name’; did you mean ‘anon_vma’?
> 174 | if (vma->anon_name)
> | ^~~~~~~~~
> | anon_vma
> .../linux/fs/proc/task_mmu.c:175:40: error: ‘struct vm_area_struct’ has no member named ‘anon_name’; did you mean ‘anon_vma’?
> 175 | anon_vma_name_put(vma->anon_name);
> | ^~~~~~~~~
> | anon_vma
>
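The build errors above arise because the anon_name member of struct vm_area_struct exists only when CONFIG_ANON_VMA_NAME is enabled, while the new code dereferences it unconditionally. A common way to keep such code building in both configurations is to route every access through an accessor that has a stub for the disabled case. This is a generic userspace illustration, not the fix that was later posted. CONFIG_DEMO_ANON_NAME, struct demo_vma, demo_vma_anon_name() and demo_needs_name_pin() are hypothetical.

```c
#include <stddef.h>

/*
 * CONFIG_DEMO_ANON_NAME is a hypothetical stand-in for CONFIG_ANON_VMA_NAME;
 * compile with -DCONFIG_DEMO_ANON_NAME to enable the optional member.
 */
struct demo_vma {
	unsigned long start, end;
#ifdef CONFIG_DEMO_ANON_NAME
	const char *anon_name;	/* only present when the option is enabled */
#endif
};

/* Compiles in both configurations; callers never touch the member directly. */
static inline const char *demo_vma_anon_name(const struct demo_vma *vma)
{
#ifdef CONFIG_DEMO_ANON_NAME
	return vma->anon_name;
#else
	(void)vma;
	return NULL;	/* option disabled: there is never a name to pin */
#endif
}

/* Snapshot-style caller written against the accessor only. */
static int demo_needs_name_pin(const struct demo_vma *copy)
{
	return demo_vma_anon_name(copy) != NULL;
}
```

With the option disabled the accessor always returns NULL, so the pin branch is dead code and the call site needs no #ifdef of its own.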
> [...]
>
>
> Thanks,
> SJ