[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAEf4BzZ9G9aek7u7Tr=307h8eK4v98LGZVsm5Eqz_4jpsEaiag@mail.gmail.com>
Date: Wed, 26 Jun 2024 09:37:52 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>, Andrii Nakryiko <andrii@...nel.org>,
linux-fsdevel@...r.kernel.org, brauner@...nel.org, viro@...iv.linux.org.uk,
akpm@...ux-foundation.org, linux-kernel@...r.kernel.org, bpf@...r.kernel.org,
gregkh@...uxfoundation.org, linux-mm@...ck.org, surenb@...gle.com,
rppt@...nel.org, adobriyan@...il.com
Subject: Re: [PATCH v5 2/6] fs/procfs: implement efficient VMA querying API
for /proc/<pid>/maps
On Tue, Jun 25, 2024 at 7:42 PM Liam R. Howlett <Liam.Howlett@...cle.com> wrote:
>
> * Andrii Nakryiko <andrii@...nel.org> [240618 18:45]:
> ...
>
> > +
> > +static int do_procmap_query(struct proc_maps_private *priv, void __user *uarg)
> > +{
> > + struct procmap_query karg;
> > + struct vm_area_struct *vma;
> > + struct mm_struct *mm;
> > + const char *name = NULL;
> > + char *name_buf = NULL;
> > + __u64 usize;
> > + int err;
> > +
> > + if (copy_from_user(&usize, (void __user *)uarg, sizeof(usize)))
> > + return -EFAULT;
> > + /* argument struct can never be that large, reject abuse */
> > + if (usize > PAGE_SIZE)
> > + return -E2BIG;
> > + /* argument struct should have at least query_flags and query_addr fields */
> > + if (usize < offsetofend(struct procmap_query, query_addr))
> > + return -EINVAL;
> > + err = copy_struct_from_user(&karg, sizeof(karg), uarg, usize);
> > + if (err)
> > + return err;
> > +
> > + /* reject unknown flags */
> > + if (karg.query_flags & ~PROCMAP_QUERY_VALID_FLAGS_MASK)
> > + return -EINVAL;
> > + /* either both buffer address and size are set, or both should be zero */
> > + if (!!karg.vma_name_size != !!karg.vma_name_addr)
> > + return -EINVAL;
> > +
> > + mm = priv->mm;
> > + if (!mm || !mmget_not_zero(mm))
> > + return -ESRCH;
> > +
> > + err = query_vma_setup(mm);
> > + if (err) {
> > + mmput(mm);
> > + return err;
> > + }
> > +
> > + vma = query_matching_vma(mm, karg.query_addr, karg.query_flags);
> > + if (IS_ERR(vma)) {
> > + err = PTR_ERR(vma);
> > + vma = NULL;
> > + goto out;
> > + }
> > +
> > + karg.vma_start = vma->vm_start;
> > + karg.vma_end = vma->vm_end;
> > +
> > + karg.vma_flags = 0;
> > + if (vma->vm_flags & VM_READ)
> > + karg.vma_flags |= PROCMAP_QUERY_VMA_READABLE;
> > + if (vma->vm_flags & VM_WRITE)
> > + karg.vma_flags |= PROCMAP_QUERY_VMA_WRITABLE;
> > + if (vma->vm_flags & VM_EXEC)
> > + karg.vma_flags |= PROCMAP_QUERY_VMA_EXECUTABLE;
> > + if (vma->vm_flags & VM_MAYSHARE)
> > + karg.vma_flags |= PROCMAP_QUERY_VMA_SHARED;
> > +
> > + karg.vma_page_size = vma_kernel_pagesize(vma);
> > +
> ...
>
> > +/*
> > + * Input/output argument structured passed into ioctl() call. It can be used
> > + * to query a set of VMAs (Virtual Memory Areas) of a process.
> > + *
> > + * Each field can be one of three kinds, marked in a short comment to the
> > + * right of the field:
> > + * - "in", input argument, user has to provide this value, kernel doesn't modify it;
> > + * - "out", output argument, kernel sets this field with VMA data;
> > + * - "in/out", input and output argument; user provides initial value (used
> > + * to specify maximum allowable buffer size), and kernel sets it to actual
> > + * amount of data written (or zero, if there is no data).
> > + *
> > + * If matching VMA is found (according to criterias specified by
> > + * query_addr/query_flags, all the out fields are filled out, and ioctl()
> > + * returns 0. If there is no matching VMA, -ENOENT will be returned.
> > + * In case of any other error, negative error code other than -ENOENT is
> > + * returned.
> > + *
> > + * Most of the data is similar to the one returned as text in /proc/<pid>/maps
> > + * file, but procmap_query provides more querying flexibility. There are no
> > + * consistency guarantees between subsequent ioctl() calls, but data returned
> > + * for matched VMA is self-consistent.
> > + */
> > +struct procmap_query {
> > + /* Query struct size, for backwards/forward compatibility */
> > + __u64 size;
> > + /*
> > + * Query flags, a combination of enum procmap_query_flags values.
> > + * Defines query filtering and behavior, see enum procmap_query_flags.
> > + *
> > + * Input argument, provided by user. Kernel doesn't modify it.
> > + */
> > + __u64 query_flags; /* in */
> > + /*
> > + * Query address. By default, VMA that covers this address will
> > + * be looked up. PROCMAP_QUERY_* flags above modify this default
> > + * behavior further.
> > + *
> > + * Input argument, provided by user. Kernel doesn't modify it.
> > + */
> > + __u64 query_addr; /* in */
> > + /* VMA starting (inclusive) and ending (exclusive) address, if VMA is found. */
> > + __u64 vma_start; /* out */
> > + __u64 vma_end; /* out */
> > + /* VMA permissions flags. A combination of PROCMAP_QUERY_VMA_* flags. */
> > + __u64 vma_flags; /* out */
> > + /* VMA backing page size granularity. */
> > + __u32 vma_page_size; /* out */
>
> The vma_kernel_pagesize() returns an unsigned long. We could
> potentially be truncating the returned value (although probably not
> today?). This is from the vm_operations_struct pagesize, which also
> returns an unsigned long. Could we switch this to __u64?
>
Yep, fair point. It seemed excessive to use u64, but I guess nothing
prevents us (in the future) from having humongous pages with >4GB
sizes. I'll update to __u64.
I'll wait for a few more hours just in case I get some other feedback,
and will rebase and post a new revision later today, thanks.
> ...
>
> Thanks,
> Liam
Powered by blists - more mailing lists