lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABb0KFHWnbrf2ythvO0OKsd1ZS9b4D9BNzwBCbn6g9OX4n6ZOg@mail.gmail.com>
Date:   Wed, 14 Jun 2023 00:36:24 +0200
From:   Michał Mirosław <emmir@...gle.com>
To:     Muhammad Usama Anjum <usama.anjum@...labora.com>
Cc:     Peter Xu <peterx@...hat.com>, David Hildenbrand <david@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Andrei Vagin <avagin@...il.com>,
        Danylo Mocherniuk <mdanylo@...gle.com>,
        Paul Gofman <pgofman@...eweavers.com>,
        Cyrill Gorcunov <gorcunov@...il.com>,
        Mike Rapoport <rppt@...nel.org>, Nadav Amit <namit@...are.com>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Shuah Khan <shuah@...nel.org>,
        Christian Brauner <brauner@...nel.org>,
        Yang Shi <shy828301@...il.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Yun Zhou <yun.zhou@...driver.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Alex Sierra <alex.sierra@....com>,
        Matthew Wilcox <willy@...radead.org>,
        Pasha Tatashin <pasha.tatashin@...een.com>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        "Gustavo A . R . Silva" <gustavoars@...nel.org>,
        Dan Williams <dan.j.williams@...el.com>,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-mm@...ck.org, linux-kselftest@...r.kernel.org,
        Greg KH <gregkh@...uxfoundation.org>, kernel@...labora.com
Subject: Re: [PATCH v18 2/5] fs/proc/task_mmu: Implement IOCTL to get and
 optionally clear info about PTEs

On Tue, 13 Jun 2023 at 12:29, Muhammad Usama Anjum
<usama.anjum@...labora.com> wrote:
>
> This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear
> the info about page table entries. The following operations are supported
> in this ioctl:
> - Get the information if the pages have been written-to (PAGE_IS_WRITTEN),
>   file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
>   (PAGE_IS_SWAPPED).
> - Find pages which have been written-to and/or write protect the pages
>   (atomic PM_SCAN_OP_GET + PM_SCAN_OP_WP)
>
> This IOCTL can be extended to get information about more PTE bits.
[...]
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
[...]
> +static int pagemap_scan_output(bool wt, bool file, bool pres, bool swap,
> +                              struct pagemap_scan_private *p,
> +                              unsigned long addr, unsigned int n_pages)
> +{
> +       unsigned long bitmap = PM_SCAN_BITMAP(wt, file, pres, swap);
> +       struct page_region *cur_buf = &p->cur_buf;

Maybe we can go a step further and say here `cur_buf =
&p->vec_buf[p->vec_buf_index];` and remove `p->cur_buf` entirely?

> +
> +       if (!n_pages)
> +               return -EINVAL;
> +
> +       if ((p->required_mask & bitmap) != p->required_mask)
> +               return 0;
> +       if (p->anyof_mask && !(p->anyof_mask & bitmap))
> +               return 0;
> +       if (p->excluded_mask & bitmap)
> +               return 0;
> +
> +       bitmap &= p->return_mask;
> +       if (!bitmap)
> +               return 0;
> +
> +       if (cur_buf->bitmap == bitmap &&
> +           cur_buf->start + cur_buf->len * PAGE_SIZE == addr) {
> +               cur_buf->len += n_pages;
> +               p->found_pages += n_pages;
> +       } else {
> +               if (cur_buf->len && p->vec_buf_index >= p->vec_buf_len)
> +                       return -ENOMEM;

Shouldn't this be -ENOSPC? -ENOMEM usually signifies that the kernel
ran out of memory when allocating, not that there is no space in a
user-provided buffer.

BTW, the check could be inside the if() below for easier reading and
less redundancy.

> +               if (cur_buf->len) {
> +                       memcpy(&p->vec_buf[p->vec_buf_index], cur_buf,
> +                              sizeof(*p->vec_buf));
> +                       p->vec_buf_index++;
> +               }
> +
> +               cur_buf->start = addr;
> +               cur_buf->len = n_pages;
> +               cur_buf->bitmap = bitmap;
> +               p->found_pages += n_pages;
> +       }
> +
> +       if (p->max_pages && (p->found_pages == p->max_pages))

Since `p->found_pages > 0` holds here, the first check is redundant.
Nit: the parentheses around == are not needed.

> +               return PM_SCAN_FOUND_MAX_PAGES;
> +
> +       return 0;
> +}
[...]
> +static inline unsigned long pagemap_scan_check_page_written(struct pagemap_scan_private *p)
> +{
> +       return ((p->required_mask | p->anyof_mask | p->excluded_mask) &
> +               PAGE_IS_WRITTEN) ? PM_SCAN_OP_WRITE : 0;
> +}

Please inline at the single callsite.

For flags name: PM_REQUIRE_WRITE_ACCESS?
Or Is it intended to be checked only if doing WP (as the current name
suggests) and so it would be redundant as WP currently requires
`p->required_mask = PAGE_IS_WRITTEN`?

> +static int pagemap_scan_args_valid(struct pm_scan_arg *arg, unsigned long start,
> +                                  struct page_region __user *vec)
> +{
> +       /* Detect illegal size, flags, len and masks */
> +       if (arg->size != sizeof(struct pm_scan_arg))
> +               return -EINVAL;
> +       if (arg->flags & ~PM_SCAN_OPS)
> +               return -EINVAL;
> +       if (!arg->len)
> +               return -EINVAL;
> +       if ((arg->required_mask | arg->anyof_mask | arg->excluded_mask |
> +            arg->return_mask) & ~PM_SCAN_BITS_ALL)
> +               return -EINVAL;
> +       if (!arg->required_mask && !arg->anyof_mask &&
> +           !arg->excluded_mask)
> +               return -EINVAL;
> +       if (!arg->return_mask)
> +               return -EINVAL;

I just noticed that `!arg->return_mask == !IS_PM_SCAN_GET()`. So the
OP_GET is redundant and can be removed from the `flags` if checking
`return_mask` instead. That way there won't be a "flags-encoded ops"
thing as it would be a 'scan' with optional 'write-protect'.

> +
> +       /* Validate memory range */
> +       if (!IS_ALIGNED(start, PAGE_SIZE))
> +               return -EINVAL;
> +       if (!access_ok((void __user *)start, arg->len))
> +               return -EFAULT;
> +
> +       if (IS_PM_SCAN_GET(arg->flags)) {
> +               if (!arg->vec)
> +                       return -EINVAL;

access_ok() -> -EFAULT (an early fail-check) (the vec_len should be
checked first then, failing with -EINVAL if 0)

> +               if (arg->vec_len == 0)
> +                       return -EINVAL;
> +       }
> +
> +       if (IS_PM_SCAN_WP(arg->flags)) {
> +               if (!IS_PM_SCAN_GET(arg->flags) && arg->max_pages)
> +                       return -EINVAL;
> +
> +               if ((arg->required_mask | arg->anyof_mask | arg->excluded_mask) &
> +                   ~PAGE_IS_WRITTEN)

Is `excluded_mask = PAGE_IS_WRITTEN` intended to be allowed? It would
make WP do nothing, unless the required/anyof/excluded masks are not
supposed to limit WP?


> +                       return -EINVAL;
> +       }

If `flags == 0` (and `return_mask == 0` in case my earlier comment is
applied) it should fail with -EINVAL.

[...]
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> +/*
> + * struct page_region - Page region with bitmap flags
> + * @start:     Start of the region
> + * @len:       Length of the region in pages
> + * bitmap:     Bits sets for the region

'@' is missing for the third field. BTW, maybe we can call it
something like `flags` or `category` (something that hints at the
meaning of the value instead of its data representation).

> +/*
> + * struct pm_scan_arg - Pagemap ioctl argument
> + * @size:              Size of the structure
> + * @flags:             Flags for the IOCTL
> + * @start:             Starting address of the region
> + * @len:               Length of the region (All the pages in this length are included)

Maybe `scan_start`, `scan_len` - so that there is a better distinction
from the structure's `size` field?

> + * @vec:               Address of page_region struct array for output
> + * @vec_len:           Length of the page_region struct array
> + * @max_pages:         Optional max return pages
> + * @required_mask:     Required mask - All of these bits have to be set in the PTE
> + * @anyof_mask:                Any mask - Any of these bits are set in the PTE
> + * @excluded_mask:     Exclude mask - None of these bits are set in the PTE
> + * @return_mask:       Bits that are to be reported in page_region
> + */

I skipped most of the page walk implementation as maybe the comments
above could make it simpler. Reading this patch and the documentation
I still feel confused about how the filtering/limiting parameters
should affect GET, WP and WP+GET. Should they limit the pages walked
(and WP-ed)? Or only the GET's output? How about GET+WP case?

Best Regards

Michał Mirosław

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ