lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABb0KFFaYZG62TS+iM3Y92+hDyB35XR8dTX-5hDgWrXCcDQx7Q@mail.gmail.com>
Date:   Mon, 7 Nov 2022 13:26:59 +0100
From:   Michał Mirosław <emmir@...gle.com>
To:     Muhammad Usama Anjum <usama.anjum@...labora.com>
Cc:     Andrei Vagin <avagin@...il.com>,
        Danylo Mocherniuk <mdanylo@...gle.com>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Greg KH <gregkh@...uxfoundation.org>,
        Christian Brauner <brauner@...nel.org>,
        Peter Xu <peterx@...hat.com>, Yang Shi <shy828301@...il.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        "Zach O'Keefe" <zokeefe@...gle.com>,
        "Matthew Wilcox (Oracle)" <willy@...radead.org>,
        "Gustavo A. R. Silva" <gustavoars@...nel.org>,
        Dan Williams <dan.j.williams@...el.com>, kernel@...labora.com,
        Gabriel Krisman Bertazi <krisman@...labora.com>,
        David Hildenbrand <david@...hat.com>,
        Peter Enderborg <peter.enderborg@...y.com>,
        "open list : KERNEL SELFTEST FRAMEWORK" 
        <linux-kselftest@...r.kernel.org>, Shuah Khan <shuah@...nel.org>,
        open list <linux-kernel@...r.kernel.org>,
        "open list : PROC FILESYSTEM" <linux-fsdevel@...r.kernel.org>,
        "open list : MEMORY MANAGEMENT" <linux-mm@...ck.org>
Subject: Re: [PATCH v5 2/3] fs/proc/task_mmu: Implement IOCTL to get and/or
 the clear info about PTEs

On Thu, 3 Nov 2022 at 15:54, Muhammad Usama Anjum
<usama.anjum@...labora.com> wrote:
> This IOCTL, PAGEMAP_SCAN can be used to get and/or clear the info about
> page table entries. The following operations are supported in this ioctl:
> - Get the information if the pages are soft-dirty, file mapped, present
>   or swapped.
> - Clear the soft-dirty PTE bit of the pages.
> - Get and clear the soft-dirty PTE bit of the pages.
>
> Only the soft-dirty bit can be read and cleared atomically. struct
> pagemap_sd_args is used as the argument of the IOCTL. In this struct:
> - The range is specified through start and len.
> - The output buffer and size is specified as vec and vec_len.
> - The optional maximum requested pages are specified in the max_pages.
> - The flags can be specified in the flags field. The PAGEMAP_SD_CLEAR
>   and PAGEMAP_SD_NO_REUSED_REGIONS are supported.
> - The masks are specified in rmask, amask, emask and return_mask.
[...]
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -305,4 +305,57 @@ typedef int __bitwise __kernel_rwf_t;
>  #define RWF_SUPPORTED  (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
>                          RWF_APPEND)
>
> +/* PAGEMAP IOCTL */
> +#define PAGEMAP_SCAN   _IOWR('f', 16, struct pagemap_scan_arg)
> +
> +/* Bits are set in the bitmap of the page_region and masks in pagemap_sd_args */
> +#define PAGE_IS_SD     (1 << 0)

Can we name it PAGE_IS_SOFTDIRTY? "SD" can mean so many things.

> +#define PAGE_IS_FILE   (1 << 1)
> +#define PAGE_IS_PRESENT        (1 << 2)
> +#define PAGE_IS_SWAPED (1 << 3)

PAGE_IS_SWAPPED?

> +
> +/*
> + * struct page_region - Page region with bitmap flags
> + * @start:     Start of the region
> + * @len:       Length of the region
> + * bitmap:     Bits sets for the region
> + */
> +struct page_region {
> +       __u64 start;
> +       __u64 len;
> +       __u32 bitmap;
> +       __u32 __reserved;

"u64 flags"? If an extension is needed it would already require a new
ioctl or something in the `arg` struct.

> +
> +/*
> + * struct pagemap_scan_arg - Soft-dirty IOCTL argument

Since this is no longer a soft-dirty-specific call, it might be better
to describe it as "VM scan ioctl" or similar. BTW, the implementation
is currently guarded by CONFIG_MEM_SOFT_DIRTY, but CRIU doesn't need
that but needs the other bits handling.

> + * @start:             Starting address of the region
> + * @len:               Length of the region (All the pages in this length are included)
> + * @vec:               Address of page_region struct array for output
> + * @vec_len:           Length of the page_region struct array
> + * @max_pages:         Optional max return pages (It must be less than vec_len if specified)

I think we discussed that this is not counting the same things as
vec_len, so there should not be a reference between the two. The limit
is whatever fits under both conditions (IOW: n_vecs <= vec_len &&
(!max_pages || n_pages <= max_pages).

> + * @flags:             Special flags for the IOCTL

Just "Flags for the IOCTL".

> + * @rmask:             Required mask - All of these bits have to be set in the PTE
> + * @amask:             Any mask - Any of these bits are set in the PTE
> + * @emask:             Exclude mask - None of these bits are set in the PTE

It might be easier for developers if those were named e.g.
"required_mask", "anyof_mask", "excluded_mask".

> + * @return_mask:       Bits that have to be reported to the user in page_region

"Bits that are to be reported in page_region"?

> + */
> +struct pagemap_scan_arg {
> +       __u64 start;
> +       __u64 len;
> +       __u64 vec;
> +       __u64 vec_len;
> +       __u32 max_pages;
> +       __u32 flags;
> +       __u32 rmask;
> +       __u32 amask;
> +       __u32 emask;
> +       __u32 return_mask;
> +};
> +
> +/* Special flags */
> +#define PAGEMAP_SD_CLEAR               (1 << 0)

SD -> SOFTDIRTY

> +/* Check the individual pages if they are soft-dirty to find dirty pages faster. */
> +#define PAGEMAP_NO_REUSED_REGIONS      (1 << 1)

Please include the description from commitmsg of what this flag does
(i.e. how the behaviour differs because of the flag). I'd drop the
part about it being faster, as if so - why have the flag at all
instead of just always using the faster way?

(I only reviewed the API now. The implementation I think could be
simpler, but let's leave that to after the API is agreed on.)

Best Regards
Michał Mirosław

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ