[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f0d193c2-9b23-6f3a-9208-2a615febe0c2@redhat.com>
Date: Thu, 21 Jul 2022 09:28:47 +0200
From: David Hildenbrand <david@...hat.com>
To: Peter Xu <peterx@...hat.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Cc: Andrea Arcangeli <aarcange@...hat.com>,
Nadav Amit <nadav.amit@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH v2] mm/mprotect: Fix soft-dirty check in
can_change_pte_writable()
On 21.07.22 00:03, Peter Xu wrote:
> The check wanted to make sure when soft-dirty tracking is enabled we won't
> grant write bit by accident, as a page fault is needed for dirty tracking.
> The intention is correct but we didn't check it right because VM_SOFTDIRTY
> set actually means soft-dirty tracking disabled. Fix it.
Thanks for digging into this and writing the reproducer. The softdirty
logic was rather confusing for me.
>
> It wasn't a bug for a long time because we used to only optimize the write
> bit settings in change_pte_range() for page caches, and since we've got a
> higher level check in vma_wants_writenotify(), we will never set the bit
> MM_CP_TRY_CHANGE_WRITABLE for soft-dirty enabled page caches, hence even if
> we checked with the wrong value of VM_SOFTDIRTY in change_pte_range() it'll
> just be an no-op. Functionally it was still correct, even if cpu cycles
> wasted.
I don't quite follow that explanation and most probably I am missing
something.
Modifying your test to map page from a file MAP_SHARED gives me under
5.18.11-100.fc35.x86_64:
53,54d52
< FILE *file = fopen("tmpfile", "w+");
< int file_fd;
56d53
< assert(file);
59,61d55
<
< file_fd = fileno(file);
< ftruncate(file_fd, psize);
63c57
< MAP_SHARED, file_fd, 0);
---
> MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
t480s: ~ $ sudo ./tmp
ERROR: Wrote page again, soft-dirty=0 (expect: 1
IMHO, while the check in vma_wants_writenotify() is correct and makes
sure that the pages are kept R/O by the generic machinery. We get
vma_wants_writenotify(), so we activate MM_CP_TRY_CHANGE_WRITABLE. The
wrong logic in can_change_pte_writable(), however, maps the page
writable again without caring about softdirty.
At least that would be my explanation for the failure. But maybe I
messes up something else :)
>
> However after the recent work of anonymous page optimization on exclusive
> pages we'll start to make it wrong because anonymous page does not require
> the check in vma_wants_writenotify() hence it'll suffer from the wrong
> check here in can_change_pte_writable().
>
> We can easily verify this with any exclusive anonymous page, like program
> below:
>
> =======8<======
> #include <stdio.h>
> #include <unistd.h>
> #include <stdlib.h>
> #include <assert.h>
> #include <inttypes.h>
> #include <stdint.h>
> #include <sys/types.h>
> #include <sys/mman.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <stdbool.h>
>
> #define BIT_ULL(nr) (1ULL << (nr))
> #define PM_SOFT_DIRTY BIT_ULL(55)
>
> unsigned int psize;
> char *page;
>
> uint64_t pagemap_read_vaddr(int fd, void *vaddr)
> {
> uint64_t value;
> int ret;
>
> ret = pread(fd, &value, sizeof(uint64_t),
> ((uint64_t)vaddr >> 12) * sizeof(uint64_t));
> assert(ret == sizeof(uint64_t));
>
> return value;
> }
>
> void clear_refs_write(void)
> {
> int fd = open("/proc/self/clear_refs", O_RDWR);
>
> assert(fd >= 0);
> write(fd, "4", 2);
> close(fd);
> }
>
> #define check_soft_dirty(str, expect) do { \
> bool dirty = pagemap_read_vaddr(fd, page) & PM_SOFT_DIRTY; \
> if (dirty != expect) { \
> printf("ERROR: %s, soft-dirty=%d (expect: %d)\n", str, dirty, expect); \
> exit(-1); \
> } \
> } while (0)
>
> int main(void)
> {
> int fd = open("/proc/self/pagemap", O_RDONLY);
>
> assert(fd >= 0);
> psize = getpagesize();
> page = mmap(NULL, psize, PROT_READ|PROT_WRITE,
> MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
> assert(page != MAP_FAILED);
>
> *page = 1;
> check_soft_dirty("Just faulted in page", 1);
> clear_refs_write();
> check_soft_dirty("Clear_refs written", 0);
> mprotect(page, psize, PROT_READ);
> check_soft_dirty("Marked RO", 0);
> mprotect(page, psize, PROT_READ|PROT_WRITE);
> check_soft_dirty("Marked RW", 0);
> *page = 2;
> check_soft_dirty("Wrote page again", 1);
>
> munmap(page, psize);
> close(fd);
> printf("Test passed.\n");
>
> return 0;
> }
Can we turn that into a vm selftest in
tools/testing/selftests/vm/soft-dirty.c, and also extend it by
MAP_SHARED froma file as above?
> =======8<======
>
> So even if commit 64fe24a3e05e kept the old behavior and didn't attempt to
> change the behavior here, the bug will only be able to be triggered after
> commit 64fe24a3e05e because only anonymous page will suffer from it.
>
> Fixes: 64fe24a3e05e ("mm/mprotect: try avoiding write faults for exclusive anonymous pages when changing protection")
> Signed-off-by: Peter Xu <peterx@...hat.com>
> ---
> mm/mprotect.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 0420c3ed936c..804807ab14e6 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -48,8 +48,11 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma,
> if (pte_protnone(pte) || !pte_dirty(pte))
> return false;
>
> - /* Do we need write faults for softdirty tracking? */
> - if ((vma->vm_flags & VM_SOFTDIRTY) && !pte_soft_dirty(pte))
> + /*
> + * Do we need write faults for softdirty tracking? Note,
> + * soft-dirty is enabled when !VM_SOFTDIRTY.
> + */
> + if (!(vma->vm_flags & VM_SOFTDIRTY) && !pte_soft_dirty(pte))
> return false;
I wonder if we now want, just as in vma_wants_writenotify(), an early
check for IS_ENABLED(CONFIG_MEM_SOFT_DIRTY), to optimize this out
completely.
>
> /* Do we need write faults for uffd-wp tracking? */
--
Thanks,
David / dhildenb
Powered by blists - more mailing lists