[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <671d9bbb-0f19-2710-00ef-47734085dddc@redhat.com>
Date: Tue, 31 Jan 2023 09:47:01 +0100
From: David Hildenbrand <david@...hat.com>
To: linux-kernel@...r.kernel.org
Cc: linux-mm@...ck.org, sparclinux@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
"David S. Miller" <davem@...emloft.net>,
Peter Xu <peterx@...hat.com>, Hev <r@....cc>,
Anatoly Pugachev <matorola@...il.com>,
Raghavendra K T <raghavendra.kt@....com>,
Thorsten Leemhuis <regressions@...mhuis.info>,
Mike Kravetz <mike.kravetz@...cle.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Juergen Gross <jgross@...e.com>
Subject: Re: [PATCH v1] sparc/mm: don't unconditionally set HW writable bit
when setting PTE dirty on 64bit
On 12.12.22 14:02, David Hildenbrand wrote:
> On sparc64, there is no HW modified bit, therefore, SW tracks via a SW
> bit if the PTE is dirty via pte_mkdirty(). However, pte_mkdirty()
> currently also unconditionally sets the HW writable bit, which is wrong.
>
> pte_mkdirty() is not supposed to make a PTE actually writable, unless the
> SW writable bit (pte_write()) indicates that the PTE is not
> write-protected. Fortunately, sparc64 also defines a SW writable bit.
>
> For example, this already turned into a problem in the context of
> THP splitting as documented in commit 624a2c94f5b7 ("Partly revert "mm/thp:
> carry over dirty bit when thp splits on pmd") and might be an issue during
> page migration in mm/migrate.c:remove_migration_pte() as well where we:
> if (folio_test_dirty(folio) && is_migration_entry_dirty(entry))
> pte = pte_mkdirty(pte);
>
> But more general, anything like:
> maybe_mkwrite(pte_mkdirty(pte), vma)
> code is broken on sparc64, because it will unconditionally set the HW
> writable bit even if the SW writable bit is not set.
>
> Simple reproducer that will result in a writable PTE after ptrace
> access, to highlight the problem and as an easy way to verify if it has
> been fixed:
>
> --------------------------------------------------------------------------
> #include <fcntl.h>
> #include <signal.h>
> #include <unistd.h>
> #include <string.h>
> #include <errno.h>
> #include <stdlib.h>
> #include <sys/mman.h>
>
> static void signal_handler(int sig)
> {
> if (sig == SIGSEGV)
> printf("[PASS] SIGSEGV generated\n");
> else
> printf("[FAIL] wrong signal generated\n");
> exit(0);
> }
>
> int main(void)
> {
> size_t pagesize = getpagesize();
> char data = 1;
> off_t offs;
> int mem_fd;
> char *map;
> int ret;
>
> mem_fd = open("/proc/self/mem", O_RDWR);
> if (mem_fd < 0) {
> fprintf(stderr, "open(/proc/self/mem) failed: %d\n", errno);
> return 1;
> }
>
> map = mmap(NULL, pagesize, PROT_READ, MAP_PRIVATE|MAP_ANON, -1 ,0);
> if (map == MAP_FAILED) {
> fprintf(stderr, "mmap() failed: %d\n", errno);
> return 1;
> }
>
> printf("original: %x\n", *map);
>
> /* debug access */
> offs = lseek(mem_fd, (uintptr_t) map, SEEK_SET);
> ret = write(mem_fd, &data, 1);
> if (ret != 1) {
> fprintf(stderr, "pwrite(/proc/self/mem) failed with %d: %d\n", ret, errno);
> return 1;
> }
> if (*map != data) {
> fprintf(stderr, "pwrite(/proc/self/mem) not visible\n");
> return 1;
> }
>
> printf("ptrace: %x\n", *map);
>
> /* Install signal handler. */
> if (signal(SIGSEGV, signal_handler) == SIG_ERR) {
> fprintf(stderr, "signal() failed\n");
> return 1;
> }
>
> /* Ordinary access. */
> *map = 2;
>
> printf("access: %x\n", *map);
>
> printf("[FAIL] SIGSEGV not generated\n");
>
> return 0;
> }
> --------------------------------------------------------------------------
>
> Without this commit (sun4u in QEMU):
> # ./reproducer
> original: 0
> ptrace: 1
> access: 2
> [FAIL] SIGSEGV not generated
>
> Let's fix this by setting the HW writable bit only if both, the SW dirty
> bit and the SW writable bit are set. This matches, for example, how
> s390x handles pte_mkwrite() and pte_mkdirty() -- except, that they have
> to clear the _PAGE_PROTECT bit.
>
> We have to move pte_dirty() and pte_dirty() up. The code patching
> mechanism and handling constants > 22bit is a bit special on sparc64.
>
> With this commit (sun4u in QEMU):
> # ./reproducer
> original: 0
> ptrace: 1
> [PASS] SIGSEGV generated
>
> This handling seems to have been in place forever.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: "David S. Miller" <davem@...emloft.net>
> Cc: Peter Xu <peterx@...hat.com>
> Cc: Hev <r@....cc>
> Cc: Anatoly Pugachev <matorola@...il.com>
> Cc: Raghavendra K T <raghavendra.kt@....com>
> Cc: Thorsten Leemhuis <regressions@...mhuis.info>
> Cc: Mike Kravetz <mike.kravetz@...cle.com>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
> Cc: Juergen Gross <jgross@...e.com>
> Signed-off-by: David Hildenbrand <david@...hat.com>
> ---
Ping
--
Thanks,
David / dhildenb
Powered by blists - more mailing lists