Message-ID: <fec3f46e-a777-06e7-0ba0-a8cf169afa02@redhat.com>
Date:   Mon, 28 Nov 2022 14:52:57 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Jann Horn <jannh@...gle.com>, security@...nel.org,
        Andrew Morton <akpm@...ux-foundation.org>
Cc:     Yang Shi <shy828301@...il.com>, Peter Xu <peterx@...hat.com>,
        John Hubbard <jhubbard@...dia.com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v3 1/3] mm/khugepaged: Take the right locks for page table
 retraction

On 25.11.22 22:37, Jann Horn wrote:
> pagetable walks on address ranges mapped by VMAs can be done under the mmap
> lock, the lock of an anon_vma attached to the VMA, or the lock of the VMA's
> address_space. Only one of these needs to be held, and it does not need to
> be held in exclusive mode.
> 
> Under those circumstances, the rules for concurrent access to page table
> entries are:
> 
>   - Terminal page table entries (entries that don't point to another page
>     table) can be arbitrarily changed under the page table lock, with the
>     exception that they always need to be consistent for
>     hardware page table walks and lockless_pages_from_mm().
>     This includes that they can be changed into non-terminal entries.
>   - Non-terminal page table entries (which point to another page table)
>     can not be modified; readers are allowed to READ_ONCE() an entry, verify
>     that it is non-terminal, and then assume that its value will stay as-is.
> 
> Retracting a page table involves modifying a non-terminal entry, so
> page-table-level locks are insufficient to protect against concurrent
> page table traversal; it requires taking all the higher-level locks under
> which it is possible to start a page walk in the relevant range in
> exclusive mode.
> 
> The collapse_huge_page() path for anonymous THP already follows this rule,
> but the shmem/file THP path was getting it wrong, making it possible for
> concurrent rmap-based operations to cause corruption.

This sounds sane and correct to me. No expert on file-THP, though.

For anon-THP it's the mmap lock and the rmap locks. I assume the only 
difference for file-THP is that the rmap lock is actually the mapping 
lock. Looking at rmap_walk_file(), that seems to be the case.


I wish at least PTE table removal could be done more easily ... I already 
experimented some time ago with some ideas (e.g., a lock in the PMD table 
memmap) but it's all far from trivial and space in the memmap is scarce.

-- 
Thanks,

David / dhildenb
