lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7a6de15a-a570-31f2-14d6-a8010296e694@citrix.com>
Date:   Sun, 31 Jan 2021 12:10:49 +0000
From:   Andrew Cooper <andrew.cooper3@...rix.com>
To:     Andy Lutomirski <luto@...nel.org>,
        Nadav Amit <nadav.amit@...il.com>
CC:     Linux-MM <linux-mm@...ck.org>, LKML <linux-kernel@...r.kernel.org>,
        Nadav Amit <namit@...are.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Will Deacon <will@...nel.org>, Yu Zhao <yuzhao@...gle.com>,
        Nick Piggin <npiggin@...il.com>, X86 ML <x86@...nel.org>,
        Andy Lutomirski <luto@...capital.net>
Subject: Re: [RFC 03/20] mm/mprotect: do not flush on permission promotion

On 31/01/2021 01:07, Andy Lutomirski wrote:
> Adding Andrew Cooper, who has a distressingly extensive understanding
> of the x86 PTE magic.

Pretty sure it is all learning things the hard way...

> On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit <nadav.amit@...il.com> wrote:
>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>> index 632d5a677d3f..b7473d2c9a1f 100644
>> --- a/mm/mprotect.c
>> +++ b/mm/mprotect.c
>> @@ -139,7 +139,8 @@ static unsigned long change_pte_range(struct mmu_gather *tlb,
>>                                 ptent = pte_mkwrite(ptent);
>>                         }
>>                         ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
>> -                       tlb_flush_pte_range(tlb, addr, PAGE_SIZE);
>> +                       if (pte_may_need_flush(oldpte, ptent))
>> +                               tlb_flush_pte_range(tlb, addr, PAGE_SIZE);

You're choosing to avoid the flush, based on A/D bits read ahead of the
actual modification of the PTE.

In this example, another thread can write into the range (sets A and D),
and get a suitable TLB entry which goes unflushed while the rest of the
kernel thinks the memory is write-protected and clean.

The only safe way to do this is to use XCHG/etc to modify the PTE, and
base flush calculations on the results.  Atomic operations are ordered
with A/D updates from pagewalks on other CPUs, even on AMD where A
updates are explicitly not ordered with regular memory reads, for
performance reasons.

~Andrew

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ