Open Source and information security mailing list archives
 
Message-ID: <CAHk-=wiVLvz3RdZiSjLNGKKgR3s-=2goRPnNWg6cbrcwMVvndQ@mail.gmail.com>
Date:   Mon, 8 May 2023 16:31:09 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "keescook@...omium.org" <keescook@...omium.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>
Subject: Re: [GIT PULL] x86/shstk for 6.4

On Mon, May 8, 2023 at 3:57 PM Dave Hansen <dave.hansen@...el.com> wrote:
>
> There's a wrinkle to enforcing that universally.  From the SDM's
> "ACCESSED AND DIRTY FLAGS" section:
>
>         If software on one logical processor writes to a page while
>         software on another logical processor concurrently clears the
>         R/W flag in the paging-structure entry that maps the page,
>         execution on some processors may result in the entry’s dirty
>         flag being set.

I was actually wondering about that.

I had this memory that we've done special things in the past to make
sure that the dirty bit is guaranteed stable (i.e. the whole
"ptep_clear()" dance). But I wasn't sure.

> This behavior is gone on shadow stack CPUs

Ok, so Intel has actually tightened up the rules on setting dirty, and
now guarantees that it will set dirty only if the pte is actually
writable?

> We could probably tolerate the cost for some of the users like ksm.  But
> I can't think of a way to do it without making fork() suffer.  fork() of
> course modifies the PTE (RW->RO) and flushes the TLB now.  But there
> would need to be a Present=0 PTE in there somewhere before the TLB flush.

Yeah, we don't want to make fork() any worse than it already is.  No
question about that.

But if we make the rule be that having the exact dirty bit vs rw bit
semantics only matters for CPUs that do the shadow stack thing, and on
*those* CPUs it's ok to not go through the dance, can we then come up
with a sequence that works for everybody?

> So, the rule would be something like:
>
>         The *kernel* will never itself create Write=0,Dirty=1 PTEs
>
> That won't prevent the hardware from still being able to do it behind
> our backs on older CPUs.  But it does avoid a few of the special cases.

Right. So looking at the fork() case as a nasty example, right now we have

        ptep_set_wrprotect()

on the source pte of a fork(), which atomically just clears the WRITE
bit (and thus guarantees that dirty bits cannot get lost, simply
because it doesn't matter if some other CPU atomically sets another
bit concurrently).

On the destination we don't have any races with concurrent accesses,
and just do entirely non-atomic

                pte = pte_wrprotect(pte);

and then eventually (after other bit games) do

        set_pte_at(dst_vma->vm_mm, addr, dst_pte, pte);

and basically you're saying that there is no possible common sequence
for that ptep_set_wrprotect() that doesn't penalize some case.

Hmm.

Yeah, right now the non-shadow-stack ptep_set_wrprotect() can just be
an atomic clear_bit(), which turns into just

        lock andb $-3, (%reg)

and I guess that would inevitably become a horror of a cmpxchg loop
when you need to move the dirty bit to the SW dirty on CPUs where the
dirty bit can come in late.

How very very horrid.

                     Linus
