Message-Id: <6145cabf-d016-4dba-b5d2-0fb793352058@app.fastmail.com>
Date:   Tue, 20 Jun 2023 10:24:29 -0700
From:   "Andy Lutomirski" <luto@...nel.org>
To:     "Nadav Amit" <nadav.amit@...il.com>, "Song Liu" <song@...nel.org>
Cc:     "Mike Rapoport" <rppt@...nel.org>,
        "Mark Rutland" <mark.rutland@....com>,
        "Kees Cook" <keescook@...omium.org>,
        "Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>,
        "Andrew Morton" <akpm@...ux-foundation.org>,
        "Catalin Marinas" <catalin.marinas@....com>,
        "Christophe Leroy" <christophe.leroy@...roup.eu>,
        "David S. Miller" <davem@...emloft.net>,
        "Dinh Nguyen" <dinguyen@...nel.org>,
        "Heiko Carstens" <hca@...ux.ibm.com>,
        "Helge Deller" <deller@....de>,
        "Huacai Chen" <chenhuacai@...nel.org>,
        "Kent Overstreet" <kent.overstreet@...ux.dev>,
        "Luis Chamberlain" <mcgrof@...nel.org>,
        "Michael Ellerman" <mpe@...erman.id.au>,
        "Naveen N. Rao" <naveen.n.rao@...ux.ibm.com>,
        "Palmer Dabbelt" <palmer@...belt.com>,
        "Puranjay Mohan" <puranjay12@...il.com>,
        "Rick P Edgecombe" <rick.p.edgecombe@...el.com>,
        "Russell King (Oracle)" <linux@...linux.org.uk>,
        "Steven Rostedt" <rostedt@...dmis.org>,
        "Thomas Bogendoerfer" <tsbogend@...ha.franken.de>,
        "Thomas Gleixner" <tglx@...utronix.de>,
        "Will Deacon" <will@...nel.org>, bpf <bpf@...r.kernel.org>,
        linux-arm-kernel@...ts.infradead.org, linux-mips@...r.kernel.org,
        linux-mm <linux-mm@...ck.org>, linux-modules@...r.kernel.org,
        linux-parisc@...r.kernel.org, linux-riscv@...ts.infradead.org,
        linux-s390 <linux-s390@...r.kernel.org>,
        linux-trace-kernel@...r.kernel.org,
        linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
        loongarch@...ts.linux.dev, netdev@...r.kernel.org,
        sparclinux@...r.kernel.org,
        "the arch/x86 maintainers" <x86@...nel.org>
Subject: Re: [PATCH v2 02/12] mm: introduce execmem_text_alloc() and jit_text_alloc()



On Mon, Jun 19, 2023, at 1:18 PM, Nadav Amit wrote:
>> On Jun 19, 2023, at 10:09 AM, Andy Lutomirski <luto@...nel.org> wrote:
>> 
>> But jit_text_alloc() can't do this, because the order of operations doesn't match.  With jit_text_alloc(), the executable mapping shows up before the text is populated, so there is no atomic change from not-there to populated-and-executable.  Which means that there is an opportunity for CPUs, speculatively or otherwise, to start filling various caches with intermediate states of the text, which means that various architectures (even x86!) may need serialization.
>> 
>> For eBPF- and module- like use cases, where JITting/code gen is quite coarse-grained, perhaps something vaguely like:
>> 
>> jit_text_alloc() -> returns a handle and an executable virtual address, but does *not* map it there
>> jit_text_write() -> write to that handle
>> jit_text_map() -> map it and synchronize if needed (no sync needed on x86, I think)
>
> Andy, would you mind explaining why you think a sync is not needed? I 
> mean I have a “feeling” that perhaps TSO can guarantee something based 
> on the order of write and page-table update. Is that the argument?

Sorry, when I say "no sync" I mean no cross-CPU synchronization.  I'm assuming the underlying sequence of events is:

allocate physical pages (jit_text_alloc)

write to them (with MOV, memcpy, whatever), via the direct map or via a temporary mm

do an appropriate *local* barrier (which, on x86, is probably implied by TSO, as the subsequent pagetable change is at least a release; also, any previous temporary mm stuff would have done MOV CR3 afterwards, which is a full "serializing" barrier)

optionally zap the direct map via IPI, assuming the pages are direct mapped (but this could be avoided with a smart enough allocator and temporary_mm above)

install the final RX PTE (jit_text_map), which does a MOV or maybe a LOCK CMPXCHG16B.  Note that the virtual address in question was not readable or executable before this, and all CPUs have serialized since the last time it was executable.

either jump to the new text locally, or:

1. Do a store-release to tell other CPUs that the text is mapped
2. Other CPU does a load-acquire to detect that the text is mapped and jumps to the text

This is all approximately the same thing that plain old mmap(..., PROT_EXEC, ...) does.
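
For illustration only, here is a rough userspace analogue of the sequence above, written as a sketch rather than as kernel code. Everything in it is an assumption made for the example: the names run_publish_demo(), consumer(), text_state and demo_code are hypothetical, mmap()/mprotect() stand in for allocating pages and installing the final RX PTE, and C11 release/acquire atomics stand in for the store-release/load-acquire handoff between CPUs. The generated bytes are only executed on x86-64; elsewhere the consumer just reads them back, since other architectures may need explicit instruction cache maintenance.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define TEXT_SIZE 4096

enum { TEXT_PENDING, TEXT_MAPPED, TEXT_FAILED };

/* x86-64 encoding of "mov $42, %eax; ret". Only executed on x86-64. */
static const unsigned char demo_code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

static unsigned char *text;   /* plain pointer, ordered by text_state */
static atomic_int text_state; /* zero-initialized, i.e. TEXT_PENDING */

/* "Other CPU": load-acquire until the text is announced, then use it. */
static void *consumer(void *arg)
{
	int *result = arg;
	int state;

	while ((state = atomic_load_explicit(&text_state,
					     memory_order_acquire)) == TEXT_PENDING)
		sched_yield();
	if (state != TEXT_MAPPED)
		return NULL;
#if defined(__x86_64__)
	/* Jump to the new text. */
	*result = ((int (*)(void))(uintptr_t)text)();
#else
	/* Assumption: no execution off x86-64; read the immediate back. */
	*result = text[1];
#endif
	return NULL;
}

/* Writer: allocate, populate, map RX, then publish with a store-release. */
int run_publish_demo(void)
{
	pthread_t tid;
	int result = -1;
	int state = TEXT_FAILED;

	atomic_store_explicit(&text_state, TEXT_PENDING, memory_order_relaxed);
	if (pthread_create(&tid, NULL, consumer, &result) != 0)
		return -1;

	assert(sizeof(demo_code) <= TEXT_SIZE);
	/* The region is writable but not executable while it is populated. */
	text = mmap(NULL, TEXT_SIZE, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (text != MAP_FAILED) {
		memcpy(text, demo_code, sizeof(demo_code));
		/* Analogue of installing the final RX mapping. */
		if (mprotect(text, TEXT_SIZE, PROT_READ | PROT_EXEC) == 0)
			state = TEXT_MAPPED;
	}
	/* Tell the consumer the text is mapped (or that setup failed). */
	atomic_store_explicit(&text_state, state, memory_order_release);

	pthread_join(tid, NULL);
	if (text != MAP_FAILED)
		munmap(text, TEXT_SIZE);
	return result;
}
```

Calling run_publish_demo() should return 42 when the mapping succeeds and -1 otherwise. The point of the sketch is only the ordering: the address never becomes executable until after the bytes are written, and the consumer learns about it solely through the acquire load.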

>
> On this regard, one thing that I clearly do not understand is why 
> *today* it is ok for users of bpf_arch_text_copy() not to call 
> text_poke_sync(). Am I missing something?

I cannot explain this, because I suspect the current code is wrong.  But it's only wrong across CPUs, because bpf_arch_text_copy goes through text_poke_copy, which calls unuse_temporary_mm(), which is serializing.  And it's plausible that most eBPF use cases don't actually cause the loaded program to get used on a different CPU without first serializing on the CPU that ends up using it.  (Context switches and interrupts are serializing.)

FRED could make interrupts non-serializing. I sincerely hope that FRED doesn't cause this all to fall apart.

--Andy
