netdev - Re: [PATCH bpf-next 2/7] set_memory: introduce set_memory_[ro|x]

Message-ID: <YZdv/NLUU9qLHP2g@hirez.programming.kicks-ass.net>
Date:   Fri, 19 Nov 2021 10:35:56 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Song Liu <songliubraving@...com>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        the arch/x86 maintainers <x86@...nel.org>,
        bpf <bpf@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "bp@...en8.de" <bp@...en8.de>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "ast@...nel.org" <ast@...nel.org>,
        "daniel@...earbox.net" <daniel@...earbox.net>,
        "andrii@...nel.org" <andrii@...nel.org>,
        Kernel Team <Kernel-team@...com>
Subject: Re: [PATCH bpf-next 2/7] set_memory: introduce
 set_memory_[ro|x]_noalias

On Fri, Nov 19, 2021 at 04:14:46AM +0000, Song Liu wrote:
> 
> 
> > On Nov 18, 2021, at 10:58 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> > 
> > On Thu, Nov 18, 2021 at 06:39:49PM +0000, Song Liu wrote:
> > 
> >>> You're going to have to do that anyway if you're going to write to the
> >>> directmap while executing from the alias.
> >> 
> >> Not really. If you look at current version 7/7, the logic is mostly 
> >> straightforward. We just make all the writes to the directmap, while 
> >> calculate offset from the alias. 
> > 
> > Then you can do the exact same thing but do the writes to a temp buffer,
> > no different.
> 
> There will be some extra work, but I guess I will give it a try. 
> 
> > 
> >>>> The BPF program could have up to 1000000 (BPF_COMPLEXITY_LIMIT_INSNS)
> >>>> instructions (BPF instructions). So it could easily go beyond a few 
> >>>> pages. Mapping the 2MB page all together should make the logic simpler. 
> >>> 
> >>> Then copy it in smaller chunks I suppose.
> >> 
> >> How fast/slow is the __text_poke routine? I guess we cannot do it thousands
> >> of times per BPF program (in chunks of a few bytes)? 
> > 
> > You can copy in at least 4k chunks since any 4k will at most use 2
> > pages, which is what it does. If that's not fast enough we can look at
> > doing bigger chunks.
> 
> If we do JIT in a buffer first, 4kB chunks should be fast enough. 
> 
> Another side of this issue is the split of linear mapping (1GB => 
> many 4kB). If we only split to PMD, but not PTE, we can probably 
> recover most of the regression. I will check this with Johannes. 

__text_poke() shouldn't affect the fragmentation of the kernel
mapping; it uses a user-space alias into the same physical memory. For
all it cares, we're poking into GB pages.
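
The 4k-chunk argument from the quoted text can be illustrated with a small
user-space sketch. It is not kernel code: poke_chunk() is a hypothetical
stand-in for a text_poke-style primitive (a plain memcpy here), and
SKETCH_PAGE_SIZE, pages_spanned() and copy_in_chunks() are names made up for
this example. It only shows that a chunk of at most one page, at any
alignment, touches at most two pages, and how an image built in a temp buffer
might be copied out chunk by chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed page size for the sketch; the real value is arch-specific. */
#define SKETCH_PAGE_SIZE 4096UL

/* Number of pages touched by the byte range [addr, addr + len), len > 0. */
static size_t pages_spanned(uintptr_t addr, size_t len)
{
	uintptr_t first = addr / SKETCH_PAGE_SIZE;
	uintptr_t last = (addr + len - 1) / SKETCH_PAGE_SIZE;

	return (size_t)(last - first + 1);
}

/* Hypothetical stand-in for a text_poke-like write; just memcpy here. */
static void poke_chunk(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

/*
 * Copy an image from a temp buffer to its final location in chunks of
 * at most one page. Returns the number of chunks written.
 */
static size_t copy_in_chunks(void *dst, const void *src, size_t len)
{
	size_t done = 0, chunks = 0;

	while (done < len) {
		size_t n = len - done;

		if (n > SKETCH_PAGE_SIZE)
			n = SKETCH_PAGE_SIZE;

		/* A chunk no larger than a page never spans more than two. */
		assert(pages_spanned((uintptr_t)dst + done, n) <= 2);

		poke_chunk((char *)dst + done, (const char *)src + done, n);
		done += n;
		chunks++;
	}
	return chunks;
}
```

With these assumptions, a 9000-byte image goes out in three chunks (4096,
4096 and 808 bytes), regardless of how the destination is aligned.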