Message-ID: <ZFKikp0Poqen1kNv@slm.duckdns.org>
Date: Wed, 3 May 2023 08:06:10 -1000
From: Tejun Heo <tj@...nel.org>
To: Suren Baghdasaryan <surenb@...gle.com>
Cc: Kent Overstreet <kent.overstreet@...ux.dev>,
Michal Hocko <mhocko@...e.com>, akpm@...ux-foundation.org,
vbabka@...e.cz, hannes@...xchg.org, roman.gushchin@...ux.dev,
mgorman@...e.de, dave@...olabs.net, willy@...radead.org,
liam.howlett@...cle.com, corbet@....net, void@...ifault.com,
peterz@...radead.org, juri.lelli@...hat.com, ldufour@...ux.ibm.com,
catalin.marinas@....com, will@...nel.org, arnd@...db.de,
tglx@...utronix.de, mingo@...hat.com, dave.hansen@...ux.intel.com,
x86@...nel.org, peterx@...hat.com, david@...hat.com,
axboe@...nel.dk, mcgrof@...nel.org, masahiroy@...nel.org,
nathan@...nel.org, dennis@...nel.org, muchun.song@...ux.dev,
rppt@...nel.org, paulmck@...nel.org, pasha.tatashin@...een.com,
yosryahmed@...gle.com, yuzhao@...gle.com, dhowells@...hat.com,
hughd@...gle.com, andreyknvl@...il.com, keescook@...omium.org,
ndesaulniers@...gle.com, gregkh@...uxfoundation.org,
ebiggers@...gle.com, ytcoode@...il.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
bristot@...hat.com, vschneid@...hat.com, cl@...ux.com,
penberg@...nel.org, iamjoonsoo.kim@....com, 42.hyeyoo@...il.com,
glider@...gle.com, elver@...gle.com, dvyukov@...gle.com,
shakeelb@...gle.com, songmuchun@...edance.com, jbaron@...mai.com,
rientjes@...gle.com, minchan@...gle.com, kaleshsingh@...gle.com,
kernel-team@...roid.com, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, iommu@...ts.linux.dev,
linux-arch@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-mm@...ck.org, linux-modules@...r.kernel.org,
kasan-dev@...glegroups.com, cgroups@...r.kernel.org,
Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>
Subject: Re: [PATCH 00/40] Memory allocation profiling

Hello, Suren.

On Wed, May 03, 2023 at 10:42:11AM -0700, Suren Baghdasaryan wrote:
> > * The framework doesn't really have any runtime overhead, so we can have it
> > deployed in the entire fleet and debug wherever problem is.
>
> Do you mean it has no runtime overhead when disabled?

Yes, that's what I meant.

> If so, do you know what's the overhead when enabled? I want to
> understand if that's truly a viable solution to track all allocations
> (including slab) all the time.

(cc'ing Alexei and Andrii who know a lot better than me)

I don't have enough concrete benchmark data on hand to answer definitively,
but hopefully my general impression will help. We attach
BPF programs to both per-packet and per-IO paths. They obviously aren't free
but their overhead isn't significantly higher than building the same thing
into C code. Once loaded, BPF progs are JIT-compiled into native code. The
generated code will be a bit worse than regularly compiled C code but those
are really micro differences. There's some bridging code to jump into BPF
but again negligible / acceptable even in the hottest paths.
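
To make that concrete, the BPF side of such a per-event hook is tiny. This is
a hypothetical sketch in libbpf style, not anything from this series; the map
and function names are made up, and it assumes __kmalloc is a probeable symbol
on the running kernel and a bpftool-generated vmlinux.h:

/* count_kmalloc.bpf.c - hypothetical sketch, not part of this series:
 * bump a per-CPU counter on every __kmalloc() call. This whole program
 * is what gets JIT-compiled and executed on each event.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u64);
} nr_kmalloc SEC(".maps");

SEC("kprobe/__kmalloc")
int BPF_KPROBE(count_kmalloc)
{
	__u32 key = 0;
	__u64 *cnt = bpf_map_lookup_elem(&nr_kmalloc, &key);

	if (cnt)
		(*cnt)++;	/* per-CPU slot, no atomics needed */
	return 0;
}

char LICENSE[] SEC("license") = "GPL";

Loading and attaching it is a few more lines of libbpf from userspace; the
per-event cost is just the jump into the jitted program above.
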
In terms of execution overhead, I don't think there is a significant
disadvantage to doing these things in BPF. Bigger differences would likely
be in tracking data structures and locking around them. One can definitely
better integrate tracking into alloc / free paths piggybacking on existing
locking and whatnot. That said, BPF hashtable is pretty fast and BPF is
constantly improving in terms of data structure support.
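
As a rough sketch of what the BPF-hashtable flavor of tracking looks like
(hypothetical again: keyed per task rather than per call site to keep it
short, map and function names are mine, and it assumes a clang -target bpf
build with -D__TARGET_ARCH_x86 so BPF_KPROBE can decode the size argument):

/* kmalloc_acct.bpf.c - hypothetical sketch: aggregate requested
 * __kmalloc() bytes per tgid in a BPF hash map.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 16384);
	__type(key, __u32);	/* tgid */
	__type(value, __u64);	/* bytes requested so far */
} bytes_by_task SEC(".maps");

SEC("kprobe/__kmalloc")
int BPF_KPROBE(acct_kmalloc, unsigned long size)
{
	__u32 tgid = bpf_get_current_pid_tgid() >> 32;
	__u64 *total, init = 0;

	total = bpf_map_lookup_elem(&bytes_by_task, &tgid);
	if (!total) {
		/* losing the race on first insert is fine for accounting */
		bpf_map_update_elem(&bytes_by_task, &tgid, &init, BPF_NOEXIST);
		total = bpf_map_lookup_elem(&bytes_by_task, &tgid);
		if (!total)
			return 0;
	}
	__sync_fetch_and_add(total, size);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";

Userspace then just iterates the map; the interesting cost comparison with a
built-in tracker is mostly this hash update versus piggybacking on the
allocator's existing per-CPU structures and locking.
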
It really depends on the workload and how much overhead one considers
acceptable and I'm sure persistent global tracking can be done more
efficiently with built-in C code. That said, done right, the overhead
difference most likely isn't going to be orders of magnitude but more like in
the realm of tens of percent, if that.

So, it doesn't nullify the benefits a dedicated mechanism can bring but does
change the conversation quite a bit. Is the extra code justifiable given
that most of what it enables is already possible using a more generic
mechanism, albeit at a bit higher cost? That may well be the case but it
does raise the bar.

Thanks.

--
tejun