linux-kernel - Re: [PATCH 00/40] Memory allocation profiling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZFIMaflxeHS3uR/A@dhcp22.suse.cz>
Date:   Wed, 3 May 2023 09:25:29 +0200
From:   Michal Hocko <mhocko@...e.com>
To:     Suren Baghdasaryan <surenb@...gle.com>
Cc:     akpm@...ux-foundation.org, kent.overstreet@...ux.dev,
        vbabka@...e.cz, hannes@...xchg.org, roman.gushchin@...ux.dev,
        mgorman@...e.de, dave@...olabs.net, willy@...radead.org,
        liam.howlett@...cle.com, corbet@....net, void@...ifault.com,
        peterz@...radead.org, juri.lelli@...hat.com, ldufour@...ux.ibm.com,
        catalin.marinas@....com, will@...nel.org, arnd@...db.de,
        tglx@...utronix.de, mingo@...hat.com, dave.hansen@...ux.intel.com,
        x86@...nel.org, peterx@...hat.com, david@...hat.com,
        axboe@...nel.dk, mcgrof@...nel.org, masahiroy@...nel.org,
        nathan@...nel.org, dennis@...nel.org, tj@...nel.org,
        muchun.song@...ux.dev, rppt@...nel.org, paulmck@...nel.org,
        pasha.tatashin@...een.com, yosryahmed@...gle.com,
        yuzhao@...gle.com, dhowells@...hat.com, hughd@...gle.com,
        andreyknvl@...il.com, keescook@...omium.org,
        ndesaulniers@...gle.com, gregkh@...uxfoundation.org,
        ebiggers@...gle.com, ytcoode@...il.com, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        bristot@...hat.com, vschneid@...hat.com, cl@...ux.com,
        penberg@...nel.org, iamjoonsoo.kim@....com, 42.hyeyoo@...il.com,
        glider@...gle.com, elver@...gle.com, dvyukov@...gle.com,
        shakeelb@...gle.com, songmuchun@...edance.com, jbaron@...mai.com,
        rientjes@...gle.com, minchan@...gle.com, kaleshsingh@...gle.com,
        kernel-team@...roid.com, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org, iommu@...ts.linux.dev,
        linux-arch@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-mm@...ck.org, linux-modules@...r.kernel.org,
        kasan-dev@...glegroups.com, cgroups@...r.kernel.org
Subject: Re: [PATCH 00/40] Memory allocation profiling

On Mon 01-05-23 09:54:10, Suren Baghdasaryan wrote:
> Memory allocation profiling infrastructure provides a low overhead
> mechanism to make all kernel allocations in the system visible. It can be
> used to monitor memory usage, track memory hotspots, detect memory leaks,
> identify memory regressions.
> 
> To keep the overhead to the minimum, we record only allocation sizes for
> every allocation in the codebase. With that information, if users are
> interested in more detailed context for a specific allocation, they can
> enable in-depth context tracking, which includes capturing the pid, tgid,
> task name, allocation size, timestamp and call stack for every allocation
> at the specified code location.
[...]
> Implementation utilizes a more generic concept of code tagging, introduced
> as part of this patchset. Code tag is a structure identifying a specific
> location in the source code which is generated at compile time and can be
> embedded in an application-specific structure. A number of applications
> for code tagging have been presented in the original RFC [1].
> Code tagging uses the old trick of "define a special elf section for
> objects of a given type so that we can iterate over them at runtime" and
> creates a proper library for it. 
> 
> To profile memory allocations, we instrument page, slab and percpu
> allocators to record total memory allocated in the associated code tag at
> every allocation in the codebase. Every time an allocation is performed by
> an instrumented allocator, the code tag at that location increments its
> counter by allocation size. Every time the memory is freed the counter is
> decremented. To decrement the counter upon freeing, allocated object needs
> a reference to its code tag. Page allocators use page_ext to record this
> reference while slab allocators use memcg_data (renamed into more generic
> slabobj_ext) of the slab page.
[...]
> [1] https://lore.kernel.org/all/20220830214919.53220-1-surenb@google.com/
[...]
>  70 files changed, 2765 insertions(+), 554 deletions(-)

Sorry for cutting the cover considerably but I believe I have quoted the
most important/interesting parts here. The approach is not fundamentally
different from the previous version [1] and there was a significant
discussion around this approach. The cover letter doesn't summarize nor
deal with concerns expressed previous AFAICS. So let me bring those up
back. At least those I find the most important:
- This is a big change and it adds a significant maintenance burden
  because each allocation entry point needs to be handled specifically.
  The cost will grow with the intended coverage especially there when
  allocation is hidden in a library code.
- It has been brought up that this is duplicating functionality already
  available via existing tracing infrastructure. You should make it very
  clear why that is not suitable for the job
- We already have page_owner infrastructure that provides allocation
  tracking data. Why it cannot be used/extended?

Thanks!
-- 
Michal Hocko
SUSE Labs