Message-ID: <CAJuCfpEuLjd+FJ7MQQ+y=ghVnYQP-WDcXxLCcy07JQ0VFweLEg@mail.gmail.com>
Date: Wed, 31 Aug 2022 09:48:17 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: Michal Hocko <mhocko@...e.com>
Cc: Mel Gorman <mgorman@...e.de>,
Kent Overstreet <kent.overstreet@...ux.dev>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Vlastimil Babka <vbabka@...e.cz>,
Johannes Weiner <hannes@...xchg.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Davidlohr Bueso <dave@...olabs.net>,
Matthew Wilcox <willy@...radead.org>,
"Liam R. Howlett" <liam.howlett@...cle.com>,
David Vernet <void@...ifault.com>,
Juri Lelli <juri.lelli@...hat.com>,
Laurent Dufour <ldufour@...ux.ibm.com>,
Peter Xu <peterx@...hat.com>,
David Hildenbrand <david@...hat.com>,
Jens Axboe <axboe@...nel.dk>, mcgrof@...nel.org,
masahiroy@...nel.org, nathan@...nel.org, changbin.du@...el.com,
ytcoode@...il.com, Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Benjamin Segall <bsegall@...gle.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Christopher Lameter <cl@...ux.com>,
Pekka Enberg <penberg@...nel.org>,
Joonsoo Kim <iamjoonsoo.kim@....com>, 42.hyeyoo@...il.com,
Alexander Potapenko <glider@...gle.com>,
Marco Elver <elver@...gle.com>, dvyukov@...gle.com,
Shakeel Butt <shakeelb@...gle.com>,
Muchun Song <songmuchun@...edance.com>, arnd@...db.de,
jbaron@...mai.com, David Rientjes <rientjes@...gle.com>,
Minchan Kim <minchan@...gle.com>,
Kalesh Singh <kaleshsingh@...gle.com>,
kernel-team <kernel-team@...roid.com>,
linux-mm <linux-mm@...ck.org>, iommu@...ts.linux.dev,
kasan-dev@...glegroups.com, io-uring@...r.kernel.org,
linux-arch@...r.kernel.org, xen-devel@...ts.xenproject.org,
linux-bcache@...r.kernel.org, linux-modules@...r.kernel.org,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 00/30] Code tagging framework and applications
On Wed, Aug 31, 2022 at 8:28 AM Suren Baghdasaryan <surenb@...gle.com> wrote:
>
> On Wed, Aug 31, 2022 at 3:47 AM Michal Hocko <mhocko@...e.com> wrote:
> >
> > On Wed 31-08-22 11:19:48, Mel Gorman wrote:
> > > On Wed, Aug 31, 2022 at 04:42:30AM -0400, Kent Overstreet wrote:
> > > > On Wed, Aug 31, 2022 at 09:38:27AM +0200, Peter Zijlstra wrote:
> > > > > On Tue, Aug 30, 2022 at 02:48:49PM -0700, Suren Baghdasaryan wrote:
> > > > > > ===========================
> > > > > > Code tagging framework
> > > > > > ===========================
> > > > > > Code tag is a structure identifying a specific location in the source code
> > > > > > which is generated at compile time and can be embedded in an application-
> > > > > > specific structure. Several applications of code tagging are included in
> > > > > > this RFC, such as memory allocation tracking, dynamic fault injection,
> > > > > > latency tracking and improved error code reporting.
> > > > > > Basically, it takes the old trick of "define a special elf section for
> > > > > > objects of a given type so that we can iterate over them at runtime" and
> > > > > > creates a proper library for it.
> > > > >
> > > > > I might be super dense this morning, but what!? I've skimmed through the
> > > > > set and I don't think I get it.
> > > > >
> > > > > What does this provide that ftrace/kprobes don't already allow?
> > > >
> > > > You're kidding, right?
> > >
> > > > It's a valid question. From the description, its main addition that would
> > > be hard to do with ftrace or probes is catching where an error code is
> > > returned. A secondary addition would be catching all historical state and
> > > not just state since the tracing started.
> > >
> > > It's also unclear *who* would enable this. It looks like it would mostly
> > > have value during the development stage of an embedded platform to track
> > > kernel memory usage on a per-application basis in an environment where it
> > > > may be difficult to set up tracing and tracking. Would it ever be enabled
> > > > in production? Would a distribution ever enable this? If it's enabled, any
> > > > overhead cannot be disabled/enabled at run or boot time, so anyone enabling
> > > > this would carry the cost without necessarily ever consuming the data.
>
> Thank you for the question.
> For memory tracking, my intent is to have a mechanism that can be enabled
> during field testing (pre-production testing on a large population of
> internal users).
> The issue we often face is that memory leaks happen in the field but are
> very hard to reproduce locally. We get a bugreport from the user which
> indicates a leak but often does not have enough information to track it
> down. Note that quite often these leaks/issues happen in drivers, so even
> simply finding out where they came from is a big help.
> The way I envision this mechanism being used is to enable basic memory
> tracking in the field tests and have a user space process collect the
> allocation statistics periodically (say, once an hour). Once it detects
> some counter growing without bound or atypically (the definition of this
> is left to user space), it can enable context capturing only for that
> specific location, still keeping the overhead to a minimum but getting
> more information about potential issues. Collected stats and contexts are
> then attached to the bugreport, and we get more visibility into the issue
> when we receive it.
> The goal is to provide a mechanism with low enough overhead that it can be
> enabled all the time during these field tests without affecting the
> device's performance profiles.
> Tracing is very cheap when it's disabled, but having it enabled all the
> time would introduce higher overhead than the counter manipulations.
> My apologies, I should have clarified all this in the cover letter from
> the beginning.
>
> As for the other applications, maybe I'm not such an advanced user of
> tracing, but I think only the latency tracking application might be done
> with tracing, assuming we have all the right tracepoints. I don't see how
> we would use tracing for fault injection and descriptive error codes.
> Again, I might be mistaken.
Sorry about the formatting of my reply. Forgot to reconfigure the editor on
the new machine.
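
To make the field-test flow quoted above a bit more concrete, here is a
rough sketch of what the user space side could look like. It is only an
illustration: the snapshot format (a dict mapping a "file:line" string to
bytes currently allocated), the function name growing_sites, the thresholds
and the example paths are all assumptions made for this sketch, not
interfaces from the patchset. How the snapshots are read from the kernel and
how context capture is switched on are deliberately left out.

```python
from typing import Dict, List

# Hypothetical snapshot format: {"file:line": bytes_currently_allocated}.
# How a snapshot is obtained from the kernel is outside this sketch.
Snapshot = Dict[str, int]


def growing_sites(history: List[Snapshot], min_samples: int = 4,
                  min_growth: int = 1 << 20) -> List[str]:
    """Return allocation sites that look like leak candidates.

    A site is flagged when its counter rose in every one of the last
    `min_samples` snapshots and grew by at least `min_growth` bytes over
    that window. Both thresholds are policy choices left to user space.
    """
    if len(history) < min_samples:
        return []  # not enough periodic samples collected yet

    window = history[-min_samples:]
    suspects = []
    for site in window[-1]:
        # A site missing from an older snapshot counts as 0 bytes there.
        values = [snap.get(site, 0) for snap in window]
        strictly_rising = all(a < b for a, b in zip(values, values[1:]))
        if strictly_rising and values[-1] - values[0] >= min_growth:
            suspects.append(site)
    return sorted(suspects)


if __name__ == "__main__":
    MiB = 1 << 20
    # Made-up hourly samples; the paths are placeholders, not real code.
    history = [
        {"drivers/example/leaky.c:120": 1 * MiB, "fs/example/steady.c:88": 8 * MiB},
        {"drivers/example/leaky.c:120": 2 * MiB, "fs/example/steady.c:88": 7 * MiB},
        {"drivers/example/leaky.c:120": 3 * MiB, "fs/example/steady.c:88": 8 * MiB},
        {"drivers/example/leaky.c:120": 4 * MiB, "fs/example/steady.c:88": 8 * MiB},
    ]
    for site in growing_sites(history):
        # In the envisioned flow, this is where context capture would be
        # enabled for just this one location.
        print("would enable context capture for", site)
```

The point of keeping the policy in user space is that "atypical growth" can
be redefined per device or per test population without touching the kernel;
the kernel side only has to keep the counters cheap.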
>
> Thanks,
> Suren.
>
> > >
> > > It might be an ease-of-use thing. Gathering the information from traces
> > > is tricky and would need combining multiple different elements and that
> > > is development effort but not impossible.
> > >
> > > > Whatever the case, asking for an explanation as to why equivalent
> > > > functionality cannot be created from ftrace/kprobe/eBPF/whatever is reasonable.
> >
> > Fully agreed and this is especially true for a change this size
> > 77 files changed, 3406 insertions(+), 703 deletions(-)
> >
> > --
> > Michal Hocko
> > SUSE Labs