Message-ID: <CAEe=SxmG4oUBUu88NNyOhPC5weExf=UCzLy_pzwg3+CruqO4Cw@mail.gmail.com>
Date: Thu, 5 Sep 2019 10:43:28 -0700
From: Tim Murray <timmurray@...gle.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Suren Baghdasaryan <surenb@...gle.com>,
Michal Hocko <mhocko@...nel.org>,
Joel Fernandes <joel@...lfernandes.org>,
LKML <linux-kernel@...r.kernel.org>,
Carmen Jackson <carmenjackson@...gle.com>,
Mayank Gupta <mayankgupta@...gle.com>,
Daniel Colascione <dancol@...gle.com>,
Minchan Kim <minchan@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
kernel-team <kernel-team@...roid.com>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
Dan Williams <dan.j.williams@...el.com>,
Jerome Glisse <jglisse@...hat.com>,
linux-mm <linux-mm@...ck.org>,
Matthew Wilcox <willy@...radead.org>,
Ralph Campbell <rcampbell@...dia.com>,
Vlastimil Babka <vbabka@...e.cz>,
Tom Zanussi <zanussi@...nel.org>
Subject: Re: [PATCH v2] mm: emit tracepoint when RSS changes by threshold

On Thu, Sep 5, 2019 at 9:03 AM Suren Baghdasaryan <surenb@...gle.com> wrote:
> I might misunderstand this but is the issue here actually throttling
> of the sheer number of trace records or tracing large enough changes
> to RSS that user might care about? Small changes happen all the time
> but we are likely not interested in those. Surely we could postprocess
> the traces to extract changes large enough to be interesting but why
> capture uninteresting information in the first place? IOW the
> throttling here should be based not on the time between traces but on
> the amount of change of the traced signal. Maybe a generic facility
> like that would be a good idea?

You want two properties from the tracepoint:

- Small fluctuations in the value don't flood the trace buffer. If you
  get a new trace event from a process every time kswapd reclaims a
  single page from that process, you're going to need an enormous trace
  buffer that will have significant side effects on overall system
  performance.
- Any spike in memory consumption gets a trace event, regardless of
  the duration of that spike. This tracepoint has been incredibly useful
  in both understanding the causes of kswapd wakeups and
  lowmemorykiller/lmkd kills and evaluating the impact of memory
  management changes because it guarantees that any spike appears in the
  trace output.
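
To illustrate why the second property rules out throttling by time, here is a
minimal userspace sketch of a time-based rate limiter. It is not code from the
patch; the names time_throttle, time_throttle_should_emit and the 100 ms
MIN_INTERVAL_MS value are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical minimum spacing between two emitted events. */
#define MIN_INTERVAL_MS 100L

struct time_throttle {
	long last_emit_ms;	/* timestamp of the last emitted event */
	bool has_emitted;	/* false until the first event is emitted */
};

/*
 * Decide whether a counter update observed at now_ms is emitted.
 * The decision ignores the size of the change entirely, so a large
 * change that arrives too soon after the previous event is dropped.
 */
static bool time_throttle_should_emit(struct time_throttle *t, long now_ms)
{
	/* Assumes a monotonic clock supplied by the caller. */
	assert(now_ms >= t->last_emit_ms);

	if (t->has_emitted && now_ms - t->last_emit_ms < MIN_INTERVAL_MS)
		return false;

	t->last_emit_ms = now_ms;
	t->has_emitted = true;
	return true;
}
```

With this scheme, a process that jumps by thousands of pages 10 ms after its
last event and drops back 10 ms later produces no trace output at all, which
is exactly the short spike the tracepoint is meant to capture.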

As a result, the RSS tracepoint in particular needs to be throttled
based on the delta of the value, not time. The very first prototype of
the patch emitted a trace event per RSS counter change, and IIRC the
RSS trace events consumed significantly more room in the buffer than
sched_switch (and Android has a lot of sched_switch events). It's not
practical to trace changes in RSS without throttling. If there's a
generic throttling approach that would work here, I'm all for it; like
Dan mentioned, there are many more counters that we would like to
trace in a similar way.
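
As a rough illustration of throttling on the delta of the value, the following
userspace sketch emits an event only when a counter has moved at least a
threshold away from the last emitted value. It is a sketch under stated
assumptions, not the implementation in the patch: throttled_counter,
counter_add and the 128-page EMIT_THRESHOLD_PAGES are hypothetical names and
values, and a kernel version would need atomic updates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical threshold: minimum change, in pages, worth tracing. */
#define EMIT_THRESHOLD_PAGES 128L

struct throttled_counter {
	long value;		/* current counter value, in pages */
	long last_emitted;	/* value carried by the last emitted event */
	unsigned long emitted;	/* number of events emitted so far */
};

/*
 * Apply delta to the counter. Returns true when an event would be
 * emitted, i.e. when the value is at least EMIT_THRESHOLD_PAGES away
 * from the last emitted value. Timing plays no part in the decision.
 */
static bool counter_add(struct throttled_counter *c, long delta)
{
	assert(c != NULL);

	c->value += delta;
	if (labs(c->value - c->last_emitted) < EMIT_THRESHOLD_PAGES)
		return false;

	c->last_emitted = c->value;
	c->emitted++;
	return true;
}
```

Single-page changes that hover around the last emitted value produce no
events, while a jump of 200 pages and the matching drop each produce one,
however quickly they follow each other.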