Message-Id: <1624005747.j7xp8o9byl.naveen@linux.ibm.com>
Date: Fri, 18 Jun 2021 18:49:06 +0530
From: "Naveen N. Rao" <naveen.n.rao@...ux.vnet.ibm.com>
To: Masami Hiramatsu <mhiramat@...nel.org>
Cc: Anton Blanchard <anton@...abs.org>, linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive
Masami Hiramatsu wrote:
>
>> To address this, as a first step, we should probably consider parsing
>> kprobe_profile and printing a warning with 'perf' if we detect a
>> non-zero miss count for a probe -- both a regular probe, as well as a
>> retprobe.
>
> Yeah, it is doable. Note that perf-probe only sets up the event;
> perf-trace or other commands will use it.
>
>
>> If we do this, the nice thing with kprobe_profile is that the probe miss
>> count is available, and can serve as a good way to decide what a more
>> reasonable maxactive value should be. This should help prevent users
>> from trying with arbitrary maxactive values.
>
> Such a feedback loop is an interesting idea.
> Note that the nmissed count is an accumulated value, not the maximum
> number of instances that will be needed.
Yes, we will have to factor in the duration during which the event was
active. This will still be an approximation, but serves as a good
starting point. It may need a few tries to get this right, but more
importantly, the user knows instantly that there are missed probes.
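As a rough illustration of the kprobe_profile check, here is a minimal
userspace sketch. It assumes the three-column layout of kprobe_profile
(event name, hit count, miss count) and takes the tracing duration as an
input supplied by the caller. The names parse_kprobe_profile() and
missed_warnings() and the sample data are made up for illustration; this
is not how perf does (or would) implement it.

```python
from typing import List, NamedTuple


class ProbeStats(NamedTuple):
    event: str
    hits: int
    missed: int


def parse_kprobe_profile(text: str) -> List[ProbeStats]:
    """Parse kprobe_profile-style text: '<event> <nhits> <nmissed>' per line."""
    stats = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        event, hits, missed = fields
        stats.append(ProbeStats(event, int(hits), int(missed)))
    return stats


def missed_warnings(stats: List[ProbeStats], duration_s: float) -> List[str]:
    """Build one warning per probe with a non-zero miss count.

    nmissed is cumulative, so the rate is only an average over the
    interval, not the peak number of instances that were needed.
    """
    warnings = []
    for s in stats:
        if s.missed > 0:
            rate = s.missed / duration_s
            warnings.append(
                f"{s.event}: {s.missed} missed "
                f"({rate:.1f}/s over {duration_s:g}s)"
            )
    return warnings


# Hypothetical sample; on a real system the text would be read from
# kprobe_profile under the tracefs mount point.
SAMPLE = """\
  myprobe                                                    120               0
  myretprobe                                                5000             250
"""

if __name__ == "__main__":
    for w in missed_warnings(parse_kprobe_profile(SAMPLE), duration_s=10.0):
        print("warning:", w)
```

The average rate is only a hint for picking the next maxactive to try,
which is why a few iterations may still be needed.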
>
>> For perf_event_open(), perhaps we can introduce an ioctl to query the
>> probe miss count.
>
> Or, maybe we can expand maxactive at runtime, e.g. add a shortage
> counter to the kretprobe, and run a monitor kernel thread (or kworker).
> If the shortage counter is incremented, the monitor allocates instances
> (2x the counter), gives them to the kretprobe, and resets the shortage
> counter. This adaptive maxactive may cause miss-hits in the beginning,
> but will eventually find the optimal maxactive value automatically.
I like this idea and I have been thinking along these lines too. If we
start with a better default (rather than just num_possible_cpus() used
today), I suspect we may be able to get this to work well enough to not
have to miss any probes. Specifying 'maxactive' can still serve as a
workaround to allocate a larger initial set of kretprobe_instances in
case this doesn't work.
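To make the adaptive scheme concrete, here is a toy userspace model of
it. This is only a simulation under simplifying assumptions (a fixed
number of concurrent entries per round, and the monitor running once
between rounds); AdaptivePool and simulate() are hypothetical names, and
nothing here reflects actual kernel code.

```python
from typing import List


class AdaptivePool:
    """Toy model of a kretprobe instance pool that grows on shortage."""

    def __init__(self, initial: int) -> None:
        self.capacity = initial  # instances currently allocated
        self.in_use = 0          # instances held by in-flight calls
        self.shortage = 0        # misses since the monitor last ran
        self.nmissed = 0         # cumulative misses

    def enter(self) -> bool:
        """Function entry: take an instance, or record a miss."""
        if self.in_use < self.capacity:
            self.in_use += 1
            return True
        self.shortage += 1
        self.nmissed += 1
        return False

    def leave(self) -> None:
        """Function return: give the instance back."""
        assert self.in_use > 0
        self.in_use -= 1

    def monitor(self) -> None:
        """Monitor pass: add 2x the shortage, then reset the counter."""
        if self.shortage:
            self.capacity += 2 * self.shortage
            self.shortage = 0


def simulate(pool: AdaptivePool, concurrent: int, rounds: int) -> List[int]:
    """Return the number of misses seen in each round."""
    misses = []
    for _ in range(rounds):
        before = pool.nmissed
        held = sum(pool.enter() for _ in range(concurrent))
        for _ in range(held):
            pool.leave()
        pool.monitor()
        misses.append(pool.nmissed - before)
    return misses


if __name__ == "__main__":
    pool = AdaptivePool(initial=2)
    print(simulate(pool, concurrent=10, rounds=3))  # [8, 0, 0]
    print(pool.capacity)                            # 18
```

In this model the only misses happen in the first round, which is why a
larger initial allocation matters if we want to avoid missing probes
altogether.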
>
>
>> > To avoid such trouble, I had set the 4096 limit on the maxactive
>> > parameter. Of course 4096 may not be enough for some use-cases. I'm
>> > open to raising it (e.g. 32k, isn't that enough?), but removing the
>> > limit entirely may easily cause OOM trouble.
>>
>> Do you have suggestions for how we can determine a better limit? As you
>> point out in the other email, there could very well be 64k or more
>> processes on a large machine. Since the primary concern is memory usage,
>> we probably need to decide this based on total memory. But, memory usage
>> will vary depending on system load...
>
> This is a very good question. IMHO, it might be better to calculate the
> total maxactive from the system memory size. For example, if 1% of system
> memory can be used for kretprobes, a 16GB system will allow using 160MB
> for kretprobes, which means about "30M" is the max number of maxactive,
> or multiple kretprobes can share it. Doesn't that sound like enough? Of
> course this will need to show the current usage of the kretprobe instance
> objects via tracefs or debugfs. But this total cap seems reasonable to me
> to avoid OOM trouble.
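The arithmetic behind such a memory-based cap can be sketched as below.
The per-instance size is an assumption: the 64 bytes used here is just a
placeholder, since the real footprint of a kretprobe_instance depends on
the kernel version, the architecture and the probe's data_size. The
function name and its parameters are hypothetical.

```python
def kretprobe_instance_cap(total_mem_bytes: int, percent: int,
                           instance_bytes: int) -> int:
    """Max number of instances that fit in `percent` of total memory."""
    budget = total_mem_bytes * percent // 100
    return budget // instance_bytes


if __name__ == "__main__":
    total = 16 * 10**9               # 16GB, decimal, as in the example above
    budget = total * 1 // 100        # 1% of memory -> 160MB
    # ASSUMPTION: 64 bytes per instance, for illustration only.
    cap = kretprobe_instance_cap(total, percent=1, instance_bytes=64)
    print(budget, cap)               # 160000000 2500000
```

The resulting cap scales inversely with the assumed instance size, so
the number we end up advertising depends heavily on that figure.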
>
>> Perhaps we can start by making maxactive limit be a tunable with a
>> default value of 4096, with the understanding that users will be careful
>> when bumping up this value. Hopefully, scripts won't simply start
>> writing into this file ;)
>
> Yeah, that's what I suggested at first, because the best maxactive will
> depend on the max number of the *processes* and the probed function.
>
> If the probed function can NOT be preempted and cannot sleep, maxactive
> will be the number of *processor cores*. Or, if it can be preempted or
> can sleep, it will be the max number of *processes*. If the probed
> function can be called recursively (Note: this is a rare case), the
> maxactive has to be multiplied by the recursion depth.
>
> It is hard to estimate the max number of processes, since it depends
> on the system. Small embedded systems don't run thousands of processes,
> but big servers will run more than ten thousand processes.
> Thus making it tunable will be a good idea.
Agree.
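For reference, the rule of thumb above could be written down roughly as
follows. This is only a sketch of the heuristic as described in this
thread; estimate_maxactive() and its parameters are hypothetical, and
the caller is assumed to know whether the probed function can sleep or
be preempted and how deep it may recurse.

```python
def estimate_maxactive(can_sleep_or_preempt: bool, nr_cpus: int,
                       max_processes: int, recursion_depth: int = 1) -> int:
    """Upper-bound estimate of concurrent kretprobe instances needed.

    - Function cannot sleep or be preempted: at most one active
      instance per CPU.
    - Otherwise: potentially one per process.
    - Recursion multiplies the requirement by the depth.
    """
    base = max_processes if can_sleep_or_preempt else nr_cpus
    return base * max(1, recursion_depth)


if __name__ == "__main__":
    print(estimate_maxactive(False, nr_cpus=8, max_processes=65536))  # 8
    print(estimate_maxactive(True, nr_cpus=8, max_processes=65536))   # 65536
    print(estimate_maxactive(True, 8, 1000, recursion_depth=3))       # 3000
```

The hard part remains max_processes, which is system dependent, hence
the tunable.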
Thanks,
Naveen