linux-kernel - Re: [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive


Message-Id: <1624005747.j7xp8o9byl.naveen@linux.ibm.com>
Date:   Fri, 18 Jun 2021 18:49:06 +0530
From:   "Naveen N. Rao" <naveen.n.rao@...ux.vnet.ibm.com>
To:     Masami Hiramatsu <mhiramat@...nel.org>
Cc:     Anton Blanchard <anton@...abs.org>, linux-kernel@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive

Masami Hiramatsu wrote:
> 
>> To address this, as a first step, we should probably consider parsing 
>> kprobe_profile and printing a warning with 'perf' if we detect a 
>> non-zero miss count for a probe -- both a regular probe, as well as a 
>> retprobe.
> 
> Yeah, it is doable. Note that perf-probe only sets up the event, and
> perf-trace or other commands will use it.
> 
> 
>> If we do this, the nice thing with kprobe_profile is that the probe miss 
>> count is available, and can serve as a good way to decide what a more 
>> reasonable maxactive value should be. This should help prevent users 
>> from trying with arbitrary maxactive values.
> 
> Such a feedback loop is an interesting idea.
> Note that the nmissed count is an accumulated value, not the maximum
> number of instances that will be needed.

Yes, we will have to factor in the duration during which the event was 
active. This will still be an approximation, but it should serve as a good 
starting point. It may take a few tries to get this right, but more
importantly, the user knows instantly that there are missed probes.
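
As a rough illustration of the kind of check meant here (a sketch only, not
existing perf code): kprobe_profile lists one event per line as event name,
hit count and miss count. The helper names and the sample data below are
hypothetical; on a real system the text would come from the tracefs
kprobe_profile file (typically /sys/kernel/tracing/kprobe_profile), and the
duration is whatever period the event was known to be active.

```python
from typing import Dict, List, Tuple


def parse_kprobe_profile(text: str) -> Dict[str, Tuple[int, int]]:
    """Map event name -> (hits, misses) from kprobe_profile-style text."""
    stats: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        name, hits, misses = fields
        try:
            stats[name] = (int(hits), int(misses))
        except ValueError:
            continue  # skip lines whose counters are not integers
    return stats


def missed_events(text: str, duration_s: float) -> List[Tuple[str, int, float]]:
    """Return (event, misses, misses per second) for events that missed hits."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return [
        (name, misses, misses / duration_s)
        for name, (_hits, misses) in parse_kprobe_profile(text).items()
        if misses > 0
    ]


# Hypothetical sample contents; real data would be read from tracefs.
SAMPLE = """\
  myprobe          1200      0
  myretprobe        950     50
"""

for name, misses, rate in missed_events(SAMPLE, duration_s=10.0):
    print(f"warning: {name} missed {misses} hits ({rate:.1f}/s); "
          "consider a larger maxactive")
```

The miss rate is only a hint: since the miss count is cumulative, it says how
often instances ran out, not how many concurrent instances were needed.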

> 
>> For perf_event_open(), perhaps we can introduce an ioctl to query the 
>> probe miss count.
> 
> Or, maybe we can expand the maxactive at runtime, e.g. add a shortage
> counter on the kretprobe, and run a monitor kernel thread (or kworker).
> If the shortage counter is incremented, the monitor allocates instances
> (2x counter), gives them to the kretprobe, and resets the shortage
> counter. This adaptive maxactive may cause missed hits in the beginning,
> but will eventually find the optimal maxactive value automatically.

I like this idea and I have been thinking along these lines too. If we 
start with a better default (rather than just num_possible_cpus() used 
today), I suspect we may be able to get this to work well enough to not 
have to miss any probes. Specifying 'maxactive' can still serve as a 
workaround to allocate a larger initial set of kretprobe_instances in 
case this doesn't work.
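
To make the adaptive scheme quoted above concrete, here is a small user-space
simulation. It is a sketch only: the class and method names are made up for
illustration, and it models just the counting logic (shortage counter, grow by
2x the counter, reset), not kernel allocation, locking or the kworker itself.

```python
class AdaptiveInstancePool:
    """Toy model of a kretprobe instance pool that grows on shortage."""

    def __init__(self, initial: int) -> None:
        self.capacity = initial  # plays the role of maxactive
        self.in_use = 0          # instances currently held by callers
        self.shortage = 0        # misses since the last monitor run

    def acquire(self) -> bool:
        """Take an instance on function entry; False means a missed probe."""
        if self.in_use < self.capacity:
            self.in_use += 1
            return True
        self.shortage += 1
        return False

    def release(self) -> None:
        """Give an instance back on function return."""
        if self.in_use > 0:
            self.in_use -= 1

    def monitor_tick(self) -> int:
        """Monitor step: add 2x the shortage count, then reset the counter."""
        added = 2 * self.shortage
        self.capacity += added
        self.shortage = 0
        return added


pool = AdaptiveInstancePool(initial=4)

# First burst: 10 concurrent entries against 4 instances, so 6 are missed.
first = [pool.acquire() for _ in range(10)]
print("missed in first burst:", first.count(False))

print("instances added by monitor:", pool.monitor_tick())
for _ in range(first.count(True)):
    pool.release()

# Second burst of the same size now fits within the grown pool.
second = [pool.acquire() for _ in range(10)]
print("missed in second burst:", second.count(False))
```

This shows the trade-off mentioned in the quoted text: the first burst still
loses hits, and a better starting value only shrinks that initial window.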

> 
> 
>> > To avoid such trouble, I had set the 4096 limitation for the maxactive
>> > parameter. Of course 4096 may not be enough for some use-cases. I'm open
>> > to expanding it (e.g. 32k, isn't that enough?), but removing the limitation
>> > may easily cause OOM trouble.
>> 
>> Do you have suggestions for how we can determine a better limit? As you 
>> point out in the other email, there could very well be 64k or more 
>> processes on a large machine. Since the primary concern is memory usage, 
>> we probably need to decide this based on total memory. But, memory usage 
>> will vary depending on system load...
> 
> This is a very good question. IMHO, it might be better to calculate the
> total maxactive from the system memory size. For example, if 1% of system
> memory can be used for the kretprobes, a 16GB system will allow using 160MB
> for kretprobes, which means about "30M" is the max number of maxactive, or
> multiple kretprobes can share it. Doesn't that sound like enough? Of course
> this will need to show the current usage of the kretprobe instance objects
> via tracefs or debugfs. But this total cap seems reasonable to me to
> avoid OOM trouble.
> 
>> Perhaps we can start by making maxactive limit be a tunable with a 
>> default value of 4096, with the understanding that users will be careful 
>> when bumping up this value. Hopefully, scripts won't simply start 
>> writing into this file ;)
> 
> Yeah, that's what I suggested at first, because the best maxactive will
> depend on the max number of the *processes* and the probed function.
> 
> If the probed function will NOT be preempted or sleep, maxactive will be
> the number of *processor cores*. Or, if it can be preempted or sleep, it
> will be the max number of *processes*. If the probed function can be
> called recursively (Note: this is a rare case), the maxactive has to
> be multiplied.
> 
> It is hard to estimate the max number of processes, since it depends
> on the system. Small embedded systems don't run thousands of processes,
> but big servers will run more than ten thousand processes.
> Thus making it tunable will be a good idea.

Agree.
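
For reference, the rule of thumb quoted above can be written down as a small
helper. This is a sketch under assumptions: the function and parameter names
are hypothetical, and using the recursion depth as the multiplier is my
reading of "has to be multiplied", not something stated above.

```python
def estimate_maxactive(num_cores: int,
                       max_processes: int,
                       can_sleep_or_preempt: bool,
                       recursion_depth: int = 1) -> int:
    """Rough upper bound on concurrently active kretprobe instances.

    - Function cannot sleep or be preempted: bounded by the number of cores.
    - Function can sleep or be preempted: bounded by the number of processes.
    - Recursion (rare): multiply by the assumed maximum recursion depth.
    """
    if num_cores < 1 or max_processes < 1 or recursion_depth < 1:
        raise ValueError("all counts must be at least 1")
    base = max_processes if can_sleep_or_preempt else num_cores
    return base * recursion_depth


# Example numbers only; both are system dependent.
print(estimate_maxactive(8, 10000, can_sleep_or_preempt=False))
print(estimate_maxactive(8, 10000, can_sleep_or_preempt=True))
print(estimate_maxactive(8, 10000, can_sleep_or_preempt=True, recursion_depth=2))
```

The hard input is max_processes, which is why a tunable limit looks like the
practical choice.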


Thanks,
Naveen