Date:   Tue, 9 Nov 2021 12:14:32 -0800
From:   Beau Belgrave <beaub@...ux.microsoft.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Masami Hiramatsu <mhiramat@...nel.org>,
        linux-trace-devel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 02/10] user_events: Add minimal support for
 trace_event into ftrace

On Tue, Nov 09, 2021 at 02:25:06PM -0500, Steven Rostedt wrote:
> On Tue, 9 Nov 2021 11:08:44 -0800
> Beau Belgrave <beaub@...ux.microsoft.com> wrote:
> 
> > We need strings to be able to be emitted and recorded in eBPF, perf and
> > ftrace. So I would rather go after a solution that lets us keep these in
> > the ring buffers, even if it means a perf hit.
> > 
> > I guess what's left is deciding where the best place to check is.
> > Checking in the filter with unsafe flags would let us keep most of the
> > perf (we just check the undersize case, 1 branch). When these unsafe
> > types are filtered, a perf tax is imposed to keep things safe.
> > 
> > It sounded like Steven wanted to think about this a bit, so I'll wait a
> > bit before poking again for consensus :)
> > 
> > Do you have any strong feelings about where it goes?
> 
> IIUC, the writing into the trace event is done via one big blob, correct?
> 

Yes, the first 4 bytes get trimmed off as the event index; the rest is a
big blob to all places except eBPF (when asking for the iterator directly).

> That is this:
> 
> +	if (likely(atomic_read(&tp->key.enabled) > 0)) {
> +		struct tracepoint_func *probe_func_ptr;
> +		user_event_func_t probe_func;
> +		void *tpdata;
> +		void *kdata;
> +		u32 datalen;
> +
> +		kdata = kmalloc(i->count, GFP_KERNEL);
> +
> +		if (unlikely(!kdata))
> +			return -ENOMEM;
> +
> +		datalen = copy_from_iter(kdata, i->count, i);
> +
> +		rcu_read_lock_sched();
> +
> +		probe_func_ptr = rcu_dereference_sched(tp->funcs);
> +
> +		if (probe_func_ptr) {
> +			do {
> +				probe_func = probe_func_ptr->func;
> +				tpdata = probe_func_ptr->data;
> +				probe_func(user, kdata, datalen, tpdata);
> +			} while ((++probe_func_ptr)->func);
> +		}
> +
> +		rcu_read_unlock_sched();
> +
> +		kfree(kdata);
> 
> So we really are just interested in making sure that the output is correct?
> 

Largely, yes.

The optimization part of the patch moves the buffer copies into the
probes to remove a double copy. I believe, however, that the output can
be checked either centrally before the probes or within each probe call
if need be.

For perf/eBPF we may not need to check things; for ftrace, however, we
will, due to the filters. So we may be able to isolate the check to just
the ftrace probe method.

The ftrace probe will have a blob even after optimization due to the copy
into the ring buffer (assuming we can discard it if it violates a policy).

> That is, the reading of the trace file?
> 

We really need to ensure that data can be analyzed on the machine
directly (eBPF, ftrace, perf) as well as outside of the machine (ftrace, perf).

The priorities for us are fast recording speed and accurate reading of
trace files and event data.

> -- Steve
