linux-kernel - Re: [PATCH v11 2/5] ring-buffer: Introducing ring-buffer mapping functions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20240111181814.362c42cf@gandalf.local.home>
Date: Thu, 11 Jan 2024 18:23:20 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: Vincent Donnefort <vdonnefort@...gle.com>, mhiramat@...nel.org,
 linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
 kernel-team@...roid.com
Subject: Re: [PATCH v11 2/5] ring-buffer: Introducing ring-buffer mapping
 functions

On Thu, 11 Jan 2024 11:34:58 -0500
Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:


> The LTTng kernel tracer has supported mmap'd buffers for nearly 15 years [1],
> and has a lot of similarities with this patch series.
> 
> LTTng has the notion of "subbuffer id" to allow atomically exchanging a
> "reader" extra subbuffer with the subbuffer to be read. It implements
> "get subbuffer" / "put subbuffer" ioctls to allow the consumer (reader)
> to move the currently read subbuffer position. [2]
> 
> It would not hurt to compare your approach to LTTng and highlight
> similarities/differences, and the rationale for the differences.
> 
> Especially when it comes to designing kernel ABIs, it's good to make sure
> that all bases are covered, because those choices will have lasting impacts.
> 
> Thanks,
> 
> Mathieu
> 
> [1] https://git.lttng.org/?p=lttng-modules.git;a=blob;f=src/lib/ringbuffer/ring_buffer_mmap.c
> [2] https://git.lttng.org/?p=lttng-modules.git;a=blob;f=src/lib/ringbuffer/ring_buffer_vfs.c
> 

Hi Mathieu,

Thanks for sharing!

As we discussed a little bit in the tracing meeting we do somethings
differently but very similar too ;-)

The similarities as that all the sub-buffers are mapped. You have a
reader-sub-buffer as well.

The main difference is that you use an ioctl() that returns where to find
the reader-sub-buffer, where our ioctl() is just "I'm done, get me a new
reader-sub-buffer". Ours will update the meta page.

Our meta page looks like this:

> +struct trace_buffer_meta {
> +	unsigned long	entries;
> +	unsigned long	overrun;
> +	unsigned long	read;

If start tracing: trace-cmd start -e sched_switch and do:

 ~ cat /sys/kernel/tracing/per_cpu/cpu0/stats 
entries: 14
overrun: 0
commit overrun: 0
bytes: 992
oldest event ts: 84844.825372
now ts: 84847.102075
dropped events: 0
read events: 0

You'll see similar to the above.

 entries = entries
 overrun = overrun
 read = read

The above "read" is total number of events read.

Pretty staight forward ;-)


> +
> +	unsigned long	subbufs_touched;
> +	unsigned long	subbufs_lost;
> +	unsigned long	subbufs_read;

Now I'm thinking we may not want this exported, as I'm not sure it's useful.

Vincent, talking with Mathieu, he was suggesting that we only export what
we really need, and I don't think we need the above. Especially since they
are not exposed in the stats file.


> +
> +	struct {
> +		unsigned long	lost_events;
> +		__u32		id;
> +		__u32		read;
> +	} reader;

The above is definitely needed, as all of it is used to read the
reader-page of the sub-buffer.

lost_events is the number of lost events that happened before this
sub-buffer was swapped out.

Hmm, Vincent, I just notice that you update the lost_events as:

> +static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
> +{
> +	struct trace_buffer_meta *meta = cpu_buffer->meta_page;
> +
> +	WRITE_ONCE(meta->entries, local_read(&cpu_buffer->entries));
> +	WRITE_ONCE(meta->overrun, local_read(&cpu_buffer->overrun));
> +	WRITE_ONCE(meta->read, cpu_buffer->read);
> +
> +	WRITE_ONCE(meta->subbufs_touched, local_read(&cpu_buffer->pages_touched));
> +	WRITE_ONCE(meta->subbufs_lost, local_read(&cpu_buffer->pages_lost));
> +	WRITE_ONCE(meta->subbufs_read, local_read(&cpu_buffer->pages_read));
> +
> +	WRITE_ONCE(meta->reader.read, cpu_buffer->reader_page->read);
> +	WRITE_ONCE(meta->reader.id, cpu_buffer->reader_page->id);
> +	WRITE_ONCE(meta->reader.lost_events, cpu_buffer->lost_events);
> +}

The lost_events may need to be handled differently, as it doesn't always
get cleared. So it may be stale data.


> +
> +	__u32		subbuf_size;
> +	__u32		nr_subbufs;

This gets is the information needed to read the mapped ring buffer.

> +
> +	__u32		meta_page_size;
> +	__u32		meta_struct_len;

The ring buffer gets mapped after "meta_page_size" and this structure is
"meta_struct_len" which will change if we add new data to it.

> +};

-- Steve