

Message-ID: <20160111180913.GA25950@ast-mbp.thefacebook.com>
Date:	Mon, 11 Jan 2016 10:09:14 -0800
From:	Alexei Starovoitov <alexei.starovoitov@...il.com>
To:	Wang Nan <wangnan0@...wei.com>
Cc:	acme@...nel.org, linux-kernel@...r.kernel.org, pi3orama@....com,
	lizefan@...wei.com, netdev@...r.kernel.org, davem@...emloft.net,
	Adrian Hunter <adrian.hunter@...el.com>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	David Ahern <dsahern@...il.com>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Yunlong Song <yunlong.song@...wei.com>
Subject: Re: [PATCH 27/53] perf/core: Put size of a sample at the end of it
 by PERF_SAMPLE_TAILSIZE

On Mon, Jan 11, 2016 at 01:48:18PM +0000, Wang Nan wrote:
> This patch introduces a PERF_SAMPLE_TAILSIZE flag which allows a size
> field to be attached at the end of a sample. The idea comes from [1]:
> with the size at the tail of an event, a user program that reads
> from the ring buffer can parse events backward.
> 
> For example:
> 
>    head
>     |
>     V
>  +--+---+-------+----------+------+---+
>  |E6|...|   B  8|   C    11|  D  7|E..|
>  +--+---+-------+----------+------+---+
> 
> In this case, from the 'head' pointer provided by kernel, user program
> can first see '6' by (*(head - sizeof(u64))), then it can get the start
> pointer of record 'E', then it can read size and find start position
> of record D, C, B in similar way.
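
The backward walk quoted above can be sketched in user-space C. This is
only a toy model under stated assumptions: struct toy_ring, ring_read(),
ring_write(), ring_append(), ring_walk_back() and demo_nth_start() are
hypothetical names, the record is just "payload + u64 total size", and
none of this is the real perf mmap layout or the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy ring, not the real perf mmap layout. */
struct toy_ring {
	uint8_t *data;
	uint64_t size;	/* buffer size in bytes, power of two */
	uint64_t head;	/* free-running write position */
};

/* Copy len bytes out of the ring starting at position pos, with wrap. */
static void ring_read(const struct toy_ring *r, uint64_t pos,
		      void *dst, size_t len)
{
	uint8_t *out = dst;

	for (size_t i = 0; i < len; i++)
		out[i] = r->data[(pos + i) & (r->size - 1)];
}

/* Copy len bytes into the ring at head, with wrap, and advance head. */
static void ring_write(struct toy_ring *r, const void *src, size_t len)
{
	const uint8_t *in = src;

	for (size_t i = 0; i < len; i++)
		r->data[(r->head + i) & (r->size - 1)] = in[i];
	r->head += len;
}

/* Append payload followed by a u64 holding the total record size. */
static void ring_append(struct toy_ring *r, const void *payload, uint64_t len)
{
	uint64_t total = len + sizeof(uint64_t);

	assert(total <= r->size);
	ring_write(r, payload, len);
	ring_write(r, &total, sizeof(total));
}

/*
 * Walk backward from head, storing the start position of each intact
 * record (newest first). Returns the number of records found.
 */
static size_t ring_walk_back(const struct toy_ring *r, uint64_t *starts,
			     size_t max)
{
	uint64_t pos = r->head;
	uint64_t avail = r->head < r->size ? r->head : r->size;
	size_t n = 0;

	while (n < max && avail >= sizeof(uint64_t)) {
		uint64_t total;

		ring_read(r, pos - sizeof(total), &total, sizeof(total));
		if (total < sizeof(total) || total > avail)
			break;	/* record is partly overwritten or invalid */
		pos -= total;
		avail -= total;
		starts[n++] = pos;
	}
	return n;
}

/* Demo: 64-byte ring, records of 16, 32 and 24 bytes (tail included). */
static uint64_t demo_nth_start(size_t i)
{
	static uint8_t buf[64];
	uint8_t payload[24] = { 0 };
	struct toy_ring r = { .data = buf, .size = sizeof(buf), .head = 0 };
	uint64_t starts[4];
	size_t n;

	ring_append(&r, payload, 8);
	ring_append(&r, payload, 24);
	ring_append(&r, payload, 16);	/* wraps and clobbers the first record */
	n = ring_walk_back(&r, starts, 4);
	return i < n ? starts[i] : UINT64_MAX;
}
```

In the demo the third record wraps over the first one, so the walk
finds only the two newest records and stops when the next tail size
claims more bytes than are still valid in the buffer.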

Adding an extra 8 bytes to every sample is quite unfortunate.
How about another idea:
. update the data_tail pointer when head is about to overwrite it

Ex:
   head   data_tail
    |       |
    V       V
 +--+-------+-------+---+----+---+
 |E |  ...  |   B   | C |  D | E |
 +--+-------+-------+---+----+---+

If a new sample F is about to overwrite B, the kernel would need
to read the size of B from B's header and update data_tail to point
to C, or even further.
Compared to the TAILSIZE approach, the kernel would now be doing both
reads and writes into the ring-buffer, and there is a concern that the
reads may hit cold data. But if the records are small, they may
actually be on the same cache line that was brought in by the previous
"read A's header, write E record" cycle. So I think we shouldn't see
cache misses.
Another concern is the validity of the stored records. If user space
messes with the ring-buffer, the kernel won't be able to move data_tail
properly and would need to indicate that to user space somehow.
But the memory saving of 8 bytes per record could be sizable, and
user space wouldn't need to walk the whole buffer backwards; it
can just start from a valid data_tail, so dumps of an overwrite
ring-buffer will be faster too.
Thoughts?
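
A rough user-space sketch of the data_tail idea, to make the discussion
concrete. Everything here is an assumption for illustration: struct
toy_hdr only mimics a header with a leading size field, and toy_ring,
ring_make_room(), ring_append() and the demo_* helpers are hypothetical
names, not kernel code. The writer steps tail over whole records until
the new record fits, and reports failure if a header looks corrupted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header; size covers header plus payload. */
struct toy_hdr {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

struct toy_ring {
	uint8_t *data;
	uint64_t size;	/* buffer size in bytes, power of two */
	uint64_t head;	/* free-running write position */
	uint64_t tail;	/* start of the oldest intact record */
};

static void ring_read(const struct toy_ring *r, uint64_t pos,
		      void *dst, size_t len)
{
	uint8_t *out = dst;

	for (size_t i = 0; i < len; i++)
		out[i] = r->data[(pos + i) & (r->size - 1)];
}

static void ring_write(struct toy_ring *r, const void *src, size_t len)
{
	const uint8_t *in = src;

	for (size_t i = 0; i < len; i++)
		r->data[(r->head + i) & (r->size - 1)] = in[i];
	r->head += len;
}

/*
 * Make room for len bytes by stepping tail over whole records.
 * Returns 0 on success, -1 if a header looks corrupted.
 */
static int ring_make_room(struct toy_ring *r, uint64_t len)
{
	assert(len <= r->size);
	while (r->head + len - r->tail > r->size) {
		struct toy_hdr h;

		ring_read(r, r->tail, &h, sizeof(h));
		if (h.size < sizeof(h) || h.size > r->head - r->tail)
			return -1;	/* cannot trust this record */
		r->tail += h.size;
	}
	return 0;
}

/* Append a record with a zero-filled payload of payload_len bytes. */
static int ring_append(struct toy_ring *r, uint16_t payload_len)
{
	static const uint8_t zeros[64];
	struct toy_hdr h = { .type = 1, .misc = 0 };

	assert(payload_len <= sizeof(zeros));
	h.size = (uint16_t)(sizeof(struct toy_hdr) + payload_len);
	if (ring_make_room(r, h.size))
		return -1;
	ring_write(r, &h, sizeof(h));
	ring_write(r, zeros, payload_len);
	return 0;
}

/* Demo: records of 16, 24, 16 bytes fill 56 of 64; a 24-byte one evicts the first. */
static uint64_t demo_tail_after_overwrite(void)
{
	static uint8_t buf[64];
	struct toy_ring r = { .data = buf, .size = sizeof(buf) };

	ring_append(&r, 8);
	ring_append(&r, 16);
	ring_append(&r, 8);
	ring_append(&r, 16);
	return r.tail;
}

/* Demo: the oldest header is zeroed, so tail cannot be advanced. */
static int demo_corrupted_header(void)
{
	static uint8_t buf[64];
	struct toy_ring r = { .data = buf, .size = sizeof(buf) };

	ring_append(&r, 8);
	ring_append(&r, 16);
	ring_append(&r, 8);
	memset(buf, 0, sizeof(struct toy_hdr));
	return ring_append(&r, 16);
}
```

In the first demo the writer reads one old header and moves tail from 0
to 16 before writing. In the second it refuses to write, which is one
possible way to surface the "user space messed with the buffer" case;
how the kernel should actually signal that is left open above.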