Date: Tue, 12 Jan 2016 20:36:23 +0800
From: "Wangnan (F)" <wangnan0@...wei.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>,
 Peter Zijlstra <a.p.zijlstra@...llo.nl>
CC: <acme@...nel.org>, <linux-kernel@...r.kernel.org>, <pi3orama@....com>,
 <lizefan@...wei.com>, <netdev@...r.kernel.org>, <davem@...emloft.net>,
 Adrian Hunter <adrian.hunter@...el.com>,
 Arnaldo Carvalho de Melo <acme@...hat.com>,
 David Ahern <dsahern@...il.com>, Ingo Molnar <mingo@...nel.org>,
 Yunlong Song <yunlong.song@...wei.com>
Subject: Re: [PATCH 27/53] perf/core: Put size of a sample at the end of it
 by PERF_SAMPLE_TAILSIZE

On 2016/1/12 14:11, Alexei Starovoitov wrote:
> On Tue, Jan 12, 2016 at 01:33:28PM +0800, Wangnan (F) wrote:
>>
>> On 2016/1/12 2:09, Alexei Starovoitov wrote:
>>> On Mon, Jan 11, 2016 at 01:48:18PM +0000, Wang Nan wrote:
>>>> This patch introduces a PERF_SAMPLE_TAILSIZE flag which allows a size
>>>> field attached at the end of a sample. The idea comes from [1] that,
>>>> with tie size at tail of an event, it is possible for user program who
>>>> read from the ring buffer parse events backward.
>>>>
>>>> For example:
>>>>
>>>>    head
>>>>    |
>>>>    V
>>>> +--+---+-------+----------+------+---+
>>>> |E6|...|   B  8|   C    11|  D  7|E..|
>>>> +--+---+-------+----------+------+---+
>>>>
>>>> In this case, from the 'head' pointer provided by kernel, user program
>>>> can first see '6' by (*(head - sizeof(u64))), then it can get the start
>>>> pointer of record 'E', then it can read size and find start position
>>>> of record D, C, B in similar way.
>>> adding extra 8 bytes for every sample is quite unfortunate.
>>> How about another idea:
>>> . update data_tail pointer when head is about to overwrite it
>>>
>>> Ex:
>>>    head    data_tail
>>>    |       |
>>>    V       V
>>> +--+-------+-------+---+----+---+
>>> |E |  ...  |   B   | C |  D | E |
>>> +--+-------+-------+---+----+---+
>>>
>>> if new sample F is about to overwrite B, the kernel would need
>>> to read the size of B from B's header and update data_tail to point C.
>>> Or even further.
>>> Comparing to TAILSIZE approach, now kernel will be doing both reads
>>> and writes into ring-buffer and there is a concern that reads may
>>> be hitting cold data, but if the records are small they may be
>>> actually on the same cache line brought by the previous
>>> read A's header, write E record cycle. So I think we shouldn't see
>>> cache misses.
>> After ring buffer rewind, we need a read before nearly
>> every write operations. The performance penalty depends on
>> configuration of write allocate. In addition, another data
>> dependency is required: we must wait for the size of
>> event B is retrived before overwrite it.
>>
>> Even in the very first try at 2013 in [1], reading from the ring
>> buffer is avoided. I don't think Peter changes his mind now.
>>
>>> Another concern is validity of records stored. If user space messes
>>> with ring-buffer, kernel won't be able to move data_tail properly
>>> and would need to indicate that to userspace somehow.
>>> But memory saving of 8 bytes per record could be sizable
>> Yes. But I have already discussed with Peter on this in [2].
>> Last month I suggested:
>>
>> <quote>
>>
>> 1. If PERF_SAMPLE_SIZE is selected, we can avoid outputting the event
>> size in header. Which eliminate extra space cost;
>> </quote>
>>
>> However:
>>
>> <quote>
>>
>> That would mandate you always parse the stream backwards. Which seems
>> rather unfortunate. Also, no you cannot recoup the extra space, see the
>> alignment and size requirement.
> hmm, in this kernel patch I see that you're adding 8 bytes for
> every record via this extra TAILSISZE flag and in perf you're
> walking the ring buffer backwards by reading this 8 byte
> sizes, comparing header sizes and so on until reaching beginning,
> where you start dumping it as normal.
> So for this 'signal to perf' approach to work the ring buffer
> will contain tailsizes everywhere just so that user space can
> find the beginning. That's not very pretty. imo if kernel
> can do header read to adjust data_tail it would make user
> space side clean. May be there are other solutions.
> Adding tailsize seems like brute force hack.
> There must be some nicer way.

Hi Peter,

What's your opinion? Should we reconsider moving the size field from the
header to the end? Or moving the whole header to the end of a record?

Thank you.
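
For reference, the backward walk discussed in this thread can be sketched
roughly as below. This is only an illustration under simplifying assumptions:
a flat buffer that never wraps, no alignment padding, and a u64 tail field
that holds the whole record size including itself. put_record() and
walk_backward() are hypothetical helper names, not functions from the patch
or from perf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumed layout (illustration only): [payload][u64 total_size], where
 * total_size = payload length + sizeof(u64). Real perf records also carry
 * a header and are aligned; both are ignored here.
 */

/* Append one record at 'head' in a flat buffer; returns the new head. */
static size_t put_record(uint8_t *buf, size_t cap, size_t head,
                         const void *payload, uint64_t payload_len)
{
    uint64_t total = payload_len + sizeof(uint64_t);

    assert(head + total <= cap);
    memcpy(buf + head, payload, payload_len);
    /* The size goes at the tail so a reader can find it from 'head'. */
    memcpy(buf + head + payload_len, &total, sizeof(total));
    return head + total;
}

/*
 * Walk backward from 'head', reading the u64 just before the current
 * position to find where each record starts. Start offsets are stored
 * newest first. Returns the number of records found.
 */
static size_t walk_backward(const uint8_t *buf, size_t head,
                            size_t *starts, size_t max)
{
    size_t n = 0;

    while (head >= sizeof(uint64_t) && n < max) {
        uint64_t total;

        memcpy(&total, buf + head - sizeof(total), sizeof(total));
        /* Stop on a size that cannot describe a record in this buffer. */
        if (total < sizeof(uint64_t) || total > head)
            break;
        head -= total;
        starts[n++] = head;
    }
    return n;
}
```

The reader never touches a record header while walking back, which is the
property the tail size buys. The cost is the extra 8 bytes per record that
the thread objects to.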