linux-kernel - Re: [PATCH V2 2/8] perf tools: rewrite perf mmap read for overwrite

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0258cb8b-4c5d-255e-c4d9-2203ff89e5f1@huawei.com>
Date:   Tue, 19 Dec 2017 11:04:31 +0800
From:   "Wangnan (F)" <wangnan0@...wei.com>
To:     "Liang, Kan" <kan.liang@...el.com>,
        "acme@...nel.org" <acme@...nel.org>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC:     "jolsa@...nel.org" <jolsa@...nel.org>,
        "namhyung@...nel.org" <namhyung@...nel.org>,
        "ak@...ux.intel.com" <ak@...ux.intel.com>,
        "yao.jin@...ux.intel.com" <yao.jin@...ux.intel.com>
Subject: Re: [PATCH V2 2/8] perf tools: rewrite perf mmap read for overwrite



On 2017/12/19 3:07, Liang, Kan wrote:
>>> -union perf_event *perf_mmap__read_backward(struct perf_mmap *map)
>>> +union perf_event *perf_mmap__read_backward(struct perf_mmap *map,
>>> +					   u64 *start, u64 end)
>>>    {
>>> -	u64 head, end;
>>> -	u64 start = map->prev;
>>> -
>>> -	/*
>>> -	 * Check if event was unmapped due to a POLLHUP/POLLERR.
>>> -	 */
>>> -	if (!refcount_read(&map->refcnt))
>>> -		return NULL;
>>> -
>> Is removing this checking safe?
> It was duplicate as the one in perf_mmap__read_catchup.
> Once planned to remove one. But it looks I removed both accidently.
> Will keep one in V3.
>
>>> -	head = perf_mmap__read_head(map);
>>> -	if (!head)
>>> -		return NULL;
>>> +	union perf_event *event = NULL;
>>>
>>> -	/*
>>> -	 * 'head' pointer starts from 0. Kernel minus sizeof(record) form
>>> -	 * it each time when kernel writes to it, so in fact 'head' is
>>> -	 * negative. 'end' pointer is made manually by adding the size of
>>> -	 * the ring buffer to 'head' pointer, means the validate data can
>>> -	 * read is the whole ring buffer. If 'end' is positive, the ring
>>> -	 * buffer has not fully filled, so we must adjust 'end' to 0.
>>> -	 *
>>> -	 * However, since both 'head' and 'end' is unsigned, we can't
>>> -	 * simply compare 'end' against 0. Here we compare '-head' and
>>> -	 * the size of the ring buffer, where -head is the number of bytes
>>> -	 * kernel write to the ring buffer.
>>> -	 */
>>> -	if (-head < (u64)(map->mask + 1))
>>> -		end = 0;
>>> -	else
>>> -		end = head + map->mask + 1;
>>> +	event = perf_mmap__read(map, *start, end, &map->prev);
>>> +	*start = map->prev;
>>>
>>> -	return perf_mmap__read(map, start, end, &map->prev);
>>> +	return event;
>>>    }
>> perf_mmap__read_backward() and perf_mmap__read_forward() become
>> asymmetrical,
>> but their names are symmetrical.
>>
>>
>>> -void perf_mmap__read_catchup(struct perf_mmap *map)
>>> +static int overwrite_rb_find_range(void *buf, int mask, u64 head,
>>> +				   u64 *start, u64 *end);
>>> +
>>> +int perf_mmap__read_catchup(struct perf_mmap *map,
>>> +			    bool overwrite,
>>> +			    u64 *start, u64 *end,
>>> +			    unsigned long *size)
>>>    {
>>> -	u64 head;
>>> +	u64 head = perf_mmap__read_head(map);
>>> +	u64 old = map->prev;
>>> +	unsigned char *data = map->base + page_size;
>>>
>>> -	if (!refcount_read(&map->refcnt))
>>> -		return;
>>> +	*start = overwrite ? head : old;
>>> +	*end = overwrite ? old : head;
>>>
>>> -	head = perf_mmap__read_head(map);
>>> -	map->prev = head;
>>> +	if (*start == *end)
>>> +		return 0;
>>> +
>>> +	*size = *end - *start;
>>> +	if (*size > (unsigned long)(map->mask) + 1) {
>>> +		if (!overwrite) {
>>> +			WARN_ONCE(1, "failed to keep up with mmap data.
>> (warn only once)\n");
>>> +
>>> +			map->prev = head;
>>> +			perf_mmap__consume(map, overwrite);
>>> +			return 0;
>>> +		}
>>> +
>>> +		/*
>>> +		 * Backward ring buffer is full. We still have a chance to read
>>> +		 * most of data from it.
>>> +		 */
>>> +		if (overwrite_rb_find_range(data, map->mask, head, start,
>> end))
>>> +			return -1;
>>> +	}
>>> +
>>> +	return 1;
>> Coding suggestion: this function returns 2 different value (1 and -1) in
>> two fail path. Should return 0 for success and -1 for failure.
>>
> For perf top, -1 is enough for failure.
> But for perf_mmap__push, it looks two fail paths are needed, aren't they?
>   

I see. I think this is not a good practice. Please try to avoid returning
3 states. For example, comparing start and end outside? If can't avoid, how
about returning an enum to notice reader about the difference?

>> You totally redefine perf_mmap__read_catchup(). The original meaning of
>> this function is adjust md->prev so following perf_mmap__read() can read
>> events one by one safely.  Only backward ring buffer needs catchup:
>> kernel's 'head' pointer keeps moving (backward), while md->prev keeps
>> unchanged. For forward ring buffer, md->prev will catchup each time when
>> reading from the ring buffer.
>>
>> Your patch totally changes its meaning. Now its meaning is finding the
>> available data from a ring buffer (report start and end). At least we
>> need a better name.
> Sure, I will introduce a new function to do it in V3.
>
>> In addition, if I understand your code correctly, we
>> don't need to report 'size' to caller, because size is determined by
>> start and end.
> Sure, I will remove the 'size' in V3.
>
>> In addition, since the concept of backward and overwrite are now
>> unified, we can avoid updating map->prev during one-by-one reading
>> (because backward reading don't need consume operation). I think we can
>> decouple the updating of map->prev and these two basic read functions.
>> This can makes the operations on map->prev clearer.  Now the moving
>> direction of map->prev is confusing for backward ring buffer: it moves
>> forward during one by one reading, and moves backward when block
>> reading
>> (in perf record)
> For reading, I think both one by one reading and block reading move forward
> for overwrite.
> It starts from head and ends in map->prev. No?
>
> For one-by-one reading, yes, it doesn’t need to update map->prev for each read.
> But both need to update the map->prev when the block is finished.


It depends on how you think about map->prev.

map->prev is the last position it has been processed. For forward ring 
buffer,
in both one-by-one reading and block reading, the processing direction 
is same,
so map->prev keeps moving forward. For backward ring buffer, the processing
direction is different: for perf record, the processing direction is 
backward,
so map->prev moves backward; for one-by-one reading, the reading 
direction is
same as forward ring buffer, so it moves forward. It still needs to move 
backward
again to read the next block.

If we can avoid moving it forward in one-by-one reading, map->prev would 
consistent
with the direction of the ring buffer in all cases. Then we can enforce 
the meaning
of the operation of 'process' that, only perf_mmap__consume() process a 
block of
ring buffer.

>> and when syncing.
>>
>> Then the core of two reading become uniform. Let's get rid of
>> duplication.
>>
>> Summarize:
>>
>> 1. Compute both start and end before reading for both forward and
>> backward (perf_mmap__read_catchup, but need a better name).
> How about perf_mmap__read_init?

Good.

>> 2. Don't update map->prev in perf_mmap__read. Let perf_mmap__read()
>> update start pointer, so it become stateless and suit for both backward
>> and forward reading.
>>
>> 3. Update md->prev for forward reading. Merge it into
>> perf_mmap__consume?
> It looks we have to pass the updated 'start' to perf_mmap__consume.
> Move interfaces like perf_evlist__mmap_read need to be changed.
> That would impacts other tools which only support non-overwrite mode.
>
> How about update both 'md->prev' and 'start' in perf_mmap__read()
> like the patch as below?

What about making perf_mmap__read() totally stateless? Don't
touch md->prev there, and makes forward reading do an extra
mp->prev updating before consuming?

> Introducing a new perf_mmap__read_event to get rid of
> the perf_mmap__read_backward/forward.
>
> perf_mmap__read_backward will be removed later.
> Keep perf_mmap__read_forward since other tools still use it.


OK. For all reader who doesn't care backward or forward, it should use
perf_mmap__read() and maintains start position by its own, and give it
a chance to adjust map->prev if the ringbuffer is forward.

perf_mmap__read_forward() borrows mp->prev to maintain start position
like this:

union perf_event *perf_mmap__read_forward(struct perf_mmap *map)
{
     ....
     return perf_mmap__read(map, &map->prev, head);
}


> It has to update the 'end' for non-overwrite mode in each read since the
> ringbuffer is not paused.
>
> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> index 484a394..74f33fd 100644
> --- a/tools/perf/util/mmap.c
> +++ b/tools/perf/util/mmap.c
> @@ -22,16 +22,16 @@ size_t perf_mmap__mmap_len(struct perf_mmap *map)
>   
>   /* When check_messup is true, 'end' must points to a good entry */
>   static union perf_event *perf_mmap__read(struct perf_mmap *map,
> -					 u64 start, u64 end, u64 *prev)
> +					 u64 *start, u64 end, u64 *prev)


Don't need *prev because it can be replaced by *start.

>   {
>   	unsigned char *data = map->base + page_size;
>   	union perf_event *event = NULL;
> -	int diff = end - start;
> +	int diff = end - *start;
>   
>   	if (diff >= (int)sizeof(event->header)) {
>   		size_t size;
>   
> -		event = (union perf_event *)&data[start & map->mask];
> +		event = (union perf_event *)&data[*start & map->mask];
>   		size = event->header.size;
>   
>   		if (size < sizeof(event->header) || diff < (int)size) {
> @@ -43,8 +43,8 @@ static union perf_event *perf_mmap__read(struct perf_mmap *map,
>   		 * Event straddles the mmap boundary -- header should always
>   		 * be inside due to u64 alignment of output.
>   		 */
> -		if ((start & map->mask) + size != ((start + size) & map->mask)) {
> -			unsigned int offset = start;
> +		if ((*start & map->mask) + size != ((*start + size) & map->mask)) {
> +			unsigned int offset = *start;
>   			unsigned int len = min(sizeof(*event), size), cpy;
>   			void *dst = map->event_copy;
>   
> @@ -59,12 +59,12 @@ static union perf_event *perf_mmap__read(struct perf_mmap *map,
>   			event = (union perf_event *)map->event_copy;
>   		}
>   
> -		start += size;
> +		*start += size;
>   	}
>   
>   broken_event:
>   	if (prev)
> -		*prev = start;
> +		*prev = *start;
>   
>   	return event;
>   }
> @@ -132,6 +107,42 @@ void perf_mmap__read_catchup(struct perf_mmap *map)
>   	map->prev = head;
>   }
> +
> +/*
> + * Read event from ring buffer. Return one event for each call.
> + * Support both overwirte and non-overwrite mode.
> + * The start and end are only available for overwirte mode, which
> + * pause the ringbuffer.
> + *
> + * Usage:
> + * perf_mmap__read_init
> + * while(event = perf_mmap__read_event) {
> + * 	//process the event
> + * 	perf_mmap__consume

Need a transparent method before perf_mmap__consume to adjust
md->prev if the ring buffer is forward. Or let'w wrap perf_mmap__consume
to do it if you don't want to break its API?

Thank you.