Message-ID: <528E1EE6.9040407@tu-dresden.de>
Date:	Thu, 21 Nov 2013 15:55:34 +0100
From:	Joseph Schuchart <joseph.schuchart@...dresden.de>
To:	Ingo Molnar <mingo@...nel.org>
CC:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Paul Mackerras <paulus@...ba.org>,
	Ingo Molnar <mingo@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...stprotocols.net>,
	thomas.ilsche@...dresden.de, linux-kernel@...r.kernel.org,
	Frédéric Weisbecker <fweisbec@...il.com>,
	David Ahern <dsahern@...il.com>
Subject: Re: [PATCH] Perf: Correct Assumptions about Sample Timestamps in
 Passes

Sorry for my delayed reply; it's been a busy week, and I wanted to give
Ingo's idea below some thought. Please find my comments inline.

>>> If done that way then AFAICS we could even eliminate the 
>>> ->max_timestamps[NR_CPUS] array.
>>
>> I can understand your performance concerns. However, I am not 
>> sure how we can determine the minimal max_timestamp of all cpus 
>> without storing the information on a per-cpu basis first. 
>> Accumulating it on the fly would only lead to a global 
>> max_timestamp. [...]
> 
> Ok. So this:
> 
> +static inline void set_next_flush(struct perf_session *session)
> +{
> +       int i;
> +       u64 min_max_timestamp = session->ordered_samples.max_timestamps[0];
> +       for (i = 1; i < MAX_NR_CPUS; i++) {
> +               if (min_max_timestamp > session->ordered_samples.max_timestamps[i])
> +                       min_max_timestamp = session->ordered_samples.max_timestamps[i];
> +       }
> +       session->ordered_samples.next_flush = min_max_timestamp;
> +}
> 
> which should IMHO be written in a bit clearer form as:
> 
> static inline void set_next_flush(struct perf_session *session)
> {
> 	u64 *timestamps = session->ordered_samples.max_timestamps;
> 	u64 min_timestamp = timestamps[0];
>  	int i;
> 
> 	for (i = 1; i < MAX_NR_CPUS; i++) {
> 		if (min_timestamp > timestamps[i])
> 			min_timestamp = timestamps[i];
> 	}
> 
> 	session->ordered_samples.next_flush = min_timestamp;
> }

Agreed.

> 
> calculates the minimum of the max_timestamps[] array, right?
> 
> Now, the max_timestamps[] array gets modified only in a single 
> place, from the sample timestamps, via:
> 
> 	os->max_timestamps[sample->cpu] = timestamp;
> 
> My suggestion was an identity transformation: to calculate the 
> minimum of the array when the max_timestamps[] array is modified. 
> A new minimum happens if the freshly written value is smaller 
> than the current minimum.

I am really not sure how this would work: I don't see how we could make
progress while flushing if max_timestamp is only ever replaced with a
smaller value and never increased. IMO it is necessary to distinguish
between the timestamps of different cpus to determine the next_flush
timestamp across all cpus in each pass. I have tried to come up with a
working implementation of your idea but do not see a way to safely
increase the value of max_timestamp (without making assumptions about
the ordering of timestamps between cpus and passes).

Thinking about it a little more, I see two catches in my solution:
(1) A cpu is only considered in determining the next_flush timestamp if
at least one of its events has already been processed. Otherwise the
cpu is unknown, and its first timestamp could lie below the already
flushed timeslice timestamp. This can also happen in the current
implementation.
(2) Flush progress is only guaranteed while events from all cpus are
being read. As soon as a cpu that already had events in this session
runs out of events, next_flush will remain stuck at the timestamp of
the last pass that had events from all cpus, until the final flush. I
am not sure how likely this is. This does not happen in the current
implementation.

As for David's suggestion to read perf_clock() at the start of every
round as a flush timestamp: judging from the comments in
kernel/sched/clock.c, local_clock() (which perf_clock calls) is not
guaranteed to be monotonic across cpus, so earlier timestamps could
still arrive late. Or am I missing something there?

Simply increasing the window to more than two passes in the current
implementation might lower the chance of hitting this problem, but it
does not guarantee correctness, since passes can be very short (only a
few events from a few cpus) or even empty.

Cheers
Joseph
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
