linux-kernel - [PATCH] Perf: Correct Assumptions about Sample Timestamps in Passes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <528484CB.7@tu-dresden.de>
Date:	Thu, 14 Nov 2013 09:07:39 +0100
From:	Joseph Schuchart <joseph.schuchart@...dresden.de>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Paul Mackerras <paulus@...ba.org>,
	Ingo Molnar <mingo@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...stprotocols.net>
CC:	thomas.ilsche@...dresden.de,
	Joseph Schuchart <joseph.schuchart@...dresden.de>,
	linux-kernel@...r.kernel.org
Subject: [PATCH] Perf: Correct Assumptions about Sample Timestamps in Passes

Good morning,

We came across a problem in perf script which causes it to abort reading
a file produced with perf record, complaining about timestamps being
earlier than the last flushed timeslice. ("Warning: Timestamp below last
timeslice flush")

While investigating the issue, we found that the assumptions about the
relation of timestamps between different CPUs and the passes in which
the data is read from the kernel might not be entirely valid. The
current model (as is described in util/session.c lines 552 to 590)
states that it is safe to store the maximum timestamp seen in pass n
(T_n) and use it to flush the samples after pass n+1, assuming that
there will be samples with timestamps >=T_n for all CPUs in pass n+1.

However, based on the description given in the source file, we can come
up with an example that breaks this assumption:

 *    ============ PASS n =================
 *       CPU 0         |   CPU 1
 *                     |
 *    cnt1 timestamps  |   cnt2 timestamps
 *          1          |         2
 *          -          |         3
 *          -          |         4  <--- max recorded
 *
 *    ============ PASS n + 1 ==============
 *       CPU 0         |   CPU 1
 *                     |
 *    cnt1 timestamps  |   cnt2 timestamps
 *          2          |         5
 *          -          |         6
 *          -          |         7 <---- max recorded
 *
 *      Flush every events below timestamp 4
 *
 *    ============ PASS n + 2 ==============
 *       CPU 0         |   CPU 1
 *                     |
 *    cnt1 timestamps  |   cnt2 timestamps
 *          3          |         8
 *    --> This event will break the assumption and lead to the error
described above.
 *          4          |         9
 *          -          |         10
 *

Note that the trace depicted above is perfectly legal (to our
understanding) and we have seen such a trace coming out of perf record.
Unfortunately, we can only reproduce it when running a large proprietary
application for which we cannot provide the trace data. Any attempt to
come up with a small reproducer failed so far. I hope the above
description of the problem is sufficient, though.

The attached patch modifies the session to keep a record of the maximum
timestamp seen per CPU and the samples are flushed after reading pass n
up to the minimum of the maximum timestamps of each CPU in this pass.
The change should be small and do not interfere with any other
functionality of perf. The patch is based on the current Linux version
in git.

Please let me know if you need any further information.

Signed-off-by: Joseph Schuchart <joseph.schuchart@...dresden.de>

Thanks,
Joseph
-- 
Dipl. Inf. Joseph Schuchart
Computer Scientist

Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany

Phone: +49 351 463-36494
Fax: +49 351 463-3773
E-Mail: joseph.schuchart@...dresden.de

View attachment "perf_timestamp_pass.diff" of type "text/x-patch" (4814 bytes)