Message-Id: <20220613084739.1159111-1-irogers@google.com>
Date: Mon, 13 Jun 2022 01:47:33 -0700
From: Ian Rogers <irogers@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
James Clark <james.clark@....com>,
Kees Cook <keescook@...omium.org>,
"Gustavo A. R. Silva" <gustavoars@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>,
Riccardo Mancini <rickyman7@...il.com>,
German Gomez <german.gomez@....com>,
Colin Ian King <colin.king@...el.com>,
Song Liu <songliubraving@...com>,
Dave Marchevsky <davemarchevsky@...com>,
Athira Rajeev <atrajeev@...ux.vnet.ibm.com>,
Alexey Bayduraev <alexey.v.bayduraev@...ux.intel.com>,
Leo Yan <leo.yan@...aro.org>, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org
Cc: Stephane Eranian <eranian@...gle.com>,
Ian Rogers <irogers@...gle.com>
Subject: [PATCH 0/6] Corrections to cpu map event encoding
A mask encoding of a cpu map is laid out as:
  u16 nr
  u16 long_size
  unsigned long mask[];
However, on 64-bit builds the mask is 8-byte aligned, so there is a
4-byte pad after long_size. This means 32-bit and 64-bit builds see the
mask at different offsets. On top of this, the structure sits in the
data[] bytes of an enclosing record encoded as:
  u16 type
  char data[]
This means the mask's struct does not get the required 4- or 8-byte
alignment, but is offset by 2 bytes. Consequently the unsigned long
reads and writes are misaligned, which is undefined behavior.
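The layout problem described above can be reproduced outside of perf. The
following is a minimal sketch, not code from this series: the struct and
function names are hypothetical, and uint32_t/uint64_t stand in for
unsigned long on 32-bit and 64-bit builds. It reports where the first mask
word lands in each view and shows a memcpy-based load that stays defined
even when the source address is offset by 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: the mask layout as a 32-bit build sees it. */
struct mask32_view {
	uint16_t nr;
	uint16_t long_size;
	uint32_t mask[1];	/* unsigned long on a 32-bit build */
};

/* Hypothetical stand-in: the mask layout as a 64-bit build sees it. */
struct mask64_view {
	uint16_t nr;
	uint16_t long_size;
	uint64_t mask[1];	/* unsigned long on a 64-bit build */
};

/* Offset of the first mask word in the 32-bit view. */
size_t mask32_offset(void)
{
	return offsetof(struct mask32_view, mask);
}

/* Offset of the first mask word in the 64-bit view (padding included). */
size_t mask64_offset(void)
{
	return offsetof(struct mask64_view, mask);
}

/* Load a 64-bit word from any address; memcpy has no alignment requirement. */
uint64_t load_u64_unaligned(const void *p)
{
	uint64_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));
	return v;
}
```

On a typical 64-bit ABI such as x86-64, mask32_offset() is 4 and
mask64_offset() is 8, which is the 4-byte pad mentioned above. A direct
`*(uint64_t *)(buf + 2)` would be a misaligned access, whereas
load_u64_unaligned(buf + 2) is well defined.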
This series first does minor cleanup: adding const, reducing the
visibility of functions and computing the mask size with the constant
time max function. It then adds 32-bit and 64-bit mask encoding
variants, packed to match the current layout. Taking the address of a
member of a packed struct can yield a misaligned pointer, so function
arguments are changed to take the packed struct itself. To compact the
mask encoding further and drop the padding, the 4-byte variant is
preferred. Finally, a new range encoding is added that reduces the
size of the common case of a range of CPUs to a single u64.
On a 72-CPU (hyperthreaded) machine the original encoding of all CPUs is:
0x9a98 [0x28]: event: 74
.
. ... raw event: size 40 bytes
. 0000: 4a 00 00 00 00 00 28 00 01 00 02 00 08 00 00 00 J.....(.........
. 0010: 00 00 ff ff ff ff ff ff ff ff ff 00 00 00 00 00 ................
. 0020: 00 00 00 00 00 00 00 00 ........
0 0 0x9a98 [0x28]: PERF_RECORD_CPU_MAP
Using the 4-byte encoding it is:
0x9a98@...e [0x20]: event: 74
.
. ... raw event: size 32 bytes
. 0000: 4a 00 00 00 00 00 20 00 01 00 03 00 04 00 ff ff J..... .........
. 0010: ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 00 ................
0 0 0x9a98 [0x20]: PERF_RECORD_CPU_MAP
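As a cross-check of the 4-byte dump above, the bytes after the 8-byte event
header can be decoded by hand. The sketch below is illustrative only: the
function names are hypothetical, and it assumes the payload is a
little-endian u16 type, u16 nr, u16 long_size followed directly by
nr * long_size mask bytes with no padding (which holds for the 4-byte
variant, not for the original 8-byte one).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian u16 byte by byte: no alignment or endianness issues. */
uint16_t get_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Count the CPUs set in a padding-free mask payload.
 * Assumed layout: u16 type, u16 nr, u16 long_size, then the mask bytes.
 */
int count_mask_cpus(const unsigned char *data, size_t len)
{
	size_t mask_bytes;
	int cpus = 0;

	assert(len >= 6);
	mask_bytes = (size_t)get_le16(data + 2) * get_le16(data + 4);
	assert(len >= 6 + mask_bytes);

	for (size_t i = 0; i < mask_bytes; i++) {
		/* Clear the lowest set bit until the byte is empty. */
		for (unsigned int b = data[6 + i]; b; b &= b - 1)
			cpus++;
	}
	return cpus;
}

/* Bytes 0x08..0x1f of the 32-byte dump (everything after the event header). */
const unsigned char example_mask_payload[24] = {
	0x01, 0x00,			/* type */
	0x03, 0x00,			/* nr = 3 */
	0x04, 0x00,			/* long_size = 4 */
	0xff, 0xff, 0xff, 0xff,		/* mask word 0 */
	0xff, 0xff, 0xff, 0xff,		/* mask word 1 */
	0xff, 0x00, 0x00, 0x00,		/* mask word 2 */
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00	/* trailing bytes in the dump */
};
```

Calling count_mask_cpus(example_mask_payload, sizeof(example_mask_payload))
yields 72, matching the machine's CPU count: three 4-byte words hold nine
0xff bytes.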
Finally, with the range encoding it is:
0x9ab8@...e [0x10]: event: 74
.
. ... raw event: size 16 bytes
. 0000: 4a 00 00 00 00 00 10 00 02 00 00 00 00 00 47 00 J.............G.
0 0 0x9ab8 [0x10]: PERF_RECORD_CPU_MAP
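The range dump above can be read the same way. This is a hedged sketch: the
field layout (u16 type, u8 any_cpu, u8 pad, u16 start_cpu, u16 end_cpu, all
little-endian) and the type value 2 are assumptions inferred from the dump
bytes, not something this message states, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of a range payload. */
struct demo_cpu_range {
	int any_cpu;
	unsigned int start_cpu;
	unsigned int end_cpu;
};

/* Read a little-endian u16 byte by byte. */
uint16_t read_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Decode the 8 payload bytes that follow the event header.
 * Assumed layout: u16 type (2), u8 any_cpu, u8 pad, u16 start, u16 end.
 */
struct demo_cpu_range decode_range(const unsigned char *data, size_t len)
{
	struct demo_cpu_range r;

	assert(len >= 8);
	assert(read_le16(data) == 2);	/* assumed range type value */
	r.any_cpu = data[2];
	r.start_cpu = read_le16(data + 4);
	r.end_cpu = read_le16(data + 6);
	return r;
}

/* Number of CPUs covered by an inclusive [start, end] range. */
unsigned int range_nr_cpus(const struct demo_cpu_range *r)
{
	assert(r->end_cpu >= r->start_cpu);
	return r->end_cpu - r->start_cpu + 1;
}

/* Bytes 0x08..0x0f of the 16-byte dump. */
const unsigned char example_range_payload[8] = {
	0x02, 0x00,	/* type */
	0x00,		/* any_cpu (assumed) */
	0x00,		/* pad (assumed) */
	0x00, 0x00,	/* start_cpu = 0 */
	0x47, 0x00	/* end_cpu = 0x47 = 71 */
};
```

Under these assumptions the payload decodes to CPUs 0 through 71, so
range_nr_cpus() returns 72, the same set the two mask encodings describe in
40 and 32 bytes.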
Ian Rogers (6):
  perf cpumap: Const map for max
  perf cpumap: Synthetic events and const/static
  perf cpumap: Compute mask size in constant time
  perf cpumap: Fix alignment for masks in event encoding
  perf events: Prefer union over variable length array
  perf cpumap: Add range data encoding
 tools/lib/perf/cpumap.c              |   2 +-
 tools/lib/perf/include/perf/cpumap.h |   2 +-
 tools/lib/perf/include/perf/event.h  |  61 ++++++++-
 tools/perf/tests/cpumap.c            |  71 ++++++++---
 tools/perf/tests/event_update.c      |  14 +--
 tools/perf/util/cpumap.c             | 111 +++++++++++++---
 tools/perf/util/cpumap.h             |   4 +-
 tools/perf/util/event.h              |   4 -
 tools/perf/util/header.c             |  24 ++--
 tools/perf/util/session.c            |  35 +++---
 tools/perf/util/synthetic-events.c   | 182 +++++++++++++--------------
 tools/perf/util/synthetic-events.h   |   2 +-
 12 files changed, 327 insertions(+), 185 deletions(-)
--
2.36.1.476.g0c4daa206d-goog