Message-ID: <20240701125643.kqJWwrhW@linutronix.de>
Date: Mon, 1 Jul 2024 14:56:43 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Peter Zijlstra <peterz@...radead.org>
Cc: linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
Adrian Hunter <adrian.hunter@...el.com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Daniel Bristot de Oliveira <bristot@...nel.org>,
Frederic Weisbecker <frederic@...nel.org>,
Ian Rogers <irogers@...gle.com>, Ingo Molnar <mingo@...hat.com>,
Jiri Olsa <jolsa@...nel.org>, Kan Liang <kan.liang@...ux.intel.com>,
Marco Elver <elver@...gle.com>, Mark Rutland <mark.rutland@....com>,
Namhyung Kim <namhyung@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH v4 3/6] perf: Shrink the size of the recursion counter.
On 2024-07-01 14:31:37 [+0200], Peter Zijlstra wrote:
> On Mon, Jun 24, 2024 at 05:15:16PM +0200, Sebastian Andrzej Siewior wrote:
> > There are four recursion counters, one for each context. The type of the
> > counter is `int' but it is used as a `bool' since it is only
> > incremented when it is zero.
> >
> > Reduce the type of the recursion counter to an unsigned char and keep the
> > increment/decrement operations.
>
> Does this actually matter? Aren't u8 memops encoded by longer
> instructions, etc.?
The goal here isn't to shrink the opcode size but to add the counters to
task_struct without making it larger, by filling an existing padding hole.
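A minimal sketch of the size effect (a made-up struct, not the actual
task_struct layout; it only illustrates filling a padding hole):

	#include <stdio.h>

	#define PERF_NR_CONTEXTS	4	/* task, softirq, hardirq, NMI */

	/* stand-in for a struct with a 4-byte hole after `b' */
	struct demo {
		unsigned long	a;
		unsigned int	b;
		unsigned char	recursion[PERF_NR_CONTEXTS];	/* fills the hole */
		unsigned long	c;
	};

	int main(void)
	{
		/* 24 bytes on x86-64, the same as without `recursion';
		 * with `int recursion[PERF_NR_CONTEXTS]' it grows to 40. */
		printf("%zu\n", sizeof(struct demo));
		return 0;
	}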
But since you made me look at assembly:
old:
316b: 65 48 8b 15 00 00 00 mov %gs:0x0(%rip),%rdx # 3173 <perf_swevent_get_recursion_context+0x33>
3173: 1c ff sbb $0xff,%al
3175: 48 0f be c8 movsbq %al,%rcx
3179: 48 8d 94 8a 00 00 00 lea 0x0(%rdx,%rcx,4),%rdx
3180: 00
317d: R_X86_64_32S .data..percpu+0x4c
3181: 8b 0a mov (%rdx),%ecx
3183: 85 c9 test %ecx,%ecx
3185: 75 0e jne 3195 <perf_swevent_get_recursion_context+0x55>
3187: c7 02 01 00 00 00 movl $0x1,(%rdx)
^^^
318d: 0f be c0 movsbl %al,%eax
new:
2ff8: 1c ff sbb $0xff,%al
2ffa: 81 e2 00 01 ff 00 and $0xff0100,%edx
3000: 83 fa 01 cmp $0x1,%edx
3003: 1c ff sbb $0xff,%al
3005: 48 0f be d0 movsbq %al,%rdx
3009: 48 8d 94 11 00 00 00 lea 0x0(%rcx,%rdx,1),%rdx
3010: 00
300d: R_X86_64_32S .data..percpu+0x4c
3011: 80 3a 00 cmpb $0x0,(%rdx)
3014: 75 0b jne 3021 <perf_swevent_get_recursion_context+0x51>
3016: c6 02 01 movb $0x1,(%rdx)
^^^
3019: 0f be c0 movsbl %al,%eax
301c: e9 00 00 00 00 jmp 3021 <perf_swevent_get_recursion_context+0x51>
So we even save a few bytes. We could also avoid the "movsbl" at 3019 by
making the return type `unsigned char' ;)
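For reference, the pattern in question, roughly (a sketch of the logic
with the u8 counters, not the in-tree code; interrupt_context_level()
maps preempt_count to 0..3):

	static int get_recursion_context(u8 *recursion)
	{
		u8 rctx = interrupt_context_level();

		if (recursion[rctx])
			return -1;	/* already handling an event in this context */

		recursion[rctx]++;	/* bool-like: only ever goes 0 -> 1 */
		barrier();

		/* The int return type makes the compiler widen the byte
		 * value again (the movsbl above); returning u8 wouldn't. */
		return rctx;
	}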
Sebastian