Message-ID: <6371740df7704217926315e97294a894@AcuMS.aculab.com>
Date: Sat, 31 Oct 2020 12:11:42 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Peter Zijlstra' <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>
CC: Jesper Dangaard Brouer <brouer@...hat.com>,
"mingo@...nel.org" <mingo@...nel.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"kan.liang@...ux.intel.com" <kan.liang@...ux.intel.com>,
"acme@...nel.org" <acme@...nel.org>,
"mark.rutland@....com" <mark.rutland@....com>,
"alexander.shishkin@...ux.intel.com"
<alexander.shishkin@...ux.intel.com>,
"jolsa@...hat.com" <jolsa@...hat.com>,
"namhyung@...nel.org" <namhyung@...nel.org>,
"ak@...ux.intel.com" <ak@...ux.intel.com>,
"eranian@...gle.com" <eranian@...gle.com>
Subject: RE: [PATCH 4/6] perf: Optimize get_recursion_context()
From: Peter Zijlstra
> Sent: 30 October 2020 23:02
>
> On Fri, Oct 30, 2020 at 04:22:48PM -0400, Steven Rostedt wrote:
> > As this is something that ftrace recursion also does, perhaps we should
> > move this into interrupt.h so that anyone that needs a counter can get
> > it quickly, and not keep re-implementing it.
>
> Works for me, however:
>
> > /*
> > * Quickly find what context you are in.
> > * 0 - normal
> > * 1 - softirq
> > * 2 - hard interrupt
> > * 3 - NMI
> > */
> > static inline int irq_context()
> > {
> > unsigned int pc = preempt_count();
> > int rctx = 0;
>
> unsigned
>
> >
> > if (pc & (NMI_MASK))
> > rctx++;
> > if (pc & (NMI_MASK | HARDIRQ_MASK))
> > rctx++;
> > if (pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET))
> > rctx++;
> >
> > return rctx;
> > }
>
> otherwise you'll get an extra instruction to sign extend it, which is
> daft (yes, i've been staring at GCC output far too much).
>
> Also, gcc-9 does worse (like 1 byte iirc) with:
>
> rctx += !!(pc & (NMI_MASK));
> rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK));
> rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET));
>
> but gcc-10 doesn't seem to care.
You've made me look at some gcc output (it's raining).
The gcc 7.5.0 I have handy probably generates the best code for:
unsigned char q_2(unsigned int pc)
{
unsigned char rctx = 0;
rctx += !!(pc & (NMI_MASK));
rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK));
rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET));
return rctx;
}
0000000000000000 <q_2>:
0: f7 c7 00 00 f0 00 test $0xf00000,%edi # clock 0
6: 0f 95 c0 setne %al # clock 1
9: f7 c7 00 00 ff 00 test $0xff0000,%edi # clock 0
f: 0f 95 c2 setne %dl # clock 1
12: 01 c2 add %eax,%edx # clock 2
14: 81 e7 00 01 ff 00 and $0xff0100,%edi
1a: 0f 95 c0 setne %al
1d: 01 d0 add %edx,%eax # clock 3
1f: c3 retq
I doubt that is beatable.
I've annotated the register dependency chain.
Likely to be 3 (or maybe 4) clocks.
The other versions are a lot worse (7 or 8 clocks), and that is before
allowing for 'sbb' taking 2 clocks on a lot of Intel CPUs.
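
To repeat the comparison outside a kernel tree, something like the following
should do as a rough sketch. The mask values are an assumption: they are read
off the immediates in the disassembly above (0xf00000, 0xff0000, 0xff0100)
rather than taken from the kernel headers, and may differ between kernel
versions. The names ctx_branchy(), ctx_setne() and check_agree() are made up
for illustration.

```c
#include <assert.h>

/*
 * Assumed mask values, inferred from the immediates in the q_2()
 * disassembly. In the kernel these come from <linux/preempt.h>.
 */
#define NMI_MASK	0x00f00000u
#define HARDIRQ_MASK	0x000f0000u
#define SOFTIRQ_OFFSET	0x00000100u

/* Branchy form, following the quoted irq_context() proposal. */
unsigned int ctx_branchy(unsigned int pc)
{
	unsigned int rctx = 0;

	if (pc & NMI_MASK)
		rctx++;
	if (pc & (NMI_MASK | HARDIRQ_MASK))
		rctx++;
	if (pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET))
		rctx++;

	return rctx;
}

/* Branchless form with a byte-sized result, following q_2(). */
unsigned char ctx_setne(unsigned int pc)
{
	unsigned char rctx = 0;

	rctx += !!(pc & NMI_MASK);
	rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK));
	rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET));

	return rctx;
}

/* Hypothetical helper: both forms must agree for a given count. */
void check_agree(unsigned int pc)
{
	assert(ctx_branchy(pc) == ctx_setne(pc));
}
```

Building with 'gcc -O2 -c' and running 'objdump -d' on the object file shows
the code generated for each form. The exact instructions will depend on the
compiler version, so the output need not match the listing above.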
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)