[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c22fd9c5-6727-46c2-a811-784315edf7cb@intel.com>
Date: Tue, 22 Oct 2024 12:53:01 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Ingo Molnar <mingo@...nel.org>, Uros Bizjak <ubizjak@...il.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org, llvm@...ts.linux.dev,
Dennis Zhou <dennis@...nel.org>, Tejun Heo <tj@...nel.org>,
Christoph Lameter <cl@...ux.com>, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
"H. Peter Anvin" <hpa@...or.com>, Nathan Chancellor <nathan@...nel.org>,
Nick Desaulniers <ndesaulniers@...gle.com>, Bill Wendling
<morbo@...gle.com>, Justin Stitt <justinstitt@...gle.com>
Subject: Re: [PATCH v1 1/1] x86/percpu: Cast -1 to argument type when
comparing in percpu_add_op()
On 10/17/24 11:18, Peter Zijlstra wrote:
> On Wed, Oct 16, 2024 at 12:44:18PM -0700, Dave Hansen wrote:
>
>> Would anybody hate if we broke this up a bit, like:
>>
>> const typeof(var) _val = val;
>> const int paoconst = __builtin_constant_p(val);
>> const int paoinc = paoconst && ((_val) == 1);
>> const int paodec = paoconst && ((_val) == (typeof(var))-1);
>>
>> and then did
>>
>> if (paoinc)
>> percpu_unary_op(size, qual, "inc", var);
>> ...
> I think that is an overall improvement. Proceed! 🙂
I poked at this a bit:
> https://git.kernel.org/pub/scm/linux/kernel/git/daveh/devel.git/commit/?h=testme&id=30e0899c6ab7fe1134e4b96db963f0be89b1dd5a
I believe it functions fine. But it surprised me with a few things.
Here's one. I assumed that doing an add((unsigned)-1) would be rare.
It's not. It's actually pretty common because this:
#define this_cpu_sub(pcp, val) this_cpu_add(pcp, -(typeof(pcp))(val))
ends up causing problems when 'pcp' is an unsigned type. For example,
in this chain:
mem_cgroup_exit ->
obj_cgroup_put ->
percpu_ref_put ->
percpu_ref_put_many(ref, 1) ->
this_cpu_sub
the compiler can see the '1' constant. It effectively expands to:
this_cpu_add(pcp, -(unsigned long)(1))
With the old code, gcc manages to generate a 'dec'. Clang generates an
'add'. With my hack above both compilers generate an 'add'. This
actually matters in some code that seems potentially rather performance
sensitive:
add/remove: 0/0 grow/shrink: 219/9 up/down: 755/-141 (614)
Function old new delta
flush_end_io 905 1070 +165
x86_pmu_cancel_txn 242 338 +96
lru_add 554 594 +40
mlock_folio_batch 3264 3300 +36
compaction_alloc 3813 3838 +25
tcp_leave_memory_pressure 86 110 +24
account_guest_time 270 287 +17
...
So I think Peter's version was the best. It shuts up clang and also
preserves the existing (good) gcc 'sub' behavior. I'll send it out for
real in a bit, but I'm thinking of something like the attached patch.
View attachment "0001-x86-percpu-Avoid-comparing-unsigned-types-to-1.patch" of type "text/x-patch" (3180 bytes)
Powered by blists - more mailing lists