Message-ID: <00c4c55b-7fa6-d29c-4a80-c196922ef527@redhat.com>
Date: Wed, 5 Jan 2022 22:34:31 -0500
From: Waiman Long <longman@...hat.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
linux-kernel@...r.kernel.org
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH] locking/local_lock: Make the empty local_lock_*()
function a macro.
On 1/5/22 15:26, Sebastian Andrzej Siewior wrote:
> It has been said that local_lock() does not add any overhead compared to
> preempt_disable() in a !LOCKDEP configuration. A microbenchmark showed
> an unexpected result which can be reduced to the fact that local_lock()
> was not entirely optimized away.
> In the !LOCKDEP configuration local_lock_acquire() is an empty static
> inline function. On x86 the this_cpu_ptr() argument of that function is
> fully evaluated, leading to additional mov+add instructions which are
> neither needed nor used.
>
> Replace the static inline function with a macro. The typecheck() macro
> ensures that the argument is of proper type while the resulting
> disassembly shows no traces of this_cpu_ptr().
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> ---
> On -rc8, size says:
> | text data bss dec filename
> | 19656718 8681015 3764440 32102173 vmlinux.old
> | 19656218 8681015 3764440 32101673 vmlinux.new
>
> Which is -500 text, not much but still.
>
> include/linux/local_lock_internal.h | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/local_lock_internal.h b/include/linux/local_lock_internal.h
> index 975e33b793a77..6d635e8306d64 100644
> --- a/include/linux/local_lock_internal.h
> +++ b/include/linux/local_lock_internal.h
> @@ -44,9 +44,9 @@ static inline void local_lock_debug_init(local_lock_t *l)
> }
> #else /* CONFIG_DEBUG_LOCK_ALLOC */
> # define LOCAL_LOCK_DEBUG_INIT(lockname)
> -static inline void local_lock_acquire(local_lock_t *l) { }
> -static inline void local_lock_release(local_lock_t *l) { }
> -static inline void local_lock_debug_init(local_lock_t *l) { }
> +# define local_lock_acquire(__ll) do { typecheck(local_lock_t *, __ll); } while (0)
> +# define local_lock_release(__ll) do { typecheck(local_lock_t *, __ll); } while (0)
> +# define local_lock_debug_init(__ll) do { typecheck(local_lock_t *, __ll); } while (0)
> #endif /* !CONFIG_DEBUG_LOCK_ALLOC */
>
> #define INIT_LOCAL_LOCK(lockname) { LOCAL_LOCK_DEBUG_INIT(lockname) }
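For reference, the expansion involved in the !LOCKDEP case is roughly the
following (paraphrased from include/linux/local_lock.h,
include/linux/local_lock_internal.h and include/linux/typecheck.h; the exact
layout may differ per tree):

#define local_lock(lock)	__local_lock(lock)

#define __local_lock(lock)					\
	do {							\
		preempt_disable();				\
		local_lock_acquire(this_cpu_ptr(lock));		\
	} while (0)

#define typecheck(type, x)					\
({	type __dummy;						\
	typeof(x) __dummy2;					\
	(void)(&__dummy == &__dummy2);				\
	1;							\
})

With the empty static inline, the this_cpu_ptr(lock) argument is still
evaluated to form the (unused) call argument, which is the extra code the
commit message refers to. The macro only feeds the expression to
typecheck()/typeof(), which never evaluates it, so the per-cpu address
computation can disappear completely, as the disassembly below shows.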
I tried out this patch and it indeed helps to reduce the object size of
functions that use local_lock(). However, the extra code isn't an
additional mov+add.
Using folio_add_lru() as an example:
Without the patch:
466 local_lock(&lru_pvecs.lock);
0x00000000000032ee <+14>: mov $0x1,%edi
0x00000000000032f3 <+19>: callq 0x32f8 <folio_add_lru+24>
0x00000000000032f8 <+24>: callq 0x32fd <folio_add_lru+29>
With the patch:
466 local_lock(&lru_pvecs.lock);
0x00000000000032ae <+14>: mov $0x1,%edi
0x00000000000032b3 <+19>: callq 0x32b8 <folio_add_lru+24>
There is one fewer placeholder call for tracing. Maybe the difference
depends on the compiler and the exact config options.
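If anyone wants to poke at the codegen difference outside of a kernel tree,
a minimal stand-alone sketch like the one below reproduces the same pattern
with "gcc -O2 -S". This is my own mock-up, not kernel code: local_lock_t,
percpu_ptr(), acquire_inline() and acquire_macro() are made-up stand-ins,
and the asm only mimics a per-cpu address computation.

/* demo.c: compare the assembly of use_inline() vs. use_macro() */
typedef struct { int dummy; } local_lock_t;

/* same idea as include/linux/typecheck.h: typeof() never evaluates x */
#define typecheck(type, x)			\
({	type __dummy;				\
	typeof(x) __dummy2;			\
	(void)(&__dummy == &__dummy2);		\
	1;					\
})

/* stand-in for this_cpu_ptr(): an asm the compiler keeps around */
static inline local_lock_t *percpu_ptr(local_lock_t *l)
{
	unsigned long p;

	asm volatile("lea 1(%1), %0" : "=r" (p) : "r" (l));
	return (local_lock_t *)(p - 1);
}

static inline void acquire_inline(local_lock_t *l) { }

#define acquire_macro(l)	do { typecheck(local_lock_t *, l); } while (0)

local_lock_t test_lock;

/* the lea from percpu_ptr() survives even though acquire_inline() is empty */
void use_inline(void) { acquire_inline(percpu_ptr(&test_lock)); }

/* percpu_ptr() is never evaluated here, so the body is effectively empty */
void use_macro(void)  { acquire_macro(percpu_ptr(&test_lock)); }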
Anyway,
Reviewed-by: Waiman Long <longman@...hat.com>