lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230729083737.38699-2-leobras@redhat.com>
Date:   Sat, 29 Jul 2023 05:37:31 -0300
From:   Leonardo Bras <leobras@...hat.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Marcelo Tosatti <mtosatti@...hat.com>,
        linux-kernel@...r.kernel.org
Cc:     Leonardo Bras <leobras@...hat.com>
Subject: [RFC PATCH 0/4] Improve RT performance & latency w/ local_lock_t()

The problem:
We have a few scenarios in mm that make use of local_locks() + 
schedule_work_on() due to reduced overhead on the most common local 
references. This scenario is not ideal for RT workloads though: even on 
isolated cpus, those tasks will schedule-out the sensitive RT workloads to 
perform those jobs, and usually cause missing deadlines with this.

The idea:
In PREEMPT_RT, local_locks() end up becoming spinlocks(), so there should 
be no harm in just getting another cpu's spinlock to perform the work 
on the per-cpu structure: the cacheline invalidation will already happen 
due to the usage by schedule_work_on(), and on local functions the locking 
already happens anyway.

This will avoid schedule_work_on(), and thus avoid scheduling-out an 
RT workload. Given the usually brief nature of those scheduled tasks, their 
execution end up being faster than doing their scheduling.

======

While I really belive the solution, there are problems with this patchset, 
which I need your suggestions for improvement:

1) Naming is horrible: I could not think on a good name for the new lock 
functions, so I lazely named it local_lock_n(). 
The naming should have been making clear that we are in fact dealing 
with a local_lock but it can in some scenarios be addressing another cpu's 
local_lock, and thus we need the extra cpu parameter. 

Dealing with this local & remote duality, all I thought was:
mostly_local_lock(), (or local_mostly_lock())
local_maybe_remote_lock()  <- LOL
remote_local_lock()
per_cpu_local_lock()
local_lock_cpu()

Please help !


2) Maybe naming is this hard because the interface is not the best.
My idea was to create a simple interface to easily replace functions 
already in use, but maybe there is something better I could not see.

Please contribute!

3) I am lazely setting work->data without atomic operations, which may 
be bad in some scenario. If so, even thought it can be costly, since it 
happens outside of the hotpath (local per-cpu areas) it should be no 
problem adding the atomic operation for non-RT kernels.

For RT kernels, since the whole operation hapens locally, there should be 
no need of using the atomic set.

Please let me know of any idea, or suggestion that can improve this RFC.

Thanks a lot!
Leo

Leonardo Bras (4):
  Introducing local_lock_n() and local queue & flush
  swap: apply new local_schedule_work_on() interface
  memcontrol: apply new local_schedule_work_on() interface
  slub: apply new local_schedule_work_on() interface

 include/linux/local_lock.h          | 18 ++++++++++
 include/linux/local_lock_internal.h | 52 +++++++++++++++++++++++++++++
 mm/memcontrol.c                     | 17 ++++++----
 mm/slub.c                           | 17 ++++++----
 mm/swap.c                           | 18 +++++-----
 5 files changed, 100 insertions(+), 22 deletions(-)

-- 
2.41.0

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ