lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <dfbaf200-7c87-41b2-ab87-906cbdf3e0d7@redhat.com>
Date: Wed, 8 Jan 2025 19:48:02 -0500
From: Waiman Long <llong@...hat.com>
To: Kumar Kartikeya Dwivedi <memxor@...il.com>, Waiman Long <llong@...hat.com>
Cc: bpf@...r.kernel.org, linux-kernel@...r.kernel.org,
 Linus Torvalds <torvalds@...ux-foundation.org>,
 Peter Zijlstra <peterz@...radead.org>, Alexei Starovoitov <ast@...nel.org>,
 Andrii Nakryiko <andrii@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
 Martin KaFai Lau <martin.lau@...nel.org>,
 Eduard Zingerman <eddyz87@...il.com>, "Paul E. McKenney"
 <paulmck@...nel.org>, Tejun Heo <tj@...nel.org>,
 Barret Rhoden <brho@...gle.com>, Josh Don <joshdon@...gle.com>,
 Dohyun Kim <dohyunkim@...gle.com>, kernel-team@...a.com
Subject: Re: [PATCH bpf-next v1 12/22] rqspinlock: Add basic support for
 CONFIG_PARAVIRT


On 1/8/25 3:32 PM, Kumar Kartikeya Dwivedi wrote:
> On Wed, 8 Jan 2025 at 21:57, Waiman Long <llong@...hat.com> wrote:
>> On 1/7/25 8:59 AM, Kumar Kartikeya Dwivedi wrote:
>>> We ripped out PV and virtualization related bits from rqspinlock in an
>>> earlier commit, however, a fair lock performs poorly within a virtual
>>> machine when the lock holder is preempted. As such, retain the
>>> virt_spin_lock fallback to test and set lock, but with timeout and
>>> deadlock detection.
>>>
>>> We don't integrate support for CONFIG_PARAVIRT_SPINLOCKS yet, as that
>>> requires more involved algorithmic changes and introduces more
>>> complexity. It can be done when the need arises in the future.
>> virt_spin_lock() doesn't scale well. It is for hypervisors that don't
>> support PV qspinlock yet. Now rqspinlock() will be in this category.
> We would need to make algorithmic changes to paravirt versions, which
> would be too much for this series, so I didn't go there.
I know. The paravirt part is the most difficult. It took me over a year 
to work on the paravirt part of qspinlock to get it right and merged 
upstream.
>
>> I wonder if we should provide an option to disable rqspinlock and fall
>> back to the regular qspinlock with strict BPF locking semantics.
> That unfortunately won't work, because rqspinlock operates essentially
> like a trylock, where it is allowed to fail and callers must handle
> errors accordingly. Some of the users in BPF (e.g. in patch 17) remove
> their per-cpu nesting counts to rely on AA deadlock detection of
> rqspinlock, which would cause a deadlock if we transparently replace
> it with qspinlock as a fallback.

I see. This information should be documented somewhere.


>> Another question that I have is about PREEMPT_RT kernel which cannot
>> tolerate any locking stall. That will probably require disabling
>> rqspinlock if CONFIG_PREEMPT_RT is enabled.
> I think rqspinlock better maps to the raw spin lock variants, which
> stays as a spin lock on RT kernels, and as you see in patch 17 and 18,
> BPF maps were already using the raw spin lock variants. To avoid
> stalling, we perform deadlock checks immediately when we enter the
> slow path, so for the cases where we rely upon rqspinlock to diagnose
> and report an error, we'll recover quickly. If we still hit the
> timeout it is probably a different problem / bug anyway (and would
> have caused a kernel hang otherwise).

Is the intention to only replace raw_spinlock_t by rqspinlock but never 
spinlock_t? Again, this information need to be documented. Looking at 
the pdf file, it looks like the rqspinlock usage will be extended over time.

As for the locking semantics allowed by the BPF verifier, is it possible 
to enforce the strict locking rules for PREEMPT_RT kernel and use the 
relaxed semantics for non-PREEMPT_RT kernel. We don't want the loading 
of an arbitrary BPF program to break the latency guarantee of a 
PREEMPT_RT kernel.

Cheers,
Longman




Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ