linux-kernel - Re: [RFC PATCH 0/7] locking/rtqspinlock: Realtime queued spinlocks

Message-ID: <342e0af6-43b1-76ea-0f0b-55087dfec96c@redhat.com>
Date:   Thu, 5 Jan 2017 10:55:55 -0500
From:   Waiman Long <longman@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        "H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org,
        Steven Rostedt <rostedt@...dmis.org>,
        Daniel Bristot de Oliveira <bristot@...hat.com>
Subject: Re: [RFC PATCH 0/7] locking/rtqspinlock: Realtime queued spinlocks

On 01/05/2017 04:44 AM, Peter Zijlstra wrote:
> On Wed, Jan 04, 2017 at 10:25:14AM -0500, Waiman Long wrote:
>> On 01/04/2017 07:49 AM, Peter Zijlstra wrote:
>>> On Tue, Jan 03, 2017 at 01:00:23PM -0500, Waiman Long wrote:
>>>> This patchset introduces a new variant of queued spinlocks - the
>>>> realtime queued spinlocks. The purpose of this new variant is to
>>>> support real spinlock in a realtime environment where high priority
>>>> RT tasks should be allowed to complete their work ASAP. This means as
>>>> little waiting time for spinlocks as possible.
>>>>
>>>> Non-RT tasks will wait for spinlocks in the MCS waiting queue as
>>>> usual. RT tasks and interrupts will spin directly on the spinlocks
>>>> and use the priority value in the pending byte to arbitrate who gets
>>>> the lock first.
>>>>
>>>> Patch 1 removes the unused spin_lock_bh_nested() API.
>>>>
>>>> Patch 2 introduces the basic realtime queued spinlocks where the
>>>> pending byte is used for storing the priority of the highest priority
>>>> RT task that is waiting on the spinlock. All the RT tasks will spin
>>>> directly on the spinlock instead of waiting in the queue.
>>>>
>>> OK, so a single numerical field isn't sufficient to describe priority
>>> anymore, since we added DEADLINE support things have gotten a lot more
>>> complex.
>> From what I read in the code, DL tasks all have the same priority, which
>> is higher than that of any RT task. So you mean DL tasks have another
>> property that effectively sorts them into different sub-priorities
>> and that is not reflected in their priority level. Is that right?
> Correct, primarily their deadline. That is, the scheduling function for
> the class picks the task with the earliest deadline.

OK, I need to rethink how to deal with those DL tasks.

>>> Also, the whole approach worries me, it has the very real possibility of
>>> re-introducing a bunch of starvation cases avoided by the fair lock.
>> Starvation can happen when there is a constant stream of RT or DL tasks
>> grabbing the lock, or when there is an interrupt storm. However I am
>> making the assumption that RT systems should have sufficient resource
>> available that the RT tasks won't saturate the hardware or we can't have
>> RT guarantee in this case.
> That only works on UP, on SMP you only need a combined utilization of 1
> to completely saturate a lock.

An RT task in a spinlock loop won't be able to completely monopolize the
lock because of the small window between unlock and lock during which
others can come in and get the lock. You will need at least 2 RT tasks
in lockstep to completely own the lock and starve the others.

We could implement some kind of policy to increase the dynamic priority
of a task the longer it waits for the lock to make sure that there will
be no lock starvation.
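
A minimal user-space sketch of such an aging policy, assuming that a
larger number means a higher priority and that the advertised priority
must fit in a single byte. The names rtq_aged_prio(), RTQ_PRIO_MAX and
RTQ_AGE_SHIFT are hypothetical and do not come from the patchset; the
aging rate is an arbitrary illustration.

```c
#include <stdint.h>

/* Hypothetical constants, chosen only for illustration. */
#define RTQ_PRIO_MAX  255u  /* largest value that fits in one byte */
#define RTQ_AGE_SHIFT 10    /* one priority step per 1024 spins */

/*
 * Return the priority a waiter would advertise after spinning
 * `spins` times. The boost grows with the waiting time and
 * saturates at RTQ_PRIO_MAX, so a long waiter eventually reaches
 * the top priority level instead of overflowing the byte.
 */
static inline uint8_t rtq_aged_prio(uint8_t base_prio, uint32_t spins)
{
    uint32_t boosted = (uint32_t)base_prio + (spins >> RTQ_AGE_SHIFT);

    return boosted > RTQ_PRIO_MAX ? (uint8_t)RTQ_PRIO_MAX : (uint8_t)boosted;
}
```

With a rule like this, a low-priority waiter cannot be bypassed forever,
but once it has aged to the maximum it competes with genuine top-priority
tasks, so the aging rate would have to be tuned against RT latency goals.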

>>> Is there a real problem with -RT that inspired these patches?
>> I know that in -RT kernel, all the non-raw spinlocks are replaced by
>> rtmutex which is a sleeping lock. This can have a real performance
>> impact on systems with more than a few cores. The rtmutex isn't fair either.
>>
>> Do you think it is better to keep the raw spinlocks fair and only have
>> the non-raw spinlocks use the RT version?
> I don't get what you're saying here. Are you proposing to replace the
> rtmutex with this rtspinlock? That will very fundamentally not work. The
> important part of the conversion of spinlock -> rtmutex is acquiring the
> preemptability. Using this rtspinlock loses that and breaks the
> entirety of what -rt is about.

What I am saying is that we don't need to change spinlock to rtmutex in
a -RT kernel. Instead, we can use rtqspinlock for this purpose. All the
sleeping locks will still be converted to rtmutex.

Conversion to rtmutex does allow forced CPU preemption when there is a
need for that. What rtqspinlock can provide is voluntary preemption,
where the lock waiters explicitly yield the CPU while waiting for the
lock. I use need_resched() to detect if CPU yielding is necessary.
However, if the CPU was in a preempt-disabled region before the
spin_lock() call, we can't yield the CPU. The only option then is to
raise the waiter's priority and try to get the lock ASAP. I still have
some work to do in this area and I need to figure out how to convey the
information about the priority of the task that is waiting for the CPU.
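
The waiting rule described above can be sketched in user-space C roughly
as follows. This is only an illustration under stated assumptions, not
the patchset's code: the boolean resched_pending stands in for the
kernel's need_resched(), preempt_disabled stands in for the preemption
state at the time of the spin_lock() call, sched_yield() stands in for
giving up the CPU, and wait_action(), sketch_lock_acquire() and
SKETCH_PRIO_MAX are hypothetical names.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_PRIO_MAX 255  /* hypothetical cap on the boosted priority */

enum wait_action { WAIT_SPIN, WAIT_YIELD, WAIT_BOOST };

/* Decide what a waiter does on one failed attempt to take the lock. */
static enum wait_action wait_action(bool resched_pending, bool preempt_disabled)
{
    if (!resched_pending)
        return WAIT_SPIN;                 /* nobody wants this CPU: keep spinning */
    return preempt_disabled ? WAIT_BOOST  /* cannot yield: raise priority instead */
                            : WAIT_YIELD; /* voluntary preemption */
}

/* Spin on a test-and-set lock, applying the rule above on each failure. */
static void sketch_lock_acquire(atomic_flag *lock, atomic_bool *resched_pending,
                                bool preempt_disabled, int *prio)
{
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire)) {
        switch (wait_action(atomic_load(resched_pending), preempt_disabled)) {
        case WAIT_YIELD:
            sched_yield();
            break;
        case WAIT_BOOST:
            if (*prio < SKETCH_PRIO_MAX)
                (*prio)++;                /* bounded boost, one step per retry */
            break;
        case WAIT_SPIN:
            break;
        }
    }
}

static void sketch_lock_release(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}
```

The sketch leaves open the question raised above: how the waiter learns
the priority of the task that wants the CPU, which would determine how
far a boost should go.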

Cheers,
Longman