Message-ID: <159efb9b-87df-151f-28df-42407592ea3f@redhat.com>
Date: Wed, 24 Apr 2019 13:10:17 -0400
From: Waiman Long <longman@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Ingo Molnar <mingo@...hat.com>,
Will Deacon <will.deacon@....com>,
Thomas Gleixner <tglx@...utronix.de>,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
the arch/x86 maintainers <x86@...nel.org>,
Davidlohr Bueso <dave@...olabs.net>,
Tim Chen <tim.c.chen@...ux.intel.com>,
huang ying <huang.ying.caritas@...il.com>
Subject: Re: [PATCH v4 14/16] locking/rwsem: Guard against making count negative
On 4/24/19 1:01 PM, Peter Zijlstra wrote:
> On Wed, Apr 24, 2019 at 12:49:05PM -0400, Waiman Long wrote:
>> On 4/24/19 3:09 AM, Peter Zijlstra wrote:
>>> On Tue, Apr 23, 2019 at 03:12:16PM -0400, Waiman Long wrote:
>>>> That is true in general, but doing preempt_disable/enable across
>>>> function boundary is ugly and prone to further problems down the road.
>>> We do worse things in this code, and the thing Linus proposes is
>>> actually quite simple, something like so:
>>>
>>> ---
>>> --- a/kernel/locking/rwsem.c
>>> +++ b/kernel/locking/rwsem.c
>>> @@ -912,7 +904,7 @@ rwsem_down_read_slowpath(struct rw_semap
>>> raw_spin_unlock_irq(&sem->wait_lock);
>>> break;
>>> }
>>> - schedule();
>>> + schedule_preempt_disabled();
>>> lockevent_inc(rwsem_sleep_reader);
>>> }
>>>
>>> @@ -1121,6 +1113,7 @@ static struct rw_semaphore *rwsem_downgr
>>> */
>>> inline void __down_read(struct rw_semaphore *sem)
>>> {
>>> + preempt_disable();
>>> if (unlikely(atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
>>> &sem->count) & RWSEM_READ_FAILED_MASK)) {
>>> rwsem_down_read_slowpath(sem, TASK_UNINTERRUPTIBLE);
>>> @@ -1129,10 +1122,12 @@ inline void __down_read(struct rw_semaph
>>> } else {
>>> rwsem_set_reader_owned(sem);
>>> }
>>> + preempt_enable();
>>> }
>>>
>>> static inline int __down_read_killable(struct rw_semaphore *sem)
>>> {
>>> + preempt_disable();
>>> if (unlikely(atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
>>> &sem->count) & RWSEM_READ_FAILED_MASK)) {
>>> if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_KILLABLE)))
>>> @@ -1142,6 +1137,7 @@ static inline int __down_read_killable(s
>>> } else {
>>> rwsem_set_reader_owned(sem);
>>> }
>>> + preempt_enable();
>>> return 0;
>>> }
>>>
>> Making that change will help the slowpath have fewer preemption points.
> That doesn't matter, right? Either it blocks or it goes through quickly.
>
> If you're worried about a particular spot we can easily put in explicit
> preemption points.
>
>> For an uncontended rwsem, this offers no real benefit. Adding
>> preempt_disable() is more complicated than I originally thought.
> I'm not sure I get your objection?
>
>> Maybe we are too paranoid about the possibility of a large number of
>> preemptions happening just at the right moment. Let p be the probability
>> of a preemption in the middle of the inc-check-dec sequence, whose steps
>> I have already moved as close to each other as possible. We are then
>> talking about a probability of p^32768. Since p will be really small,
>> the compound probability will be infinitesimally small.
> Sure; but we run on many millions of machines every second, so the
> actual accumulated chance of it happening eventually is still fairly
> significant.
>
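The two estimates above can be put side by side with a little log-space arithmetic. The sketch below is only an illustration: the helper names are hypothetical, the values of p and of the number of opportunities are made up, and it assumes that preemptions are independent events, which is the weak point of any such estimate.

```c
/*
 * Illustrative only: work with base-10 exponents so that values such as
 * p^32768 do not underflow a double. Nothing here comes from the patch.
 */

/* log10(p^n), given log10(p). */
static double demo_log10_compound(double log10_p, double n)
{
	return n * log10_p;
}

/*
 * log10 of the union bound trials * p^n, an upper bound on the chance
 * that the event happens at least once in `trials` independent attempts.
 */
static double demo_log10_union_bound(double log10_p, double n,
				     double log10_trials)
{
	return log10_trials + demo_log10_compound(log10_p, n);
}
```

With an assumed p = 10^-3 and 10^18 opportunities across all machines, demo_log10_union_bound(-3.0, 32768.0, 18.0) gives an exponent of -98286. That figure is only as good as the independence assumption, which is why a guard that does not rely on probability at all is still attractive.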
>> So I would like to not add preemption disabling in the current patchset. We
>> can restart the discussion later on if there is a real concern that it
>> may actually happen. Please let me know if you still want to add
>> preempt_disable() for the read lock.
> I like provably correct schemes over prayers.
I am fine with adding preempt_disable(). I just want confirmation that
you want to have that.
>
> As you noted, distros don't usually ship with PREEMPT=y and therefore
> will not be bothered much by any of this.
>
> The old scheme basically worked by the fact that the total supported
> reader count was higher than the number of addressable pages in the
> system and therefore the overflow could not happen.
>
> We now transition to number of CPUs, and for that we pay a little price
> with PREEMPT=y kernels. Either that or cmpxchg.
I also thought about switching to a cmpxchg loop for PREEMPT=y kernels.
Let's start with just preempt_disable() for now. We can evaluate the
cmpxchg loop alternative later on.
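
For reference, the difference between the inc-check-dec sequence and a cmpxchg loop can be sketched in user-space C11 atomics. This is not the rwsem code: the count layout, the DEMO_* constants and the demo_* functions are all hypothetical, and the reader limit of 32768 is simply borrowed from the exponent mentioned earlier in this thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical count layout, for illustration only. */
#define DEMO_WRITER_LOCKED	1L
#define DEMO_READER_SHIFT	8
#define DEMO_READER_BIAS	(1L << DEMO_READER_SHIFT)
#define DEMO_MAX_READERS	32768L

/*
 * inc-check-dec: add the reader bias first, then check, then back out.
 * Between the add and the sub the count transiently carries the extra
 * bias; a task stalled in that window keeps it there.
 */
static bool demo_read_trylock_fetch_add(atomic_long *count)
{
	long old = atomic_fetch_add_explicit(count, DEMO_READER_BIAS,
					     memory_order_acquire);

	if ((old & DEMO_WRITER_LOCKED) ||
	    (old >> DEMO_READER_SHIFT) >= DEMO_MAX_READERS) {
		atomic_fetch_sub_explicit(count, DEMO_READER_BIAS,
					  memory_order_relaxed);
		return false;
	}
	return true;
}

/*
 * cmpxchg loop: check first and only publish a new value that passed
 * the check, so the count never exceeds the limit, even transiently.
 */
static bool demo_read_trylock_cmpxchg(atomic_long *count)
{
	long old = atomic_load_explicit(count, memory_order_relaxed);

	for (;;) {
		if ((old & DEMO_WRITER_LOCKED) ||
		    (old >> DEMO_READER_SHIFT) >= DEMO_MAX_READERS)
			return false;
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak_explicit(count, &old,
				old + DEMO_READER_BIAS,
				memory_order_acquire, memory_order_relaxed))
			return true;
	}
}
```

The trade-off is the usual one: the fetch_add version is a single unconditional atomic on the fast path, while the cmpxchg version may retry under contention but needs no preempt_disable() to bound the count.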
Cheers,
Longman