linux-kernel - Re: [PATCH v4 14/16] locking/rwsem: Guard against making count negative


Message-ID: <159efb9b-87df-151f-28df-42407592ea3f@redhat.com>
Date:   Wed, 24 Apr 2019 13:10:17 -0400
From:   Waiman Long <longman@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Ingo Molnar <mingo@...hat.com>,
        Will Deacon <will.deacon@....com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        the arch/x86 maintainers <x86@...nel.org>,
        Davidlohr Bueso <dave@...olabs.net>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        huang ying <huang.ying.caritas@...il.com>
Subject: Re: [PATCH v4 14/16] locking/rwsem: Guard against making count
 negative

On 4/24/19 1:01 PM, Peter Zijlstra wrote:
> On Wed, Apr 24, 2019 at 12:49:05PM -0400, Waiman Long wrote:
>> On 4/24/19 3:09 AM, Peter Zijlstra wrote:
>>> On Tue, Apr 23, 2019 at 03:12:16PM -0400, Waiman Long wrote:
>>>> That is true in general, but doing preempt_disable/enable across
>>>> function boundary is ugly and prone to further problems down the road.
>>> We do worse things in this code, and the thing Linus proposes is
>>> actually quite simple, something like so:
>>>
>>> ---
>>> --- a/kernel/locking/rwsem.c
>>> +++ b/kernel/locking/rwsem.c
>>> @@ -912,7 +904,7 @@ rwsem_down_read_slowpath(struct rw_semap
>>>  			raw_spin_unlock_irq(&sem->wait_lock);
>>>  			break;
>>>  		}
>>> -		schedule();
>>> +		schedule_preempt_disabled();
>>>  		lockevent_inc(rwsem_sleep_reader);
>>>  	}
>>>  
>>> @@ -1121,6 +1113,7 @@ static struct rw_semaphore *rwsem_downgr
>>>   */
>>>  inline void __down_read(struct rw_semaphore *sem)
>>>  {
>>> +	preempt_disable();
>>>  	if (unlikely(atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
>>>  			&sem->count) & RWSEM_READ_FAILED_MASK)) {
>>>  		rwsem_down_read_slowpath(sem, TASK_UNINTERRUPTIBLE);
>>> @@ -1129,10 +1122,12 @@ inline void __down_read(struct rw_semaph
>>>  	} else {
>>>  		rwsem_set_reader_owned(sem);
>>>  	}
>>> +	preempt_enable();
>>>  }
>>>  
>>>  static inline int __down_read_killable(struct rw_semaphore *sem)
>>>  {
>>> +	preempt_disable();
>>>  	if (unlikely(atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
>>>  			&sem->count) & RWSEM_READ_FAILED_MASK)) {
>>>  		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_KILLABLE)))
>>> @@ -1142,6 +1137,7 @@ static inline int __down_read_killable(s
>>>  	} else {
>>>  		rwsem_set_reader_owned(sem);
>>>  	}
>>> +	preempt_enable();
>>>  	return 0;
>>>  }
>>>  
>> Making that change will help the slowpath have fewer preemption points.
> That doesn't matter, right? Either it blocks or it goes through quickly.
>
> If you're worried about a particular spot we can easily put in explicit
> preemption points.
>
>> For an uncontended rwsem, this offers no real benefit. Adding
>> preempt_disable() is more complicated than I originally thought.
> I'm not sure I get your objection?
>
>> Maybe we are too paranoid about the possibility of a large number of
>> preemptions happening just at the right moment. Let p be the probability
>> of a preemption in the middle of the inc-check-dec sequence, whose steps
>> I have already moved as close to each other as possible. We are then
>> talking about a probability of p^32768. Since p will be really small, the
>> compound probability will be infinitesimally small.
> Sure; but we run on many millions of machines every second, so the
> actual accumulated chance of it happening eventually is still fairly
> significant.
>
>> So I would like to not disable preemption in the current patchset. We
>> can restart the discussion later on if there is a real concern that it
>> may actually happen. Please let me know if you still want to add
>> preempt_disable() for the read lock.
> I like provably correct schemes over prayers.


I am fine with adding preempt_disable(). I just want confirmation that
you want to have that.
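
To make the inc-check-dec sequence under discussion concrete, here is a
minimal user-space sketch using C11 atomics. It is not the kernel code: the
TOY_* constants, the bit layout, and toy_read_trylock() are hypothetical
names chosen for illustration, and the real slowpath queues the reader
instead of simply backing out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy layout (an assumption, not the kernel's): flag bits below bit 8,
 * reader count in the bits above. */
#define TOY_READER_SHIFT     8
#define TOY_READER_BIAS      (1L << TOY_READER_SHIFT)
#define TOY_WRITER_LOCKED    (1L << 0)
#define TOY_READ_FAILED_MASK TOY_WRITER_LOCKED

/* Optimistic reader: add the bias first, check afterwards, undo on failure. */
bool toy_read_trylock(atomic_long *count)
{
	long old = atomic_fetch_add_explicit(count, TOY_READER_BIAS,
					     memory_order_acquire);

	/* Invariant the thread is about: the count must never go negative. */
	assert(old >= 0);

	if (old & TOY_READ_FAILED_MASK) {
		/* Until this subtraction runs, the failed reader's bias
		 * remains visible in the count. */
		atomic_fetch_sub_explicit(count, TOY_READER_BIAS,
					  memory_order_relaxed);
		return false;
	}
	return true;
}
```

A task preempted between the add and the sub leaves its bias in the count.
With preemption disabled, at most one task per CPU can sit in that window.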


>
> As you noted, distros don't usually ship with PREEMPT=y and therefore
> will not be bothered much by any of this.
>
> The old scheme basically worked by the fact that the total supported
> reader count was higher than the number of addressable pages in the
> system and therefore the overflow could not happen.
>
> We now transition to number of CPUs, and for that we pay a little price
> with PREEMPT=y kernels. Either that or cmpxchg.

I also thought about switching to a cmpxchg loop for PREEMPT=y kernels.
Let's start with just preempt_disable() for now. We can evaluate the
cmpxchg loop alternative later on.
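
For reference, a cmpxchg loop could look roughly like the user-space sketch
below, again with C11 atomics. Everything named TOY_* is hypothetical, as is
the small TOY_MAX_READERS cap and the function name; this is only meant to
show the shape of the idea, not a proposed patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy layout (an assumption, not the kernel's). */
#define TOY_READER_SHIFT     8
#define TOY_READER_BIAS      (1L << TOY_READER_SHIFT)
#define TOY_WRITER_LOCKED    (1L << 0)
#define TOY_READ_FAILED_MASK TOY_WRITER_LOCKED
#define TOY_MAX_READERS      4L	/* hypothetical cap, tiny for testing */

/* Check first, then publish the new count only if nothing changed. */
bool toy_read_trylock_cmpxchg(atomic_long *count)
{
	long old = atomic_load_explicit(count, memory_order_relaxed);

	do {
		assert(old >= 0);
		/* Refuse before touching the count, so a failed attempt
		 * never leaves a transient reader bias behind. */
		if ((old & TOY_READ_FAILED_MASK) ||
		    (old >> TOY_READER_SHIFT) >= TOY_MAX_READERS)
			return false;
		/* On failure, 'old' is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak_explicit(count, &old,
				old + TOY_READER_BIAS,
				memory_order_acquire,
				memory_order_relaxed));
	return true;
}
```

The trade-off is that a contended count makes the loop retry, whereas the
fetch_add variant always completes in a single atomic operation.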

Cheers,
Longman