[<prev] [next>] [day] [month] [year] [list]
Message-ID: <5d73d162-e860-785f-a775-1100b0aa57dd@colorfullife.com>
Date:   Sat, 15 May 2021 07:41:51 +0200
From:   Manfred Spraul <manfred@...orfullife.com>
To:     Hillf Danton <hdanton@...a.com>
Cc:     Davidlohr Bueso <dbueso@...e.de>,
        Matthias von Faber <matthias.vonfaber@...-tech.de>,
        Varad Gautam <varad.gautam@...e.com>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Oleg Nesterov <oleg@...hat.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH] ipc/mqueue: avoid sleep after wakeup
On 5/15/21 6:06 AM, Hillf Danton wrote:
> On Fri, 14 May 2021 17:51:47 +0200  Manfred Spraul wrote:
>> On 5/14/21 5:01 AM, Hillf Danton wrote:
>>> The pipeline waker could start doing its job once waiter releases lock and
>>> get the work done before waiter takes a nap, so check wait condition before
>>> sleep to avoid waiting the wakeup that will never come, though that does not
>>> hurt much thanks to timer timeouts like a second.
>> First: The timeout could be infinity, thus the code must not rely on a
>> timeout wakeup.
>>
>> A wrong wait is would be a bug.
>>
>>
>>> Check signal for the same reason.
>>>
>>> Signed-off-by: Hillf Danton <hdanton@...a.com>
>>> ---
>>>
>>> --- y/ipc/mqueue.c
>>> +++ x/ipc/mqueue.c
>>> @@ -710,15 +710,24 @@ static int wq_sleep(struct mqueue_inode_
>>>    		__set_current_state(TASK_INTERRUPTIBLE);
>>>    
>>>    		spin_unlock(&info->lock);
>>> -		time = schedule_hrtimeout_range_clock(timeout, 0,
>>> -			HRTIMER_MODE_ABS, CLOCK_REALTIME);
>>>    
>> I do not see a bug:
>>
>> We do the __set_current_state() while holding the spinlock. If there is
>> a wakeup, then the wakeup will change current->state to TASK_RUNNING.
> Correct.
>> schedule() will not remove us from the run queue when current->state is
>> TASK_RUNNING. The same applies if there are pending signals: schedule()
>> checks for pending signals and sets current->state to TASK_RUNNING.
>>
>> Since the __set_current_state() is done while we hold info->lock, and
>> since the wakeup cannot happen before we have dropped the lock [because
>> the task that wakes us up needs the same lock], I do not see how a
>> wakeup could be lost.
>>
>> Thus: Which issue do you see?
> 	waiter		waker
> 	----		----
> 	unlock
> 			lock
> 	irq		set STATE_READY
> 	softirq		unlock
> 			wakeup
> 	sleep a tick
> 	  schedule();
>
> No need to schedule given READY.
This is not possible to avoid:
	waiter		waker
	----		----
	unlock
	schedule();
	  calls __schedule()
	   <before rq_lock()>
  	                lock
			set STATE_READY
			unlock
			wakeup
			--> set waiter->state = TASK_RUNNING
Now the run queue will be evaluated even though there is strictly speaking no need to do that.
Changes in ipc/sem.c can't solve that: From what I see, the majority of the critical window is in kernel/sched/*.c and not in ipc/sem.c
I do not consider it as useful to add complexity just to reduce the size of a extremely rare event.
Especially: No harm is done. User space can be preempted at any time, so the kernel is always allowed to check the run queue.
--
	Manfred
Powered by blists - more mailing lists
 
