linux-kernel - Re: [PATCH -next RFC v2 3/8] sbitmap: make sure waitqueues are balanced

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <208a49d2-5f29-c48e-206c-260ee3f1d991@huawei.com>
Date:   Fri, 15 Apr 2022 15:07:38 +0800
From:   "yukuai (C)" <yukuai3@...wei.com>
To:     "Li, Ming" <ming4.li@...el.com>, <axboe@...nel.dk>,
        <andriy.shevchenko@...ux.intel.com>, <john.garry@...wei.com>,
        <ming.lei@...hat.com>
CC:     <linux-block@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <yi.zhang@...wei.com>
Subject: Re: [PATCH -next RFC v2 3/8] sbitmap: make sure waitqueues are
 balanced

在 2022/04/15 14:31, Li, Ming 写道:
> 
> 
> On 4/8/2022 3:39 PM, Yu Kuai wrote:
>> Currently, same waitqueue might be woken up continuously:
>>
>> __sbq_wake_up		__sbq_wake_up
>>   sbq_wake_ptr -> assume	0
>> 			 sbq_wake_ptr -> 0
>>   atomic_dec_return
>> 			atomic_dec_return
>>   atomic_cmpxchg -> succeed
>> 			 atomic_cmpxchg -> failed
>> 			  return true
>>
>> 			__sbq_wake_up
>> 			 sbq_wake_ptr
>> 			  atomic_read(&sbq->wake_index) -> still 0
>>   sbq_index_atomic_inc -> inc to 1
>> 			  if (waitqueue_active(&ws->wait))
>> 			   if (wake_index != atomic_read(&sbq->wake_index))
>> 			    atomic_set -> reset from 1 to 0
>>   wake_up_nr -> wake up first waitqueue
>> 			    // continue to wake up in first waitqueue
>>
>> What's worse, io hung is possible in theory because wake up might be
>> missed. For example, 2 * wake_batch tags are put, while only wake_batch
>> threads are worken:
>>
>> __sbq_wake_up
>>   atomic_cmpxchg -> reset wait_cnt
>> 			__sbq_wake_up -> decrease wait_cnt
>> 			...
>> 			__sbq_wake_up -> wait_cnt is decreased to 0 again
>> 			 atomic_cmpxchg
>> 			 sbq_index_atomic_inc -> increase wake_index
>> 			 wake_up_nr -> wake up and waitqueue might be empty
>>   sbq_index_atomic_inc -> increase again, one waitqueue is skipped
>>   wake_up_nr -> invalid wake up because old wakequeue might be empty
>>
>> To fix the problem, refactor to make sure waitqueues will be woken up
>> one by one, and also choose the next waitqueue by the number of threads
>> that are waiting to keep waitqueues balanced.
> Hi, do you think that updating wake_index before atomic_cmpxchg(ws->wait_cnt) also can solve these two problems?
> like this:
Hi,

The first problem is due to sbq_wake_ptr() is using atomic_set() to
update 'wake_index'.

The second problem is due to __sbq_wake_up() is updating 'wait_cnt'
before 'wait_index'.

> __sbq_wake_up()
> {
> 	....
> 	if (wait_cnt <= 0) {
> 		ret = atomic_cmpxchg(sbq->wake_index, old_wake_index, next_wake_index);
How is the 'next_wake_index' chosen? And the same in sbq_wake_ptr().
> 		if (ret == old_wake_index) {
> 			ret = atomic_cmpxchg(ws->wait_cnt, wait_cnt, wake_batch);

If this failed, just return true with 'wake_index' updated? Then the
caller will call this again, so it seems this can't prevent 'wake_index'
updated multiple times, and 'wait_cnt' in the old 'ws' is not updated.
> 			if (ret == wait_cnt)
> 				wake_up_nr(ws->wait, wake_batch);
> 		}
> 	}
> }
> 
> Your solution is picking the waitqueue with the largest waiters_cnt as the next one to be waked up, I think that waitqueue is possible to starve.
> if lots of threads in a same waitqueue stop waiting before sbq wakes them up, it will cause the waiters_cnt of waitqueue is much less than others, looks like sbq_update_wake_index() would never pick this waitqueue. What do you think? is it possible?

It will be possible if adding threads to waitqueues is not balanced, and
I suppose it's not possible after tag premmption is disabled. However,
instead of chosing the waitqueue with largest waiters_cnt, chosing the
next waitqueue with 'waiters_cnt > 0' might be alternative.

Thanks,
Kuai