Message-ID: <f8718e12-bb19-5c50-a825-3ea82399ae7e@huaweicloud.com>
Date:   Thu, 12 Jan 2023 14:14:34 +0800
From:   Hou Tao <houtao@...weicloud.com>
To:     Jingbo Xu <jefflexu@...ux.alibaba.com>, linux-cachefs@...hat.com
Cc:     David Howells <dhowells@...hat.com>,
        Jeff Layton <jlayton@...nel.org>, linux-erofs@...ts.ozlabs.org,
        linux-kernel@...r.kernel.org, houtao1@...wei.com
Subject: Re: [PATCH v2 1/2] fscache: Use wait_on_bit() to wait for the freeing
 of relinquished volume

Hi,

On 1/12/2023 11:47 AM, Jingbo Xu wrote:
>
> On 12/26/22 6:33 PM, Hou Tao wrote:
>> From: Hou Tao <houtao1@...wei.com>
>>
>> The freeing of relinquished volume will wake up the pending volume
>> acquisition by using wake_up_bit(), however it is mismatched with
>> wait_var_event() used in fscache_wait_on_volume_collision() and it will
>> never wake up the waiter in the wait-queue because these two functions
>> operate on different wait-queues.
>>
>> According to the implementation in fscache_wait_on_volume_collision(),
>> if the wake-up of pending acquisition is delayed longer than 20 seconds
>> (e.g., due to the delay of on-demand fd closing), the first
>> wait_var_event_timeout() will timeout and the following wait_var_event()
>> will hang forever as shown below:
>>
>>  FS-Cache: Potential volume collision new=00000024 old=00000022
>>  ......
>>  INFO: task mount:1148 blocked for more than 122 seconds.
>>        Not tainted 6.1.0-rc6+ #1
>>  task:mount           state:D stack:0     pid:1148  ppid:1
>>  Call Trace:
>>   <TASK>
>>   __schedule+0x2f6/0xb80
>>   schedule+0x67/0xe0
>>   fscache_wait_on_volume_collision.cold+0x80/0x82
>>   __fscache_acquire_volume+0x40d/0x4e0
>>   erofs_fscache_register_volume+0x51/0xe0 [erofs]
>>   erofs_fscache_register_fs+0x19c/0x240 [erofs]
>>   erofs_fc_fill_super+0x746/0xaf0 [erofs]
>>   vfs_get_super+0x7d/0x100
>>   get_tree_nodev+0x16/0x20
>>   erofs_fc_get_tree+0x20/0x30 [erofs]
>>   vfs_get_tree+0x24/0xb0
>>   path_mount+0x2fa/0xa90
>>   do_mount+0x7c/0xa0
>>   __x64_sys_mount+0x8b/0xe0
>>   do_syscall_64+0x30/0x60
>>   entry_SYSCALL_64_after_hwframe+0x46/0xb0
>>
>> Considering that wake_up_bit() is more selective, so fixing it by using
> 							^
> 						       fix
>> wait_on_bit() instead of wait_var_event() to wait for the freeing of
>> relinquished volume. In addition because waitqueue_active() is used in
>> wake_up_bit() and clear_bit() doesn't imply any memory barrier, so also
>> adding smp_mb__after_atomic() before wake_up_bit().
> ... doesn't imply any memory barrier, add ...
Thanks for the suggestions above. I will update the commit message in v3.
>
>> Fixes: 62ab63352350 ("fscache: Implement volume registration")
>> Signed-off-by: Hou Tao <houtao1@...wei.com>
>
> Otherwise LGTM :)
>
> Reviewed-by: Jingbo Xu <jefflexu@...ux.alibaba.com>
Thanks for the review.
>
>> ---
>>  fs/fscache/volume.c | 12 +++++++++---
>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/fscache/volume.c b/fs/fscache/volume.c
>> index ab8ceddf9efa..fc3dd3bc851d 100644
>> --- a/fs/fscache/volume.c
>> +++ b/fs/fscache/volume.c
>> @@ -141,13 +141,14 @@ static bool fscache_is_acquire_pending(struct fscache_volume *volume)
>>  static void fscache_wait_on_volume_collision(struct fscache_volume *candidate,
>>  					     unsigned int collidee_debug_id)
>>  {
>> -	wait_var_event_timeout(&candidate->flags,
>> -			       !fscache_is_acquire_pending(candidate), 20 * HZ);
>> +	wait_on_bit_timeout(&candidate->flags, FSCACHE_VOLUME_ACQUIRE_PENDING,
>> +			    TASK_UNINTERRUPTIBLE, 20 * HZ);
>>  	if (fscache_is_acquire_pending(candidate)) {
>>  		pr_notice("Potential volume collision new=%08x old=%08x",
>>  			  candidate->debug_id, collidee_debug_id);
>>  		fscache_stat(&fscache_n_volumes_collision);
>> -		wait_var_event(&candidate->flags, !fscache_is_acquire_pending(candidate));
>> +		wait_on_bit(&candidate->flags, FSCACHE_VOLUME_ACQUIRE_PENDING,
>> +			    TASK_UNINTERRUPTIBLE);
>>  	}
>>  }
>>  
>> @@ -348,6 +349,11 @@ static void fscache_wake_pending_volume(struct fscache_volume *volume,
>>  		if (fscache_volume_same(cursor, volume)) {
>>  			fscache_see_volume(cursor, fscache_volume_see_hash_wake);
>>  			clear_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &cursor->flags);
>> +			/*
>> +			 * Paired with barrier in wait_on_bit(). Check
>> +			 * wake_up_bit() and waitqueue_active() for details.
>> +			 */
>> +			smp_mb__after_atomic();
>>  			wake_up_bit(&cursor->flags, FSCACHE_VOLUME_ACQUIRE_PENDING);
>>  			return;
>>  		}
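
For anyone reading the thread later, the wait/wake mismatch described in the
commit message can be sketched in user space as below. This is only a
simplified model, not the kernel code: struct wait_key, bit_key(), var_key()
and key_matches() are hypothetical names made up for illustration, and the
hashing that selects the wait-queue head is left out. It assumes, from my
reading of kernel/sched/wait_bit.c, that a wait_var_event() waiter is keyed by
the variable's address with bit_nr == -1, while wake_up_bit() wakes only
entries whose address and bit number both match.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for the key stored with each waiter. */
struct wait_key {
	const void *flags;	/* address of the flags word (or variable) */
	int bit_nr;		/* bit being waited on, or -1 for a "var" waiter */
};

/* Key used by a wait_on_bit()/wake_up_bit() style caller. */
struct wait_key bit_key(const void *word, int bit)
{
	struct wait_key key = { .flags = word, .bit_nr = bit };
	return key;
}

/* Key used by a wait_var_event()/wake_up_var() style caller (assumed -1). */
struct wait_key var_key(const void *var)
{
	struct wait_key key = { .flags = var, .bit_nr = -1 };
	return key;
}

/* A waiter is woken only when both fields equal the waker's key. */
bool key_matches(struct wait_key waiter, struct wait_key waker)
{
	return waiter.flags == waker.flags && waiter.bit_nr == waker.bit_nr;
}
```

With a flags word "flags" and an arbitrary bit number such as 2,
key_matches(var_key(&flags), bit_key(&flags, 2)) is false, which models the
hang in the trace above, while key_matches(bit_key(&flags, 2),
bit_key(&flags, 2)) is true, which models the pairing after the patch.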
