netdev - Re: [PATCH net] net: ibmveth: make veth_pool

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <874iz8au2c.fsf@linux.ibm.com>
Date: Tue, 01 Apr 2025 08:29:47 -0500
From: Dave Marquardt <davemarq@...ux.ibm.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
        Nick Child
 <nnac123@...ux.ibm.com>
Subject: Re: [PATCH net] net: ibmveth: make veth_pool_store stop hanging

Paolo Abeni <pabeni@...hat.com> writes:

> On 3/31/25 11:23 PM, davemarq@...ux.ibm.com wrote:
>> From: Dave Marquardt <davemarq@...ux.ibm.com>
>> 
>> Use rtnl_mutex to synchronize veth_pool_store with itself,
>> ibmveth_close and ibmveth_open, preventing multiple calls in a row to
>> napi_disable.
>> 
>> Signed-off-by: Dave Marquardt <davemarq@...ux.ibm.com>
>> Fixes: 860f242eb534 ("[PATCH] ibmveth change buffer pools dynamically")
>> Reviewed-by: Nick Child <nnac123@...ux.ibm.com>
>> ---
>> In working on removing BUG_ON calls from ibmveth, I realized that 2
>> threads could call veth_pool_store through writing to
>> /sys/devices/vio/30000002/pool*/*. You can do this easily with a little
>> shell script.
>> 
>> Running on a 6.14 kernel, I saw a hang:
>> 
>>     [  243.683282][  T108] INFO: task stress.sh:5829 blocked for more than 122 seconds.
>>     [  243.683300][  T108]       Not tainted 6.14.0-01103-g2df0c02dab82 #3
>>     [  243.683303][  T108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>     [  366.563278][  T108] INFO: task stress.sh:5829 blocked for more than 245 seconds.
>>     [  366.563297][  T108]       Not tainted 6.14.0-01103-g2df0c02dab82 #3
>>     [  366.563301][  T108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> 
>> I configured LOCKDEP, compiled ibmveth.c with DEBUG, and built a new
>> kernel. I ran the test again and saw:
>> 
>>     Setting pool0/active to 0
>>     Setting pool1/active to 1
>>     [   73.911067][ T4365] ibmveth 30000002 eth0: close starting
>>     Setting pool1/active to 1
>>     Setting pool1/active to 0
>>     [   73.911367][ T4366] ibmveth 30000002 eth0: close starting
>>     [   73.916056][ T4365] ibmveth 30000002 eth0: close complete
>>     [   73.916064][ T4365] ibmveth 30000002 eth0: open starting
>>     [  110.808564][  T712] systemd-journald[712]: Sent WATCHDOG=1 notification.
>>     [  230.808495][  T712] systemd-journald[712]: Sent WATCHDOG=1 notification.
>>     [  243.683786][  T123] INFO: task stress.sh:4365 blocked for more than 122 seconds.
>>     [  243.683827][  T123]       Not tainted 6.14.0-01103-g2df0c02dab82-dirty #8
>>     [  243.683833][  T123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>     [  243.683838][  T123] task:stress.sh       state:D stack:28096 pid:4365  tgid:4365  ppid:4364   task_flags:0x400040 flags:0x00042000
>>     [  243.683852][  T123] Call Trace:
>>     [  243.683857][  T123] [c00000000c38f690] [0000000000000001] 0x1 (unreliable)
>>     [  243.683868][  T123] [c00000000c38f840] [c00000000001f908] __switch_to+0x318/0x4e0
>>     [  243.683878][  T123] [c00000000c38f8a0] [c000000001549a70] __schedule+0x500/0x12a0
>>     [  243.683888][  T123] [c00000000c38f9a0] [c00000000154a878] schedule+0x68/0x210
>>     [  243.683896][  T123] [c00000000c38f9d0] [c00000000154ac80] schedule_preempt_disabled+0x30/0x50
>>     [  243.683904][  T123] [c00000000c38fa00] [c00000000154dbb0] __mutex_lock+0x730/0x10f0
>>     [  243.683913][  T123] [c00000000c38fb10] [c000000001154d40] napi_enable+0x30/0x60
>>     [  243.683921][  T123] [c00000000c38fb40] [c000000000f4ae94] ibmveth_open+0x68/0x5dc
>>     [  243.683928][  T123] [c00000000c38fbe0] [c000000000f4aa20] veth_pool_store+0x220/0x270
>>     [  243.683936][  T123] [c00000000c38fc70] [c000000000826278] sysfs_kf_write+0x68/0xb0
>>     [  243.683944][  T123] [c00000000c38fcb0] [c0000000008240b8] kernfs_fop_write_iter+0x198/0x2d0
>>     [  243.683951][  T123] [c00000000c38fd00] [c00000000071b9ac] vfs_write+0x34c/0x650
>>     [  243.683958][  T123] [c00000000c38fdc0] [c00000000071bea8] ksys_write+0x88/0x150
>>     [  243.683966][  T123] [c00000000c38fe10] [c0000000000317f4] system_call_exception+0x124/0x340
>>     [  243.683973][  T123] [c00000000c38fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
>>     ...
>>     [  243.684087][  T123] Showing all locks held in the system:
>>     [  243.684095][  T123] 1 lock held by khungtaskd/123:
>>     [  243.684099][  T123]  #0: c00000000278e370 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x50/0x248
>>     [  243.684114][  T123] 4 locks held by stress.sh/4365:
>>     [  243.684119][  T123]  #0: c00000003a4cd3f8 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x88/0x150
>>     [  243.684132][  T123]  #1: c000000041aea888 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x154/0x2d0
>>     [  243.684143][  T123]  #2: c0000000366fb9a8 (kn->active#64){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x160/0x2d0
>>     [  243.684155][  T123]  #3: c000000035ff4cb8 (&dev->lock){+.+.}-{3:3}, at: napi_enable+0x30/0x60
>>     [  243.684166][  T123] 5 locks held by stress.sh/4366:
>>     [  243.684170][  T123]  #0: c00000003a4cd3f8 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x88/0x150
>>     [  243.684183][  T123]  #1: c00000000aee2288 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x154/0x2d0
>>     [  243.684194][  T123]  #2: c0000000366f4ba8 (kn->active#64){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x160/0x2d0
>>     [  243.684205][  T123]  #3: c000000035ff4cb8 (&dev->lock){+.+.}-{3:3}, at: napi_disable+0x30/0x60
>>     [  243.684216][  T123]  #4: c0000003ff9bbf18 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0x138/0x12a0
>> 
>> From the ibmveth debug, two threads are calling veth_pool_store, which
>> calls ibmveth_close and ibmveth_open. Here's the sequence:
>> 
>>   T4365             T4366             
>>   ----------------- ----------------- ---------
>>   veth_pool_store   veth_pool_store   
>>                     ibmveth_close     
>>   ibmveth_close                       
>>   napi_disable                        
>>                     napi_disable      
>>   ibmveth_open                        
>>   napi_enable                         <- HANG
>> 
>> ibmveth_close calls napi_disable at the top and ibmveth_open calls
>> napi_enable at the top.
>> 
>> https://docs.kernel.org/networking/napi.html]] says
>> 
>>   The control APIs are not idempotent. Control API calls are safe
>>   against concurrent use of datapath APIs but an incorrect sequence of
>>   control API calls may result in crashes, deadlocks, or race
>>   conditions. For example, calling napi_disable() multiple times in a
>>   row will deadlock.
>> 
>> In the normal open and close paths, rtnl_mutex is acquired to prevent
>> other callers. This is missing from veth_pool_store. Use rtnl_mutex in
>> veth_pool_store fixes these hangs.
>
> Some/most of the above should actually land into the commit message,
> please rewrite it accordingly.

Okay, will do for v2.

>>  drivers/net/ethernet/ibm/ibmveth.c | 27 +++++++++++++++++++++++----
>>  1 file changed, 23 insertions(+), 4 deletions(-)
>> 
>> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
>> index b619a3ec245b..77ef19a53e72 100644
>> --- a/drivers/net/ethernet/ibm/ibmveth.c
>> +++ b/drivers/net/ethernet/ibm/ibmveth.c
>> @@ -1802,18 +1802,24 @@ static ssize_t veth_pool_store(struct kobject *kobj, struct attribute *attr,
>>  	long value = simple_strtol(buf, NULL, 10);
>>  	long rc;
>>  
>> +	rtnl_lock();
>> +
>>  	if (attr == &veth_active_attr) {
>>  		if (value && !pool->active) {
>>  			if (netif_running(netdev)) {
>>  				if (ibmveth_alloc_buffer_pool(pool)) {
>>  					netdev_err(netdev,
>>  						   "unable to alloc pool\n");
>> +					rtnl_unlock();
>>  					return -ENOMEM;
>>  				}
>>  				pool->active = 1;
>>  				ibmveth_close(netdev);
>> -				if ((rc = ibmveth_open(netdev)))
>> +				rc = ibmveth_open(netdev);
>> +				if (rc) {
>> +					rtnl_unlock();
>>  					return rc;
>
> If you avoid a bit of duplicate code with
> 					goto unlock_err;
>
> // at the end of the function
> unlock_err:
> 	rtnl_unlock();
> 	return rc;

Yes, I'll do this for v2. Thanks.

-Dave