[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <TYWP286MB226710E14843B461637C4DC6B9719@TYWP286MB2267.JPNP286.PROD.OUTLOOK.COM>
Date: Fri, 10 Dec 2021 01:07:26 +0000
From: 小川 修一
<ogawa.syuuichi@...d-mse.com>
To: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC: 山本 達史
<yamamoto.tatsushi@...d-mse.com>,
"Natsume, Wataru (ADITJ/SWG)" <wnatsume@...adit-jv.com>,
小川 修一
<ogawa.syuuichi@...d-mse.com>
Subject: RE: invalid counter value in request_queue
I forgot to take care gdb-script about ATOMIC mode in percpu_count_ptr. I fixed the script then I got:
super_block 0xffff9bb3f8bbcc30 "vdb"
q=(struct request_queue *) 0xffff9bb3fb53d1c0,q->q_usage_counter.percpu_count_ptr=(unsigned long *) 0x3a80c000b2b9
0:0xffffd634bfc0b2b8,0x0,0
1:0xffffd634bfc8b2b8,0x0,0
2:0xffffd634bfd0b2b8,0x0,0
Result is as same as previous my e-mail.
-----Original Message-----
From: 小川 修一
Sent: Friday, December 10, 2021 9:54 AM
To: linux-kernel@...r.kernel.org
Cc: 山本 達史 <yamamoto.tatsushi@...d-mse.com>; Natsume, Wataru (ADITJ/SWG) <wnatsume@...adit-jv.com>
Subject: RE: invalid counter value in request_queue
Hi all
I understood the reason why per-cpu couner became negative.
As I mention previous mail, q_usage_counter should be atomic counter.
percpu_refcount has not only percpu counter but also ATOMIC counter mode.
As fact, blk_alloc_queue_node() initialize request_queue->q_usage_counter as ATOMIC_MODE like this:
/*
* Init percpu_ref in atomic mode so that it's faster to shutdown.
* See blk_register_queue() for details.
*/
if (percpu_ref_init(&q->q_usage_counter,
blk_queue_usage_counter_release,
PERCPU_REF_INIT_ATOMIC, GFP_KERNEL))
However, q_usage_counter becomes percpu mode at blk_register_queue() .
blk_register_queue()
/*
* SCSI probing may synchronously create and destroy a lot of
* request_queues for non-existent devices. Shutting down a fully
* functional queue takes measureable wallclock time as RCU grace
* periods are involved. To avoid excessive latency in these
* cases, a request_queue starts out in a degraded mode which is
* faster to shut down and is made fully functional here as
* request_queues for non-existent devices never get registered.
*/
if (!blk_queue_init_done(q)) {
queue_flag_set_unlocked(QUEUE_FLAG_INIT_DONE, q);
percpu_ref_switch_to_percpu(&q->q_usage_counter);
blk_queue_bypass_end(q);
}
When I remove percpu_ref_switch_to_percpu(&q->q_usage_counter) in blk_register_queue() ;
- At least percpu counter is no longer used.
super_block 0xffff9bb3f8bbcc30 "vdb"
q=(struct request_queue *) 0xffff9bb3fb53d1c0,q->q_usage_counter.percpu_count_ptr=(unsigned long *) 0x3a80c000b2b9
0:0xffffd634bfc0b2b9,0x0,0
1:0xffffd634bfc8b2b9,0x0,0
2:0xffffd634bfd0b2b9,0x0,0
You can find percpu_count_ptr has a bit __PERCPU_REF_ATOMIC=0x1 per cpu counter looks not used
- I check again q_usage_counter.
crash> p $q->q_usage_counter
$1 = {
count = {
counter = 5
},
percpu_count_ptr = 64324651496121,
release = 0xffffffff9d44445d,
confirm_switch = 0x0,
force_atomic = true,
rcu = {
next = 0x0,
func = 0x0
}
}
atomic_t count is 5.
5 is OK or NG ? I will check it.
Other block device which ext4 mount on has counter=1
Now question is
Why blk_register_queue() turn q_usage_counter to percpu mode ?
I think this code has some reason to turn to percpu mode, though blk-core.c treat that counter as atomic counter.
Does it have some problem ?
Best Regards, shuichi ogawa
-----Original Message-----
From: 小川 修一 <ogawa.syuuichi@...d-mse.com>
Sent: Thursday, December 9, 2021 1:50 PM
To: linux-kernel@...r.kernel.org
Cc: 小川 修一 <ogawa.syuuichi@...d-mse.com>; 山本 達史 <yamamoto.tatsushi@...d-mse.com>; Natsume, Wataru (ADITJ/SWG) <wnatsume@...adit-jv.com>
Subject: invalid counter value in request_queue
Hi, all
I have first time to post mail, so if you have some matter, please let me know.
I'm studying Linux kernel, referencing kdump, to clarify system performance problem.
Now I found strange value in request_queue->q_usage_counter.percpu_count_ptr
Kernel version: 4.9.52, I checked 5.10.80 briefly, and looked similar.
super_block 0xffff9a563820e430 "vdb"
q=(struct request_queue *) 0xffff9a563b948920,q->q_usage_counter.percpu_count_ptr=(unsigned long *) 0x39dbc000b2b8
0:0xffffd431ffc0b2b8,0xffffffffffffdaf1,-9487
1:0xffffd431ffc8b2b8,0x0,0
2:0xffffd431ffd0b2b8,0x2510,9488
This is output of gdb script in crash commadn. Format is <cpu>:<address>,<value in HEX>, <value in signed long DEC>
Values of percpu_counter_ptr, big value or negative value on cpu0, and positive value on cpu2.
If sum up all cpus, total=1, it means 1 request remain in /dev/vdb at that kdump.
Easy to estimate, count up cpu and count down cpu are different.
I think the q_usage_counter doesn't work as reference counter to guard invalid disposing request queue, however I don't found to use this counter.
System looks no problem. However I wonder that causes any troubles like invalid disposing resource.
I ask you that this is really no problem at all.
---
As we know, this counter is reference counter of request queue access, For example generic_make_request
blk_queue_enter(q, false) -> percpu_ref_tryget_live(&q->q_usage_counter) : count up
blk_queue_exit(q) -> percpu_ref_put(&q->q_usage_counter) : count down
If count up on cpu2, and count down on cpu0, this counter becomes invalid.
I found 2 cases:
case-1: normal case of counting actual requested I/O
blk_mq_map_request() request bio to block device, then count up in blk_queue_enter_live(q)
blk_mq_end_request() called at terminating I/O at IRQ context, then count down in
blk_mq_free_request() -> blk_queue_exit(q)
IRQ context is normally run on cpu0 in my system. so If AP requests FILE-I/O on cpu2, this problem is reproduced.
case-2: preemption
generic_make_request is not preempt disabled, then cpu may changes between blk_queue_enter and blk_queue_exit.
Now I think q_usage_counter should consist of simple atomic_t or kref_t instead of percpu_ref.
System looks no problem as of now, I've not yet make any patches to correct it.
If I have a chance to make the patch, I will post again.
Best Regards, shuichi ogawa, NTT-Data MSE corporation
Powered by blists - more mailing lists