Message-ID: <37df66ec9cf1a0570a86ec0b9f17ae18ed11b832.camel@yandex.ru>
Date: Wed, 31 Jul 2024 17:14:33 +0300
From: Konstantin Kharlamov <Hi-Angel@...dex.ru>
To: Yu Kuai <yukuai1@...weicloud.com>, Song Liu <song@...nel.org>,
linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org,
"yangerkun@...wei.com" <yangerkun@...wei.com>, "yukuai (C)"
<yukuai3@...wei.com>
Cc: dm-devel@...ts.linux.dev, Matthew Sakai <msakai@...hat.com>
Subject: Re: Lockup of (raid5 or raid6) + vdo after taking out a disk under
load

CC'ing VDO maintainers, because the problem is only reproducible with
VDO, so potentially they might have some ideas.

On Mon, 2024-07-22 at 20:56 +0300, Konstantin Kharlamov wrote:
> Hi, sorry for the delay, I had to give away the nodes and we had a week
> of teambuilding and a company party, so for the past week I only managed
> to hack away at the debug-symbol stripping, get another node and set it
> up.
>
> Experiments below are based on a vanilla 6.9.8 kernel *without* your
> patch.
>
> On Mon, 2024-07-15 at 09:56 +0800, Yu Kuai wrote:
> > Line numbers will be helpful.
>
> So, after tinkering with the build scripts I managed to build the
> modules with debug symbols (not the kernel itself, but that should be
> good enough), but for some reason the kernel doesn't show line numbers
> in stacktraces. No idea what could be causing that, so I had to decode
> the line numbers manually; below is the output where I inserted the
> line numbers for raid456 after decoding them with `gdb`.
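>
> For reference, the decoding was along these lines (the offset here is
> just illustrative, not the actual one from my traces):
>
>     gdb -batch -ex 'info line *(raid5d+0x3eb)' raid456.ko   # hypothetical offset
>
> The kernel's scripts/faddr2line should work too, as long as the
> symbol+offset from the original trace is still at hand.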
>
> […]
> [ 1677.293366] <TASK>
> [ 1677.293661] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> [ 1677.293972] ? _raw_spin_unlock_irq+0x10/0x30
> [ 1677.294276] ? _raw_spin_unlock_irq+0xa/0x30
> [ 1677.294586] raid5d at drivers/md/raid5.c:6572
> [ 1677.294910] md_thread+0xc1/0x170
> [ 1677.295228] ? __pfx_autoremove_wake_function+0x10/0x10
> [ 1677.295545] ? __pfx_md_thread+0x10/0x10
> [ 1677.295870] kthread+0xff/0x130
> [ 1677.296189] ? __pfx_kthread+0x10/0x10
> [ 1677.296498] ret_from_fork+0x30/0x50
> [ 1677.296810] ? __pfx_kthread+0x10/0x10
> [ 1677.297112] ret_from_fork_asm+0x1a/0x30
> [ 1677.297424] </TASK>
> […]
> [ 1705.296253] <TASK>
> [ 1705.296554] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> [ 1705.296864] ? _raw_spin_unlock_irq+0x10/0x30
> [ 1705.297172] ? _raw_spin_unlock_irq+0xa/0x30
> [ 1677.294586] raid5d at drivers/md/raid5.c:6597
> [ 1705.297794] md_thread+0xc1/0x170
> [ 1705.298099] ? __pfx_autoremove_wake_function+0x10/0x10
> [ 1705.298409] ? __pfx_md_thread+0x10/0x10
> [ 1705.298714] kthread+0xff/0x130
> [ 1705.299022] ? __pfx_kthread+0x10/0x10
> [ 1705.299333] ret_from_fork+0x30/0x50
> [ 1705.299641] ? __pfx_kthread+0x10/0x10
> [ 1705.299947] ret_from_fork_asm+0x1a/0x30
> [ 1705.300257] </TASK>
> […]
> [ 1733.296255] <TASK>
> [ 1733.296556] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> [ 1733.296862] ? _raw_spin_unlock_irq+0x10/0x30
> [ 1733.297170] ? _raw_spin_unlock_irq+0xa/0x30
> [ 1677.294586] raid5d at drivers/md/raid5.c:6572
> [ 1733.297792] md_thread+0xc1/0x170
> [ 1733.298096] ? __pfx_autoremove_wake_function+0x10/0x10
> [ 1733.298403] ? __pfx_md_thread+0x10/0x10
> [ 1733.298711] kthread+0xff/0x130
> [ 1733.299018] ? __pfx_kthread+0x10/0x10
> [ 1733.299330] ret_from_fork+0x30/0x50
> [ 1733.299637] ? __pfx_kthread+0x10/0x10
> [ 1733.299943] ret_from_fork_asm+0x1a/0x30
> [ 1733.300251] </TASK>
>
> > Meanwhile, can you check if the underlying disks have IO while raid5
> > is stuck, via /sys/block/[device]/inflight?
>
> The two devices that are left after the 3rd one is removed have these
> numbers, which don't change over time:
>
> [Mon Jul 22 20:18:06 @ ~]:> for d in dm-19 dm-17; do echo -n $d; cat /sys/block/$d/inflight; done
> dm-19 9 1
> dm-17 11 2
> [Mon Jul 22 20:18:11 @ ~]:> for d in dm-19 dm-17; do echo -n $d; cat /sys/block/$d/inflight; done
> dm-19 9 1
> dm-17 11 2
>
> They also don't change after I put the disk back (which is to be
> expected, I guess, given that the lockup doesn't go away).
>
> > >
> > > > At first, can the problem be reproduced with raid1/raid10? If
> > > > not, this is probably a raid5 bug.
> > >
> > > This is not reproducible with raid1 (i.e. no lockups for raid1); I
> > > tested that. I didn't test raid10; if you want, I can try (but
> > > probably only after the weekend, because today I was asked to give
> > > the nodes away, for the weekend at least, to someone else).
> >
> > Yes, please try raid10 as well. For now I'll say this is a raid5
> > problem.
>
> Tested: raid10 works just fine, i.e. there is no lockup and fio keeps
> reporting non-zero IOPS.
>
> > > > It would be best if I could reproduce this problem myself. The
> > > > problem is that I don't understand step 4: turning off the JBOD
> > > > slot's power. Is this only possible on a real machine, or can I
> > > > do this in my VM?
> > >
> > > Well, let's say that if it is possible, I don't know a way to do
> > > that. The `sg_ses` commands that I used
> > >
> > >     sg_ses --dev-slot-num=9 --set=3:4:1 /dev/sg26     # turning off
> > >     sg_ses --dev-slot-num=9 --clear=3:4:1 /dev/sg26   # turning on
> > >
> > > …set and clear the value of the 3:4:1 bit, where the bit is defined
> > > by the JBOD manufacturer's datasheet. The 3:4:1 specifically is
> > > defined by the "AIC" manufacturer. That means the command as-is is
> > > unlikely to work on different hardware.
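> > >
> > > (In case it helps with reproducing this elsewhere: I believe the
> > > enclosure's sg node can be found with something like
> > >
> > >     lsscsi -g        # enclosure devices show up as type "enclosu"
> > >     sg_ses /dev/sgN  # lists the diagnostic pages the enclosure supports
> > >
> > > but the actual set/clear bits still depend on the vendor's
> > > datasheet.)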
> >
> > I've never done this before, I'll try.
> > >
> > > Well, while we're at it, do you have any thoughts on why just using
> > > `echo 1 > /sys/block/sdX/device/delete` doesn't reproduce it? Does
> > > the kernel perhaps not emulate device disappearance well enough?
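> > >
> > > (For what it's worth, the closest software-only approximation I can
> > > think of would be to mark the device offline before deleting it:
> > >
> > >     echo offline > /sys/block/sdX/device/state   # stop accepting IO first
> > >     echo 1 > /sys/block/sdX/device/delete
> > >
> > > but I haven't checked whether that reproduces the lockup either.)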
> >
> > echo 1 > delete just deletes the disk from the kernel, and scsi/dm-raid
> > will know that this disk is deleted. With the other way, however, the
> > disk will stay in the kernel: dm-raid is not aware that the underlying
> > disks are problematic, and IO will still be generated and issued.
> >
> > Thanks,
> > Kuai