Message-ID: <CAPhsuW49L8B9K8QFg68v=zG9ywMehUTD18DaG4PexEt-3mzQqQ@mail.gmail.com>
Date: Wed, 24 Jan 2024 16:01:47 -0800
From: Song Liu <song@...nel.org>
To: Dan Moulding <dan@...m.net>, junxiao.bi@...cle.com
Cc: gregkh@...uxfoundation.org, linux-kernel@...r.kernel.org,
linux-raid@...r.kernel.org, regressions@...ts.linux.dev,
stable@...r.kernel.org, yukuai1@...weicloud.com
Subject: Re: [REGRESSION] 6.7.1: md: raid5 hang and unresponsive system;
successfully bisected
Thanks for the information!

On Tue, Jan 23, 2024 at 3:58 PM Dan Moulding <dan@...m.net> wrote:
>
> > This appears the md thread hit some infinite loop, so I would like to
> > know what it is doing. We can probably get the information with the
> > perf tool, something like:
> >
> > perf record -a
> > perf report
>
> Here you go!
>
> # Total Lost Samples: 0
> #
> # Samples: 78K of event 'cycles'
> # Event count (approx.): 83127675745
> #
> # Overhead Command Shared Object Symbol
> # ........ ............... .............................. ..................................................
> #
> 49.31% md0_raid5 [kernel.kallsyms] [k] handle_stripe
> 18.63% md0_raid5 [kernel.kallsyms] [k] ops_run_io
> 6.07% md0_raid5 [kernel.kallsyms] [k] handle_active_stripes.isra.0
> 5.50% md0_raid5 [kernel.kallsyms] [k] do_release_stripe
> 3.09% md0_raid5 [kernel.kallsyms] [k] _raw_spin_lock_irqsave
> 2.48% md0_raid5 [kernel.kallsyms] [k] r5l_write_stripe
> 1.89% md0_raid5 [kernel.kallsyms] [k] md_wakeup_thread
> 1.45% ksmd [kernel.kallsyms] [k] ksm_scan_thread
> 1.37% md0_raid5 [kernel.kallsyms] [k] stripe_is_lowprio
> 0.87% ksmd [kernel.kallsyms] [k] memcmp
> 0.68% ksmd [kernel.kallsyms] [k] xxh64
> 0.56% md0_raid5 [kernel.kallsyms] [k] __wake_up_common
> 0.52% md0_raid5 [kernel.kallsyms] [k] __wake_up
> 0.46% ksmd [kernel.kallsyms] [k] mtree_load
> 0.44% ksmd [kernel.kallsyms] [k] try_grab_page
> 0.40% ksmd [kernel.kallsyms] [k] follow_p4d_mask.constprop.0
> 0.39% md0_raid5 [kernel.kallsyms] [k] r5l_log_disk_error
> 0.37% md0_raid5 [kernel.kallsyms] [k] _raw_spin_lock_irq
> 0.33% md0_raid5 [kernel.kallsyms] [k] release_stripe_list
> 0.31% md0_raid5 [kernel.kallsyms] [k] release_inactive_stripe_list
It appears the md thread is indeed busy: nearly half of the samples are
in handle_stripe(), with most of the rest in its callees, so md0_raid5
is spinning through stripes rather than sleeping. I haven't had any luck
reproducing this on my hosts. Could you please try the following change
and see whether it fixes the issue (without reverting 0de40f76d567)? I
will keep trying to reproduce the issue on my side.
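
For context on why the thread can spin: the check that 0de40f76d567
reverted made raid5d() back off while a superblock update was pending.
That reverted hunk was roughly of the following shape (a sketch using
identifiers from drivers/md/raid5.c, not necessarily the exact hunk and
not the change being proposed here):

	/*
	 * Sketch: if a superblock update is pending, drop device_lock
	 * and wait for the update to complete instead of re-entering
	 * handle_active_stripes() and spinning in handle_stripe().
	 */
	if (test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags)) {
		spin_unlock_irq(&conf->device_lock);
		wait_event(mddev->sb_wait,
			   !test_bit(MD_SB_CHANGE_PENDING,
				     &mddev->sb_flags));
		spin_lock_irq(&conf->device_lock);
	}

Without a wait of that kind, raid5d() keeps iterating over the same
stuck stripes, which would match the handle_stripe-dominated profile
above.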
Junxiao,
Please also help look into this.
Thanks,
Song