[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPLW+4=kuHze3=+g80CsY6OkLno5gyjRfMWLXTFHu3N_=XcmqA@mail.gmail.com>
Date: Mon, 25 Nov 2024 19:37:00 -0600
From: Sam Protsenko <semen.protsenko@...aro.org>
To: Christoph Hellwig <hch@....de>
Cc: axboe@...nel.dk, linux-block@...r.kernel.org, linux-scsi@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] block: remove the ioprio field from struct request
Hi Christoph,
On Mon, Nov 25, 2024 at 1:37 AM Christoph Hellwig <hch@....de> wrote:
>
> On Fri, Nov 22, 2024 at 03:55:23PM -0600, Sam Protsenko wrote:
> > It's an Exynos based board with eMMC, so it uses DW MMC driver, with
> > Exynos glue layer on top of it, so:
> >
> > drivers/mmc/host/dw_mmc.c
> > drivers/mmc/host/dw_mmc-exynos.c
> >
> > I'm using the regular ARM64 defconfig. Nothing fancy about this setup
> > neither, the device tree with eMMC definition (mmc_0) is here:
> >
> > arch/arm64/boot/dts/exynos/exynos850-e850-96.dts
>
> Thanks. eMMC itself never looks at the ioprio field.
>
> > FWIW, I was able to narrow down the issue to dd_insert_request()
> > function. With this hack the freeze is gone:
>
> Sounds like it isn't the driver that matters here, but the scheduler.
>
> >
> > 8<-------------------------------------------------------------------->8
> > diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> > index acdc28756d9d..83d272b66e71 100644
> > --- a/block/mq-deadline.c
> > +++ b/block/mq-deadline.c
> > @@ -676,7 +676,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx
> > *hctx, struct request *rq,
> > struct request_queue *q = hctx->queue;
> > struct deadline_data *dd = q->elevator->elevator_data;
> > const enum dd_data_dir data_dir = rq_data_dir(rq);
> > - u16 ioprio = req_get_ioprio(rq);
> > + u16 ioprio = 0; /* the same as old req->ioprio */
> > u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
> > struct dd_per_prio *per_prio;
> > enum dd_prio prio;
> > 8<-------------------------------------------------------------------->8
> >
> > Does it tell you anything about where the possible issue can be?
>
> Can you dump the ioprities you see here with and without the reverted
> patch?
>
Collected the logs for you:
- with patch reverted (ioprio is always 0): [1]
- with patch present: [2]
This code was added for printing the traces:
8<---------------------------------------------------------------------->8
static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
blk_insert_t flags, struct list_head *free)
{
+#define IOPRIO_N_LIMIT 100
+ static int ioprio_prev = 0, ioprio_n = 0;
struct request_queue *q = hctx->queue;
struct deadline_data *dd = q->elevator->elevator_data;
const enum dd_data_dir data_dir = rq_data_dir(rq);
u16 ioprio = req_get_ioprio(rq);
u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
struct dd_per_prio *per_prio;
enum dd_prio prio;
+ ioprio_n++;
+ if (ioprio != ioprio_prev || ioprio_n == IOPRIO_N_LIMIT) {
+ pr_err("### %-20d : %d times\n", ioprio_prev, ioprio_n);
+ ioprio_n = 0;
+ }
+ ioprio_prev = ioprio;
+
lockdep_assert_held(&dd->lock);
8<---------------------------------------------------------------------->8
Specifically I'd pay attention to the next two places in [2], where
the delays were introduced:
1. Starting getty service (5 second delay):
8<---------------------------------------------------------------------->8
[ 14.875199] ### 24580 : 1 times
...
[ OK ] Started getty@...1.service - Getty on tty1.
[ OK ] Started serial-getty@...SAâice - Serial Getty on ttySAC0.
[ OK ] Reached target getty.target - Login Prompts.
[ 19.425354] ### 0 : 100 times
...
8<---------------------------------------------------------------------->8
2. Login (more than 60 seconds delay):
8<---------------------------------------------------------------------->8
runner-vwmj3eza-project-40964107-concurrent-0 login: root
...
[ 22.827432] ### 0 : 100 times
...
[ 100.100402] ### 24580 : 1 times
#
8<---------------------------------------------------------------------->8
I guess the results look similar to the logs from the neighboring
thread [3]. Not sure if those tools (getty service and login tool) are
running ioprio_set() internally, or it's just I/O scheduler acting up,
but the freeze is happening consistently in those two places. Please
let me know if I can help you debug that further. Not a block layer
expert by any means, but that looks like a regression, at least on
E850-96 board. Wonder if it's possible to reproduce that on other
platforms.
Thanks!
[1] https://termbin.com/aus7
[2] https://termbin.com/za3t
[3] https://lore.kernel.org/all/CAP-bSRab1C-_aaATfrgWjt9w0fcYUCQCG7u+TCb1FSPSd6CEaA@mail.gmail.com/
Powered by blists - more mailing lists