Message-ID: <YPfUOKrxGs6FjaOZ@T590>
Date: Wed, 21 Jul 2021 16:00:56 +0800
From: Ming Lei <ming.lei@...hat.com>
To: Oleksandr Natalenko <oleksandr@...alenko.name>
Cc: linux-kernel@...r.kernel.org, Jens Axboe <axboe@...com>,
Christoph Hellwig <hch@....de>,
Sagi Grimberg <sagi@...mberg.me>,
linux-nvme@...ts.infradead.org,
David Jeffery <djeffery@...hat.com>,
Laurence Oberman <loberman@...hat.com>,
Paolo Valente <paolo.valente@...aro.org>,
Jan Kara <jack@...e.cz>, Sasha Levin <sashal@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Keith Busch <kbusch@...nel.org>
Subject: Re: New warning in nvme_setup_discard
On Tue, Jul 20, 2021 at 11:05:29AM +0200, Oleksandr Natalenko wrote:
> Hello, Ming.
>
> On Monday, 19 July 2021 8:27:29 CEST Oleksandr Natalenko wrote:
> > On Monday, 19 July 2021 3:40:40 CEST Ming Lei wrote:
> > > On Sat, Jul 17, 2021 at 02:35:14PM +0200, Oleksandr Natalenko wrote:
> > > > On Saturday, 17 July 2021 14:19:59 CEST Oleksandr Natalenko wrote:
> > > > > On Saturday, 17 July 2021 14:11:05 CEST Oleksandr Natalenko wrote:
> > > > > > On Saturday, 17 July 2021 11:35:32 CEST Ming Lei wrote:
> > > > > > > Maybe you need to check if the build is OK; I can't reproduce it
> > > > > > > in my VM, and BFQ is still built in:
> > > > > > >
> > > > > > > [root@...st-01 ~]# uname -a
> > > > > > > Linux ktest-01 5.14.0-rc1+ #52 SMP Fri Jul 16 18:56:36 CST 2021 x86_64 x86_64 x86_64 GNU/Linux
> > > > > > > [root@...st-01 ~]# cat /sys/block/nvme0n1/queue/scheduler
> > > > > > > [none] mq-deadline kyber bfq
> > > > > >
> > > > > > I don't think this is an issue with the build… BTW, with
> > > > > > `initcall_debug`:
> > > > > >
> > > > > > ```
> > > > > > [ 0.902555] calling bfq_init+0x0/0x8b @ 1
> > > > > > [ 0.903448] initcall bfq_init+0x0/0x8b returned -28 after 507 usecs
> > > > > > ```
> > > > > >
> > > > > > -ENOSPC? Why? Also re-tested with the latest git tip, same result :(.
> > > > >
> > > > > OK, one extra pr_info, and I see this:
> > > > >
> > > > > ```
> > > > > [ 0.871180] blkcg_policy_register: BLKCG_MAX_POLS too small
> > > > > [ 0.871612] blkcg_policy_register: -28
> > > > > ```
> > > > >
> > > > > What does it mean, please? :) The value seems to be hard-coded:
> > > > >
> > > > > ```
> > > > > include/linux/blkdev.h
> > > > > 60:#define BLKCG_MAX_POLS 5
> > > > > ```
> > > >
> > > > OK, after increasing this to 6 I've got my BFQ back. Please see [1].
> > > >
> > > > [1]
> > > > https://lore.kernel.org/linux-block/20210717123328.945810-1-oleksandr@natalenko.name/
> > >
> > > OK, after you fixed the issue in blkcg_policy_register(), can you
> > > reproduce the discard issue on v5.14-rc1 with BFQ applied? If yes,
> > > can you test the patch I posted previously?
> >
> > Yes, the issue is reproducible with both v5.13.2 and v5.14-rc1. I haven't
> > managed to reproduce it with v5.13.2+your patch. Now I will build
> > v5.14-rc2+your patch and test further.
>
> I'm still hammering v5.14-rc2 + your patch, and I cannot reproduce the issue.
> Given that I do not have a reliable reproducer (I'm just firing up the kernel
> build, and the issue pops up eventually, sooner or later, but usually within
> the first couple of tries), how long should I hammer it before your fix can
> be considered proven?

You mentioned that the issue is reproducible with v5.14-rc, which means it
can always be reproduced within a limited time (say A). If the issue can no
longer be reproduced after running the patched kernel for a long enough time
B (B >> A), we can consider it fixed by the patch.

For example, if A is one hour, we can set B to 5*A or more to simulate a long
enough time.

Thanks,
Ming