Message-ID: <f5d57e3b-8168-41af-8e36-c7a21ef3a475@grimberg.me>
Date: Sun, 7 Apr 2024 23:08:23 +0300
From: Sagi Grimberg <sagi@...mberg.me>
To: Kamaljit Singh <Kamaljit.Singh1@....com>,
Chaitanya Kulkarni <chaitanyak@...dia.com>
Cc: "kbusch@...nel.org" <kbusch@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>
Subject: Re: WQ_UNBOUND workqueue warnings from multiple drivers
On 03/04/2024 2:50, Kamaljit Singh wrote:
> Sagi, Chaitanya,
>
> Sorry for the delay, found your replies in the junk folder :(
>
>> Was the test you were running read-heavy?
> No, most of the failing fio tests were doing heavy writes. All were with 8 Controllers and 32 NS each. io-specs are below.
>
> [1] bs=16k, iodepth=16, rwmixread=0, numjobs=16
> Failed in ~1 min
>
> Some others were:
> [2] bs=8k, iodepth=16, rwmixread=5, numjobs=16
> [3] bs=8k, iodepth=16, rwmixread=50, numjobs=16
Interesting, that is the opposite of what I would suspect (I thought that
the workload would be read-only or read-mostly).
Does this happen with a 90%-100% read workload?
If we look at nvme_tcp_io_work(), it is essentially looping,
doing send() and recv(), and every iteration checks whether a 1ms
deadline has elapsed. The fact that this happens on a 100% write
workload leads me to conclude that the only way it can
happen is if sending a single 16K request to a controller on its
own takes more than 10ms, which is unexpected...
Question: are you working with a Linux controller? What is
the ctrl ioccsz?