Message-ID: <4517d754-1594-9913-cf7f-da58a7cb23a8@grimberg.me>
Date: Thu, 27 Apr 2023 15:11:25 +0300
From: Sagi Grimberg <sagi@...mberg.me>
To: Li Feng <fengli@...rtx.com>
Cc: Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...com>,
Christoph Hellwig <hch@....de>,
"open list:NVM EXPRESS DRIVER" <linux-nvme@...ts.infradead.org>,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] nvme/tcp: Add support to set the tcp worker cpu affinity
> Hi Sagi,
>
> On Wed, Apr 19, 2023 at 5:32 PM Sagi Grimberg <sagi@...mberg.me> wrote:
>>
>>
>>>> Hey Li,
>>>>
>>>>> The default worker affinity policy is to use all online CPUs, e.g. from 0
>>>>> to N-1. However, some CPUs are busy with other jobs, and then nvme-tcp
>>>>> suffers from bad performance.
>>>>> This patch adds a module parameter to set the cpu affinity for the nvme-tcp
>>>>> socket worker threads. The parameter is a comma separated list of CPU
>>>>> numbers. The list is parsed and the resulting cpumask is used to set the
>>>>> affinity of the socket worker threads. If the list is empty or the
>>>>> parsing fails, the default affinity is used.
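
(For reference, the knob described above boils down to a cpulist string
parsed into a cpumask. A rough sketch of that kind of parsing is below;
the identifiers are illustrative only and are not taken from the patch:)

	/* Illustrative sketch -- names here are not from the patch itself. */
	#include <linux/cpumask.h>
	#include <linux/module.h>

	static char *wq_affinity_list;	/* e.g. "0-3,8,12" */
	module_param(wq_affinity_list, charp, 0444);
	MODULE_PARM_DESC(wq_affinity_list,
			 "Comma-separated list of CPUs for the socket worker threads");

	static struct cpumask wq_affinity_mask;

	static int parse_wq_affinity(void)
	{
		/* Empty list or parse failure: fall back to the default affinity. */
		if (!wq_affinity_list ||
		    cpulist_parse(wq_affinity_list, &wq_affinity_mask) ||
		    !cpumask_intersects(&wq_affinity_mask, cpu_online_mask)) {
			cpumask_copy(&wq_affinity_mask, cpu_online_mask);
			return -EINVAL;
		}
		return 0;
	}
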
>>>>
>>>> I can see how this may benefit a specific set of workloads, but I have a
>>>> few issues with this.
>>>>
>>>> - This is exposing a user interface for something that is really
>>>> internal to the driver.
>>>>
>>>> - This is something that can be misleading and could be tricky to get
>>>> right; my concern is that this would only benefit a very niche case.
>>> Our storage products need this feature.
>>> If the user doesn't know what this is, they can keep the default, so I think
>>> this is not unacceptable.
>>
>> It doesn't work like that. A user interface is not something exposed to
>> a specific consumer.
>>
>>>> - If the setting should exist, it should not be global.
>>> V2 has fixed it.
>>>>
>>>> - I prefer not to introduce new modparams.
>>>>
>>>> - I'd prefer to find a way to support your use-case without introducing
>>>> a config knob for it.
>>>>
>>> I’m looking forward to it.
>>
>> If you change queue_work_on to queue_work, ignoring the io_cpu, does it
>> address your problem?
> Sorry for the late response; I just got my machine back.
> Replacing queue_work_on with queue_work looks like it gives a small
> performance improvement.
> The busy worker is `kworker/56:1H+nvme_tcp_wq`, and fio is bound to
> CPU 90 ('cpus_allowed=90');
> I don't know why worker 56 is selected.
> The performance of 256k reads goes up from 1.15GB/s to 1.35GB/s.
The question becomes what the impact would be for multi-threaded
workloads and different NIC/CPU/app placements... This is the
tricky part of touching this stuff.
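
To make the experiment concrete, the change I was referring to is roughly
the following, sketched against the io_work kick in drivers/nvme/host/tcp.c
(a sketch for discussion, not a patch I'm proposing):

	/* Today: io_work is always bounced to the queue's pre-selected cpu. */
	queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);

	/*
	 * Experiment: queue_work() passes WORK_CPU_UNBOUND, so for this bound
	 * workqueue io_work runs on whichever cpu happens to kick it (often
	 * the submitting cpu), instead of being pinned to queue->io_cpu.
	 */
	queue_work(nvme_tcp_wq, &queue->io_work);
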
>> Not saying that this should be a solution though.
>>
>> How many queues does your controller support, such that you happen to use
>> queue 0?
> Our controller only supports one I/O queue currently.
I don't think I've ever heard of a fabrics controller that supports
only a single I/O queue.
>>
>> Also, what happens if you don't pin your process to a specific cpu, does
>> that change anything?
> If I don't pin the CPU, there is no effect on performance.
Which, again, makes this optimization a niche case.