[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1155d15f-4188-e5cd-3e4a-6e0c52e9b1eb@huawei.com>
Date: Wed, 19 Feb 2020 15:40:59 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Leon Romanovsky <leon@...nel.org>
CC: Jason Gunthorpe <jgg@...pe.ca>, Lang Cheng <chenglang@...wei.com>,
<dledford@...hat.com>, <davem@...emloft.net>,
<salil.mehta@...wei.com>, <yisen.zhuang@...wei.com>,
<linuxarm@...wei.com>, <netdev@...r.kernel.org>,
<linux-rdma@...r.kernel.org>, Saeed Mahameed <saeedm@...lanox.com>,
<bhaktipriya96@...il.com>, <tj@...nel.org>,
Jeff Kirsher <jeffrey.t.kirsher@...el.com>
Subject: Re: [RFC rdma-next] RDMA/core: Add attribute WQ_MEM_RECLAIM to
workqueue "infiniband"
+cc Bhaktipriya, Tejun and Jeff
On 2020/2/19 14:45, Leon Romanovsky wrote:
> On Wed, Feb 19, 2020 at 09:13:23AM +0800, Yunsheng Lin wrote:
>> On 2020/2/18 23:31, Jason Gunthorpe wrote:
>>> On Tue, Feb 18, 2020 at 11:35:35AM +0800, Lang Cheng wrote:
>>>> The hns3 driver sets "hclge_service_task" workqueue with
>>>> WQ_MEM_RECLAIM flag in order to guarantee forward progress
>>>> under memory pressure.
>>>
>>> Don't do that. WQ_MEM_RECLAIM is only to be used by things interlinked
>>> with reclaimed processing.
>>>
>>> Work on queues marked with WQ_MEM_RECLAIM can't use GFP_KERNEL
>>> allocations, can't do certain kinds of sleeps, can't hold certain
>>> kinds of locks, etc.
By the way, what kind of sleeps and locks can not be done in the work
queued to wq marked with WQ_MEM_RECLAIM?
>>
>> From mlx5 driver, it seems that there is GFP_KERNEL allocations
>> on wq marked with WQ_MEM_RECLAIM too:
>>
>> mlx5e_tx_timeout_work() -> mlx5e_safe_reopen_channels() ->
>> mlx5e_safe_switch_channels() -> mlx5e_open_channels()
>>
>> kcalloc() is called with GFP_KERNEL in mlx5e_open_channels(),
>> and mlx5e_tx_timeout_work() is queued with priv->wq, which is
>> allocated with WQ_MEM_RECLAIM flags. see:
>>
>> mlx5e_netdev_init() -> create_singlethread_workqueue()
>
> There are two reasons for that, first mlx5 driver was written far before
> WQ_MEM_RECLAIM usage was clarified, second mlx5 has bugs.
>
>>
>>
>> From the comment in kernel/workqueue.c, the work queued with
>> wq with WQ_MEM_RECLAIM flag set seems to be executed without
>> blocking under some rare case. I still not quite understand
>> the comment, and I can not find any doc that point out the
>> GFP_KERNEL allocations can not be done in wq with WQ_MEM_RECLAIM
>> yet. Is there any doc that mentions that GFP_KERNEL allocations
>> can not be done in wq with WQ_MEM_RECLAIM?
>
> It is whole purpose of WQ_MEM_RECLAIM flag - allow progress in case of
> memory pressure. Allocation memory while we are under memory pressure
> is an invitation for a disaster.
Ok, make sense.
>
>>
>>
>> /**
>> * rescuer_thread - the rescuer thread function
>> * @__rescuer: self
>> *
>> * Workqueue rescuer thread function. There's one rescuer for each
>> * workqueue which has WQ_MEM_RECLAIM set.
>> *
>> * Regular work processing on a pool may block trying to create a new
>> * worker which uses GFP_KERNEL allocation which has slight chance of
>> * developing into deadlock if some works currently on the same queue
>> * need to be processed to satisfy the GFP_KERNEL allocation. This is
>> * the problem rescuer solves.
>> *
>> * When such condition is possible, the pool summons rescuers of all
>> * workqueues which have works queued on the pool and let them process
>> * those works so that forward progress can be guaranteed.
>> *
>> * This should happen rarely.
>> *
>> * Return: 0
>> */
>>
>>
>> The below is the reason we add the sets "hclge_service_task" workqueue
>> with WQ_MEM_RECLAIM through analysing why other ethernet drivers has
>> allocated wq with WQ_MEM_RECLAIM flag, I may be wrong about that:
>
> Many drivers are developed using copy/paste technique, so it is wrong
> to assume that "other ethernet drivers" did the right thing.
>
>>
>> hns3 ethernet driver may be used as the low level transport of a
>> network file system, memory reclaim data path may depend on the
>> worker in hns3 driver to bring back the ethernet link so that it flush
>> the some cache to network based disk.
>
> Unlikely that this "network file system" dependency on ethernet link is correct.
Ok, I may be wrong about the above usecase.
but the below commit explicitly state that network devices may be used in
memory reclaim path.
0a38c17a21a0 ("fm10k: Remove create_workqueue"):
fm10k: Remove create_workqueue
alloc_workqueue replaces deprecated create_workqueue().
A dedicated workqueue has been used since the workitem (viz
fm10k_service_task, which manages and runs other subtasks) is involved in
normal device operation and requires forward progress under memory
pressure.
create_workqueue has been replaced with alloc_workqueue with max_active
as 0 since there is no need for throttling the number of active work
items.
Since network devices may be used in memory reclaim path,
WQ_MEM_RECLAIM has been set to guarantee forward progress.
flush_workqueue is unnecessary since destroy_workqueue() itself calls
drain_workqueue() which flushes repeatedly till the workqueue
becomes empty. Hence the call to flush_workqueue() has been dropped.
Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@...il.com>
Acked-by: Tejun Heo <tj@...nel.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@...el.com>
So:
1. Maybe the above commit log is misleading, and network device driver's
wq does not need the WQ_MEM_RECLAIM flag, then maybe document what can
not be done in the work queued to wq marked with WQ_MEM_RECLAIM, and
remove the WQ_MEM_RECLAIM flag for the wq of network device driver.
2. If the network device driver's wq does need the WQ_MEM_RECLAIM flag, then
hns3 may have tow problems here: WQ_MEM_RECLAIM wq flushing !WQ_MEM_RECLAIM
wq problem and GFP_KERNEL allocations in the work queued to WQ_MEM_RECLAIM wq.
>
> Thanks
>
>>
>>>
>>> Jason
>>>
>>>
>>
>
> .
>
Powered by blists - more mailing lists