Date:   Thu, 20 Feb 2020 09:06:16 +0800
From:   Yunsheng Lin <linyunsheng@...wei.com>
To:     Jason Gunthorpe <jgg@...pe.ca>
CC:     Leon Romanovsky <leon@...nel.org>,
        Lang Cheng <chenglang@...wei.com>, <dledford@...hat.com>,
        <davem@...emloft.net>, <salil.mehta@...wei.com>,
        <yisen.zhuang@...wei.com>, <linuxarm@...wei.com>,
        <netdev@...r.kernel.org>, <linux-rdma@...r.kernel.org>,
        Saeed Mahameed <saeedm@...lanox.com>,
        <bhaktipriya96@...il.com>, <tj@...nel.org>,
        Jeff Kirsher <jeffrey.t.kirsher@...el.com>
Subject: Re: [RFC rdma-next] RDMA/core: Add attribute WQ_MEM_RECLAIM to
 workqueue "infiniband"

On 2020/2/19 21:04, Jason Gunthorpe wrote:
> On Wed, Feb 19, 2020 at 03:40:59PM +0800, Yunsheng Lin wrote:
>> +cc Bhaktipriya, Tejun and Jeff
>> 
>> On 2020/2/19 14:45, Leon Romanovsky wrote:
>>> On Wed, Feb 19, 2020 at 09:13:23AM +0800, Yunsheng Lin wrote:
>>>> On 2020/2/18 23:31, Jason Gunthorpe wrote:
>>>>> On Tue, Feb 18, 2020 at 11:35:35AM +0800, Lang Cheng wrote:
>>>>>> The hns3 driver sets up the "hclge_service_task" workqueue with the WQ_MEM_RECLAIM flag in order to guarantee forward progress under memory pressure.
>>>>> 
>>>>> Don't do that. WQ_MEM_RECLAIM is only to be used by things interlinked with reclaim processing.
>>>>> 
>>>>> Work on queues marked with WQ_MEM_RECLAIM can't use GFP_KERNEL allocations, can't do certain kinds of sleeps, can't hold certain kinds of locks, etc.
>> 
>> By the way, what kinds of sleeps and locks cannot be used in work queued to a wq marked with WQ_MEM_RECLAIM?
> 
> Anything that recurses back into a blocking allocation function.
> 
> If we are freeing memory because an allocation failed (eg GFP_KERNEL) then we cannot go back into a blockable allocation while trying to progress the first failing allocation. That is a deadlock.
> 
> So a WQ cannot hold any locks that enclose GFP_KERNEL in any other threads.
> 
> Unfortunately we don't have a lockdep test for this by default.
> 
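If I understand the rule correctly, it amounts to roughly the sketch
below (made-up names for illustration, not actual hns3 code):

#include <linux/workqueue.h>
#include <linux/slab.h>

/* Runs on a queue created with alloc_workqueue(..., WQ_MEM_RECLAIM, ...),
 * so reclaim may be waiting on this very work item to make progress.
 */
static void reclaim_safe_work(struct work_struct *work)
{
	/* BAD: GFP_KERNEL may block waiting for reclaim, and reclaim may
	 * in turn be waiting for this work item to finish -> deadlock.
	 * The same goes for taking a lock that another thread holds
	 * across a GFP_KERNEL allocation.
	 */
	/* void *buf = kmalloc(4096, GFP_KERNEL); */

	/* OK: a non-blocking allocation that fails instead of sleeping. */
	void *buf = kmalloc(4096, GFP_NOWAIT);

	if (!buf)
		return;	/* back off and retry later rather than block */

	kfree(buf);
}
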
>>>> The hns3 ethernet driver may be used as the low-level transport of a network file system, so the memory reclaim data path may depend on the worker in the hns3 driver to bring back the ethernet link so that it can flush some cache to a network-based disk.
>>> 
>>> It is unlikely that this "network file system" dependency on the ethernet link is correct.
>> 
>> Ok, I may be wrong about the above use case, but the below commit explicitly states that network devices may be used in the memory reclaim path.
> 
> I don't really know how this works when the networking stacks intersect with the block stack.
> 
> Forward progress on something like a NVMeOF requires a lot of stuff to be working, and presumably under reclaim.
> 
> But, we can't make everything WQ_MEM_RECLAIM safe, because we could never do a GFP_KERNEL allocation..
> 
> I have never seen specific guidance on what to do here; I assume it is broken.

I assume the forward-progress guarantee of the network devices' wqs is
broken too, at least in the case of the hns3, fm10k and mlx5 drivers.

So I suggest removing WQ_MEM_RECLAIM from hns3's wq for now.
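
Concretely, something along the lines of the below (illustrative names,
not the exact hns3 code):

#include <linux/errno.h>
#include <linux/workqueue.h>

static struct workqueue_struct *hclge_wq;

static int hclge_alloc_wq(void)
{
	/* was: alloc_workqueue("hclge", WQ_MEM_RECLAIM | WQ_UNBOUND, 0); */
	hclge_wq = alloc_workqueue("hclge", WQ_UNBOUND, 0);

	return hclge_wq ? 0 : -ENOMEM;
}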

For now there are two known problems that defeat the forward-progress
guarantee of WQ_MEM_RECLAIM when adding WQ_MEM_RECLAIM to hns3's wq:
1. GFP_KERNEL allocation in hns3's work queued to the WQ_MEM_RECLAIM wq.
2. hns3's WQ_MEM_RECLAIM wq flushing infiniband's !WQ_MEM_RECLAIM wq
   (rough sketch of problem 2 below).
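
For problem 2, the workqueue core already warns about this case via
check_flush_dependency() in kernel/workqueue.c; roughly (illustrative,
not the real hns3 call chain):

#include <linux/workqueue.h>
#include <rdma/ib_verbs.h>	/* ib_wq, the "infiniband" workqueue */

/* Imagine this queued on hns3's WQ_MEM_RECLAIM wq. */
static void hns3_service_task(struct work_struct *work)
{
	/* ib_wq is currently !WQ_MEM_RECLAIM, so flushing it from a
	 * WQ_MEM_RECLAIM work item defeats the reclaim guarantee and
	 * triggers the "WQ_MEM_RECLAIM ... is flushing !WQ_MEM_RECLAIM
	 * ..." warning.
	 */
	flush_workqueue(ib_wq);
}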

We can add WQ_MEM_RECLAIM back once we have fixed the above problems and
found more specific guidance about handling the forward-progress guarantee
in network devices' wqs.

Thanks for the feedback.

> 
> Jason
