netdev - Re: [PATCH] mlx4: Use GFP_NOFS calls during the ipoib TX path when creating the QP


Message-ID: <5318789B.1040402@mellanox.com>
Date:	Thu, 6 Mar 2014 15:31:07 +0200
From:	Or Gerlitz <ogerlitz@...lanox.com>
To:	Jiri Kosina <jkosina@...e.cz>
CC:	Roland Dreier <roland@...nel.org>, Amir Vadai <amirv@...lanox.com>,
	Eli Cohen <eli@....mellanox.co.il>,
	Eugenia Emantayev <eugenia@...lanox.com>,
	"David S. Miller" <davem@...emloft.net>,
	Mel Gorman <mgorman@...e.de>, <netdev@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>,
	Saeed Mahameed <saeedm@...lanox.com>,
	Sagi Grimberg <sagig@...lanox.com>,
	Shlomo Pongratz <shlomop@...lanox.com>
Subject: Re: [PATCH] mlx4: Use GFP_NOFS calls during the ipoib TX path when
 creating the QP

On 21/02/2014 23:53, Jiri Kosina wrote:
> This was originally a patch from Matthew Finlay <matt@...lanox.com> that
> addressed a problem whereby NFS writes would enter uninterruptible sleep
> forever.  The issue happened when using NFS over IPoIB. This is not a
> recommended configuration as RDMA is preferred but it is still a valid
> configuration and is important to have in situations where the NFS server
> does not support RDMA. The problem encountered was described as follows:
>
> 	It's not memory reclamation that is the problem as such. There is
> 	an indirect dependency between network filesystems writing back
> 	pages and ipoib_cm_tx_init() due to how a kworker is used. Page
> 	reclaim cannot make forward progress until ipoib_cm_tx_init()
> 	succeeds and it is stuck in page reclaim itself waiting for network
> 	transmission. Ordinarily this situation may be avoided by having
> 	the caller use GFP_NOFS but ipoib_cm_tx_init() does not have that information.
>

Hi Jiri,

Having reread (*) the problem description, the team here would be happy
to clarify some details with you (possibly a few MM newbie questions,
but they will help us):

1. Just to make sure: the problem happens on the NFS client, not the NFS
server, right? So writing back means the client writing over the NFS
mount --> network.

2. You wrote "due to how a kworker is used". Can you clarify if/why
things go wrong because of the kworker usage, or is this a matter of
phrasing?

In an earlier post in this thread you wrote "There was a problem with
swapping over NFS, as writeback was deadlocked with memory reclaim
(memory needs to be allocated so that swap could be accessed to
reclaim memory). That's fixed by allocating the buffers from PF_MEMALLOC
reserve, introduced by Mel's and Peter's patchset back in 3.9 or so. Oh,
and the same has been done for swapping over NBD, btw". In that respect:

3. You mentioned that the memory allocations in ipoib_cm_tx_init() and
ib_create_qp() --> mlx4 driver require page reclaim and wait for network
transmission, so did this client node put its swap on that NFS
partition?

4. Can you shed more light on why the problem also hits kmalloc-based
allocations and not only vmalloc-based allocations, e.g. not only
because of the vzalloc call in ipoib_cm_tx_init() but also because of
miscellaneous kmalloc calls within the HW (here mlx4) driver?

thanks,

Or.

(*) and sorry for my stupid question from yesterday; sometimes it's a
bad idea to ask questions on mailing lists when you are very tired.