linux-kernel - Re: [PATCH] mlx4: Use GFP_NOFS calls during the ipoib TX path when creating the QP

Message-ID: <alpine.LNX.2.00.1403042348050.30402@pobox.suse.cz>
Date:	Tue, 4 Mar 2014 23:48:43 +0100 (CET)
From:	Jiri Kosina <jkosina@...e.cz>
To:	Or Gerlitz <ogerlitz@...lanox.com>
cc:	Or Gerlitz <or.gerlitz@...il.com>,
	Roland Dreier <roland@...nel.org>,
	Amir Vadai <amirv@...lanox.com>,
	Eli Cohen <eli@....mellanox.co.il>,
	Eugenia Emantayev <eugenia@...lanox.com>,
	"David S. Miller" <davem@...emloft.net>,
	Mel Gorman <mgorman@...e.de>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mlx4: Use GFP_NOFS calls during the ipoib TX path when
 creating the QP

On Thu, 27 Feb 2014, Jiri Kosina wrote:

> On Thu, 27 Feb 2014, Or Gerlitz wrote:
> 
> > ipoib is coded over the verbs API (include/rdma/ib_verbs.h) --- so tracking
> > the path from ipoib through the verbs API into mlx4 should be a similar
> > exercise to doing so for mlx5, but let's first treat the higher-level
> > elements involved with this patch.
> > 
> > Can you shed some light on why the problem happens only for NFS, and not,
> > for example, with other IP/TCP storage protocols?
> >
> > For example, do you expect it to happen with iSCSI/TCP too? The Linux 
> > iSCSI initiator first opens a TCP socket from user space to the target, 
> > then performs the login exchange over this socket, and later hands the 
> > socket to the kernel iSCSI code to use as the back end of a SCSI block 
> > device registered with the SCSI midlayer.
> 
> Frankly, no idea. There was a problem with swapping over NFS, as writeback 
> was deadlocked with memory reclaim (memory needs to be allocated so that 
> swap could be accessed to reclaim memory). That's fixed by allocating the 
> buffers from PF_MEMALLOC reserve, introduced by Mel's and Peter's patchset 
> back in 3.9 or so. Oh, and the same has been done for swapping over NBD, 
> btw. Maybe iSCSI needs similar treatment, maybe it has it already, I 
> haven't checked. We haven't seen a bug report for that, though.
> 
> > > I don't think we have, and it indeed should be rather easy to add. The 
> > > more challenging part of the problem is where (and based on which 
> > > data) the flag would actually be set up on the netdevice so that it's 
> > > not a horrible layering violation.
> > 
> > I assume that, in the same manner that netdevices advertise features to 
> > the networking core, the core can provide them with operating directives 
> > after they register themselves.
> 
> Whatever suits you best. To sum it up:
> 
> - mlx4 is confirmed to have this problem, and we know how that problem 
>   happens -- see the paragraph in the changelog explaining the dependency 
>   between memory reclaim and allocation of TX ring
> 
> - we have a workaround which requires human interaction in order 
>   to provide the information whether GFP_NOFS should be used or not
> 
> - I can very well understand why Mellanox would see that as a hack, but if 
>   a more comprehensive fix is necessary, I'd expect those who understand 
>   the code best to come up with a solution/proposal. I'd assume that 
>   you don't want to leave code with a known and easily triggerable 
>   deadlock out there unfixed.
> 
> - where I see the potential for a layering violation in any 'general' 
>   solution is that it's the filesystem that has to be "talking" to the 
>   underlying netdevice, i.e. you'll have to make the filesystem 
>   netdevice-aware, right?

Mellanox folks, do you have a plan for how to proceed here, please?

Thanks,

-- 
Jiri Kosina
SUSE Labs