Message-ID: <alpine.LNX.2.00.1403042348050.30402@pobox.suse.cz>
Date: Tue, 4 Mar 2014 23:48:43 +0100 (CET)
From: Jiri Kosina <jkosina@...e.cz>
To: Or Gerlitz <ogerlitz@...lanox.com>
cc: Or Gerlitz <or.gerlitz@...il.com>,
Roland Dreier <roland@...nel.org>,
Amir Vadai <amirv@...lanox.com>,
Eli Cohen <eli@....mellanox.co.il>,
Eugenia Emantayev <eugenia@...lanox.com>,
"David S. Miller" <davem@...emloft.net>,
Mel Gorman <mgorman@...e.de>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mlx4: Use GFP_NOFS calls during the ipoib TX path when
creating the QP

On Thu, 27 Feb 2014, Jiri Kosina wrote:
> On Thu, 27 Feb 2014, Or Gerlitz wrote:
>
> > ipoib is coded over the verbs API (include/rdma/ib_verbs.h) --- so tracking
> > the path from ipoib through the verbs API into mlx4 should be a similar
> > exercise to doing so for mlx5, but let's first treat the higher-level
> > elements involved with this patch.
> >
> > Can you shed some light on why the problem happens only for NFS, and not,
> > for example, with other IP/TCP storage protocols?
> >
> > For example, do you expect it to happen with iSCSI/TCP too? The Linux
> > iSCSI initiator first opens a TCP socket from user space to the target,
> > then does the login exchange over this socket, and later provides the
> > socket to the kernel iSCSI code to use as the back-end of a SCSI block
> > device registered with the SCSI midlayer.
>
> Frankly, no idea. There was a problem with swapping over NFS, as writeback
> was deadlocked with memory reclaim (memory needs to be allocated so that
> swap could be accessed to reclaim memory). That's fixed by allocating the
> buffers from PF_MEMALLOC reserve, introduced by Mel's and Peter's patchset
> back in 3.9 or so. Oh, and the same has been done for swapping over NBD,
> btw. Maybe iSCSI needs similar treatment, maybe it has it already, I
> haven't checked. We haven't seen a bug report for that, though.
>
> > > I don't think we have, and it indeed should be rather easy to add. The
> > > more challenging part of the problem is where (and based on which
> > > data) the flag would actually be set up on the netdevice so that it's
> > > not a horrible layering violation.
> >
> > I assume that in the same manner netdevices advertise features to the
> > networking core, the core can provide them operating directives after
> > they register themselves.
>
> Whatever suits you best. To sum it up:
>
> - mlx4 is confirmed to have this problem, and we know how that problem
> happens -- see the paragraph in the changelog explaining the dependency
> between memory reclaim and allocation of TX ring
>
> - we have a workaround which requires human interaction in order
> to provide the information whether GFP_NOFS should be used or not
>
> - I can very well understand why Mellanox would see that as a hack, but if
> a more comprehensive fix is necessary, I'd expect those who understand
> the code the best to come up with a solution/proposal. I'd assume that
> you don't want to keep code with a known and easily triggerable
> deadlock out there unfixed.
>
> - where I see the potential for layering violation in any 'general'
> solution is that it's the filesystem that has to be "talking" to the
> underlying netdevice, i.e. you'll have to make the filesystem
> netdevice-aware, right?

Mellanox folks, do you have any plan for how to proceed here, please?

Thanks,

--
Jiri Kosina
SUSE Labs