Message-ID: <20210825123802.GD1721383@nvidia.com>
Date: Wed, 25 Aug 2021 09:38:02 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Christian König <christian.koenig@....com>
Cc: John Hubbard <jhubbard@...dia.com>,
Gal Pressman <galpress@...zon.com>,
Daniel Vetter <daniel@...ll.ch>,
Sumit Semwal <sumit.semwal@...aro.org>,
Doug Ledford <dledford@...hat.com>,
"open list:DMA BUFFER SHARING FRAMEWORK"
<linux-media@...r.kernel.org>,
dri-devel <dri-devel@...ts.freedesktop.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-rdma <linux-rdma@...r.kernel.org>,
Oded Gabbay <ogabbay@...ana.ai>,
Tomer Tayar <ttayar@...ana.ai>,
Yossi Leybovich <sleybo@...zon.com>,
Alexander Matushevsky <matua@...zon.com>,
Leon Romanovsky <leonro@...dia.com>,
Jianxin Xiong <jianxin.xiong@...el.com>
Subject: Re: [RFC] Make use of non-dynamic dmabuf in RDMA
On Wed, Aug 25, 2021 at 02:27:08PM +0200, Christian König wrote:
> Am 25.08.21 um 14:18 schrieb Jason Gunthorpe:
> > On Wed, Aug 25, 2021 at 08:17:51AM +0200, Christian König wrote:
> >
> > > The only real option where you could do P2P with buffer pinning is
> > > those compute boards where we know that everything is always accessible
> > > to everybody and we will never need to migrate anything. But even then
> > > you want some mechanism like cgroups to take care of limiting this.
> > > Otherwise any runaway process can bring down your whole system.
> > Why? It is not the pin that is the problem, it is allocating dedicated
> > GPU memory in the first place. Pinning it just changes the sequence for
> > freeing it. No different than CPU memory.
>
> Pinning makes the memory un-evictable.
>
> In other words, as long as we don't pin anything we can support as many
> processes as we want until we run out of swap space. Swapping sucks badly
> because your applications become pretty much unusable, but you can easily
> recover from it by killing some process.
>
> With pinning, on the other hand, somebody sooner or later receives an
> -ENOMEM or -ENOSPC, and there is no guarantee that it goes to the right
> process.
It is not really different: you have the same failure mode once the
system runs out of swap.
This is really the kernel side trying to push a policy onto the user
side that the user side doesn't want.
Dedicated systems are a significant use case here and should be
supported, even if the same solution wouldn't be applicable to someone
running a desktop.
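To make the distinction concrete, here is a rough sketch of the two
importer models using the in-tree dma-buf API. This is illustrative
only, not code from the RFC: my_move_notify(), my_importer_ops,
my_dynamic_map() and my_pinned_map() are made-up names, and error
unwinding is trimmed for brevity.

    #include <linux/dma-buf.h>
    #include <linux/dma-resv.h>

    /*
     * Dynamic importer: the exporter stays free to move/evict the
     * buffer. The importer supplies move_notify() and must be able to
     * tear down and rebuild its mapping when called.
     */
    static void my_move_notify(struct dma_buf_attachment *attach)
    {
            /* Invalidate our mapping; remap later under dmabuf->resv. */
    }

    static const struct dma_buf_attach_ops my_importer_ops = {
            .allow_peer2peer = true,
            .move_notify = my_move_notify,
    };

    static struct sg_table *my_dynamic_map(struct dma_buf *dmabuf,
                                           struct device *dev)
    {
            struct dma_buf_attachment *attach;
            struct sg_table *sgt;

            attach = dma_buf_dynamic_attach(dmabuf, dev,
                                            &my_importer_ops, NULL);
            if (IS_ERR(attach))
                    return ERR_CAST(attach);

            /* Dynamic importers map under the reservation lock. */
            dma_resv_lock(dmabuf->resv, NULL);
            sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
            dma_resv_unlock(dmabuf->resv);
            return sgt;
    }

    /*
     * Non-dynamic importer: with no importer_ops, a dynamic exporter
     * is pinned by the core for the lifetime of the attachment. This
     * is the un-evictable case discussed above.
     */
    static struct sg_table *my_pinned_map(struct dma_buf *dmabuf,
                                          struct device *dev)
    {
            struct dma_buf_attachment *attach;

            attach = dma_buf_attach(dmabuf, dev);
            if (IS_ERR(attach))
                    return ERR_CAST(attach);

            return dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
    }

The move_notify() path is what keeps the exporter free to evict and
migrate; the plain dma_buf_attach() path is the pinned, un-evictable
one being argued about here.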
Jason