[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3bced3a4-b826-46ab-3d98-d2dc6871bfe1@nvidia.com>
Date: Mon, 3 May 2021 11:17:31 -0700
From: John Hubbard <jhubbard@...dia.com>
To: Logan Gunthorpe <logang@...tatee.com>,
<linux-kernel@...r.kernel.org>, <linux-nvme@...ts.infradead.org>,
<linux-block@...r.kernel.org>, <linux-pci@...r.kernel.org>,
<linux-mm@...ck.org>, <iommu@...ts.linux-foundation.org>
CC: Stephen Bates <sbates@...thlin.com>,
Christoph Hellwig <hch@....de>,
Dan Williams <dan.j.williams@...el.com>,
Jason Gunthorpe <jgg@...pe.ca>,
Christian König <christian.koenig@....com>,
Don Dutile <ddutile@...hat.com>,
Matthew Wilcox <willy@...radead.org>,
Daniel Vetter <daniel.vetter@...ll.ch>,
Jakowski Andrzej <andrzej.jakowski@...el.com>,
Minturn Dave B <dave.b.minturn@...el.com>,
Jason Ekstrand <jason@...kstrand.net>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Xiong Jianxin <jianxin.xiong@...el.com>,
Bjorn Helgaas <helgaas@...nel.org>,
Ira Weiny <ira.weiny@...el.com>,
Robin Murphy <robin.murphy@....com>,
Bjorn Helgaas <bhelgaas@...gle.com>
Subject: Re: [PATCH 01/16] PCI/P2PDMA: Pass gfp_mask flags to
upstream_bridge_distance_warn()
On 5/3/21 8:57 AM, Logan Gunthorpe wrote:
>
>
> On 2021-05-01 9:58 p.m., John Hubbard wrote:
>> Another odd thing: this used to check for memory failure and just give
>> up, and now it doesn't. Yes, I realize that it all still works at the
>> moment, but this is quirky and we shouldn't stop here.
>>
>> Instead, a cleaner approach would be to push the memory allocation
>> slightly higher up the call stack, out to the
>> pci_p2pdma_distance_many(). So pci_p2pdma_distance_many() should make
>> the kmalloc() call, and fail out if it can't get a page for the seq_buf
>> buffer. Then you don't have to do all this odd stuff.
>
> I don't really agree with this assessment. If kmalloc fails to
> initialize the seq_buf() (which should be very rare), the only thing
> that is lost is the one warning print that tells the user the command
> line parameter needed disable the ACS. Everything else works fine,
> nothing else can fail. I don't see the need to add extra complexity just
> so the code errors out in no-mem instead of just skipping the one,
> slightly more informative, warning line.
That's the thing: memory failure should be exceedingly rare for this.
Therefore, just fail out entirely (which I don't expect we'll likely
ever see), instead of doing all this weird stuff to try to continue
on if you cannot allocate a single page. If you are in that case, the
system is not in a state that is going to run your dma p2p setup well
anyway.
I think it's *less* complexity to allocate up front, fail early if
allocation fails, and then not have to deal with these really odd
quirks at the lower levels.
>
> Also, keep in mind the result of all these functions are cached so it
> only ever happens once. So for this to matter, the user would have to do
> their first transaction between two devices exactly at the time memory
> allocations would fail.
>
>
>> Furthermore, the call sites can then decide for themselves which GFP
>> flags, GFP_ATOMIC or GFP_KERNEL or whatever they want for kmalloc().
>>
>> A related thing: this whole exercise would go better if there were a
>> preparatory patch or two that changed the return codes in this file to
>> something less crazy. There are too many functions that can fail, but
>> are treated as if they sort-of-mostly-would-never-fail, in the hopes of
>> using the return value directly for counting and such. This is badly
>> mistaken, and it leads developers to try to avoid returning -ENOMEM
>> (which is what we need here).
>
> Hmm? Which functions can fail? and how?
>
Let's defer that to the other patches, I was sort of looking ahead to
those, sorry.
thanks,
--
John Hubbard
NVIDIA
Powered by blists - more mailing lists