lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3bced3a4-b826-46ab-3d98-d2dc6871bfe1@nvidia.com>
Date:   Mon, 3 May 2021 11:17:31 -0700
From:   John Hubbard <jhubbard@...dia.com>
To:     Logan Gunthorpe <logang@...tatee.com>,
        <linux-kernel@...r.kernel.org>, <linux-nvme@...ts.infradead.org>,
        <linux-block@...r.kernel.org>, <linux-pci@...r.kernel.org>,
        <linux-mm@...ck.org>, <iommu@...ts.linux-foundation.org>
CC:     Stephen Bates <sbates@...thlin.com>,
        Christoph Hellwig <hch@....de>,
        Dan Williams <dan.j.williams@...el.com>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Christian König <christian.koenig@....com>,
        Don Dutile <ddutile@...hat.com>,
        Matthew Wilcox <willy@...radead.org>,
        Daniel Vetter <daniel.vetter@...ll.ch>,
        Jakowski Andrzej <andrzej.jakowski@...el.com>,
        Minturn Dave B <dave.b.minturn@...el.com>,
        Jason Ekstrand <jason@...kstrand.net>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Xiong Jianxin <jianxin.xiong@...el.com>,
        Bjorn Helgaas <helgaas@...nel.org>,
        Ira Weiny <ira.weiny@...el.com>,
        Robin Murphy <robin.murphy@....com>,
        Bjorn Helgaas <bhelgaas@...gle.com>
Subject: Re: [PATCH 01/16] PCI/P2PDMA: Pass gfp_mask flags to
 upstream_bridge_distance_warn()

On 5/3/21 8:57 AM, Logan Gunthorpe wrote:
> 
> 
> On 2021-05-01 9:58 p.m., John Hubbard wrote:
>> Another odd thing: this used to check for memory failure and just give
>> up, and now it doesn't. Yes, I realize that it all still works at the
>> moment, but this is quirky and we shouldn't stop here.
>>
>> Instead, a cleaner approach would be to push the memory allocation
>> slightly higher up the call stack, out to the
>> pci_p2pdma_distance_many(). So pci_p2pdma_distance_many() should make
>> the kmalloc() call, and fail out if it can't get a page for the seq_buf
>> buffer. Then you don't have to do all this odd stuff.
> 
> I don't really agree with this assessment. If kmalloc fails to
> initialize the seq_buf() (which should be very rare), the only thing
> that is lost is the one warning print that tells the user the command
> line parameter needed disable the ACS. Everything else works fine,
> nothing else can fail. I don't see the need to add extra complexity just
> so the code errors out in no-mem instead of just skipping the one,
> slightly more informative, warning line.

That's the thing: memory failure should be exceedingly rare for this.
Therefore, just fail out entirely (which I don't expect we'll likely
ever see), instead of doing all this weird stuff to try to continue
on if you cannot allocate a single page. If you are in that case, the
system is not in a state that is going to run your dma p2p setup well
anyway.

I think it's *less* complexity to allocate up front, fail early if
allocation fails, and then not have to deal with these really odd
quirks at the lower levels.

> 
> Also, keep in mind the result of all these functions are cached so it
> only ever happens once. So for this to matter, the user would have to do
> their first transaction between two devices exactly at the time memory
> allocations would fail.
> 
> 
>> Furthermore, the call sites can then decide for themselves which GFP
>> flags, GFP_ATOMIC or GFP_KERNEL or whatever they want for kmalloc().
>>
>> A related thing: this whole exercise would go better if there were a
>> preparatory patch or two that changed the return codes in this file to
>> something less crazy. There are too many functions that can fail, but
>> are treated as if they sort-of-mostly-would-never-fail, in the hopes of
>> using the return value directly for counting and such. This is badly
>> mistaken, and it leads developers to try to avoid returning -ENOMEM
>> (which is what we need here).
> 
> Hmm? Which functions can fail? and how?
> 

Let's defer that to the other patches, I was sort of looking ahead to
those, sorry.

thanks,
-- 
John Hubbard
NVIDIA

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ