lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6ff9dcc5-c34b-963f-f5e7-7038eecae98b@nvidia.com>
Date:   Mon, 26 Sep 2022 15:14:09 -0700
From:   John Hubbard <jhubbard@...dia.com>
To:     Lyude Paul <lyude@...hat.com>,
        Alistair Popple <apopple@...dia.com>, linux-mm@...ck.org,
        Andrew Morton <akpm@...ux-foundation.org>
Cc:     Michael Ellerman <mpe@...erman.id.au>,
        Nicholas Piggin <npiggin@...il.com>,
        Felix Kuehling <Felix.Kuehling@....com>,
        Alex Deucher <alexander.deucher@....com>,
        Christian König <christian.koenig@....com>,
        "Pan, Xinhui" <Xinhui.Pan@....com>,
        David Airlie <airlied@...ux.ie>,
        Daniel Vetter <daniel@...ll.ch>,
        Ben Skeggs <bskeggs@...hat.com>,
        Karol Herbst <kherbst@...hat.com>,
        Ralph Campbell <rcampbell@...dia.com>,
        "Matthew Wilcox (Oracle)" <willy@...radead.org>,
        Alex Sierra <alex.sierra@....com>,
        linuxppc-dev@...ts.ozlabs.org, linux-kernel@...r.kernel.org,
        amd-gfx@...ts.freedesktop.org, nouveau@...ts.freedesktop.org,
        dri-devel@...ts.freedesktop.org, Jason Gunthorpe <jgg@...dia.com>,
        Dan Williams <dan.j.williams@...el.com>
Subject: Re: [PATCH 6/7] nouveau/dmem: Evict device private memory during
 release

On 9/26/22 14:35, Lyude Paul wrote:
>> +	for (i = 0; i < npages; i++) {
>> +		if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
>> +			struct page *dpage;
>> +
>> +			/*
>> +			 * _GFP_NOFAIL because the GPU is going away and there
>> +			 * is nothing sensible we can do if we can't copy the
>> +			 * data back.
>> +			 */
> 
> You'll have to excuse me for a moment since this area of nouveau isn't one of
> my strongpoints, but are we sure about this? IIRC __GFP_NOFAIL means infinite
> retry, in the case of a GPU hotplug event I would assume we would rather just
> stop trying to migrate things to the GPU and just drop the data instead of
> hanging on infinite retries.
> 
Hi Lyude!

Actually, I really think it's better in this case to keep trying
(presumably not necessarily infinitely, but only until memory becomes
available), rather than failing out and corrupting data.

That's because I'm not sure it's completely clear that this memory is
discardable. And at some point, we're going to make this all work with
file-backed memory, which will *definitely* not be discardable--I
realize that we're not there yet, of course.

But here, it's reasonable to commit to just retrying indefinitely,
really. Memory should eventually show up. And if it doesn't, then
restarting the machine is better than corrupting data, generally.


thanks,

-- 
John Hubbard
NVIDIA

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ