netdev - Re: [PATCH v11 00/25] mm/gup: track dma-pinned pages: FOLL

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <612aa292-ec45-295c-b56c-c622876620fa@nvidia.com>
Date:   Sat, 28 Dec 2019 20:33:32 -0800
From:   John Hubbard <jhubbard@...dia.com>
To:     Leon Romanovsky <leon@...nel.org>
CC:     Jason Gunthorpe <jgg@...pe.ca>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Alex Williamson <alex.williamson@...hat.com>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Björn Töpel <bjorn.topel@...el.com>,
        Christoph Hellwig <hch@...radead.org>,
        Dan Williams <dan.j.williams@...el.com>,
        Daniel Vetter <daniel@...ll.ch>,
        Dave Chinner <david@...morbit.com>,
        David Airlie <airlied@...ux.ie>,
        "David S . Miller" <davem@...emloft.net>,
        Ira Weiny <ira.weiny@...el.com>, Jan Kara <jack@...e.cz>,
        Jens Axboe <axboe@...nel.dk>, Jonathan Corbet <corbet@....net>,
        Jérôme Glisse <jglisse@...hat.com>,
        Magnus Karlsson <magnus.karlsson@...el.com>,
        Mauro Carvalho Chehab <mchehab@...nel.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        Michal Hocko <mhocko@...e.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Paul Mackerras <paulus@...ba.org>,
        Shuah Khan <shuah@...nel.org>,
        Vlastimil Babka <vbabka@...e.cz>, <bpf@...r.kernel.org>,
        <dri-devel@...ts.freedesktop.org>, <kvm@...r.kernel.org>,
        <linux-block@...r.kernel.org>, <linux-doc@...r.kernel.org>,
        <linux-fsdevel@...r.kernel.org>, <linux-kselftest@...r.kernel.org>,
        <linux-media@...r.kernel.org>, <linux-rdma@...r.kernel.org>,
        <linuxppc-dev@...ts.ozlabs.org>, <netdev@...r.kernel.org>,
        <linux-mm@...ck.org>, LKML <linux-kernel@...r.kernel.org>,
        Maor Gottlieb <maorg@...lanox.com>,
        "Ran Rozenstein" <ranro@...lanox.com>
Subject: Re: [PATCH v11 00/25] mm/gup: track dma-pinned pages: FOLL_PIN

On 12/27/19 1:56 PM, John Hubbard wrote:
...
>> It is ancient verification test (~10y) which is not an easy task to
>> make it understandable and standalone :).
>>
> 
> Is this the only test that fails, btw? No other test failures or hints of
> problems?
> 
> (Also, maybe hopeless, but can *anyone* on the RDMA list provide some
> characterization of the test, such as how many pins per page, what page
> sizes are used? I'm still hoping to write a test to trigger something
> close to this...)
> 
> I do have a couple more ideas for test runs:
> 
> 1. Reduce GUP_PIN_COUNTING_BIAS to 1. That would turn the whole override of
> page->_refcount into a no-op, and so if all is well (it may not be!) with the
> rest of the patch, then we'd expect this problem to not reappear.
> 
> 2. Active /proc/vmstat *foll_pin* statistics unconditionally (just for these
> tests, of course), so we can see if there is a get/put mismatch. However, that
> will change the timing, and so it must be attempted independently of (1), in
> order to see if it ends up hiding the repro.
> 
> I've updated this branch to implement (1), but not (2), hoping you can give
> this one a spin?
> 
>     git@...hub.com:johnhubbard/linux.git  pin_user_pages_tracking_v11_with_diags
> 
> 

Also, looking ahead:

a) if the problem disappears with the latest above test, then we likely have
   a huge page refcount overflow, and there are a couple of different ways to
   fix it. 

b) if it still reproduces with the above, then it's some other random mistake,
   and in that case I'd be inclined to do a sort of guided (or classic, unguided)
   git bisect of the series. Because it could be any of several patches.

   If that's too much trouble, then I'd have to fall back to submitting a few
   patches at a time and working my way up to the tracking patch...


thanks,
-- 
John Hubbard
NVIDIA