Message-ID: <20200106090147.GA9176@quack2.suse.cz>
Date: Mon, 6 Jan 2020 10:01:47 +0100
From: Jan Kara <jack@...e.cz>
To: John Hubbard <jhubbard@...dia.com>
Cc: Leon Romanovsky <leon@...nel.org>, Jason Gunthorpe <jgg@...pe.ca>,
Andrew Morton <akpm@...ux-foundation.org>,
Al Viro <viro@...iv.linux.org.uk>,
Alex Williamson <alex.williamson@...hat.com>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Björn Töpel <bjorn.topel@...el.com>,
Christoph Hellwig <hch@...radead.org>,
Dan Williams <dan.j.williams@...el.com>,
Daniel Vetter <daniel@...ll.ch>,
Dave Chinner <david@...morbit.com>,
David Airlie <airlied@...ux.ie>,
"David S . Miller" <davem@...emloft.net>,
Ira Weiny <ira.weiny@...el.com>, Jan Kara <jack@...e.cz>,
Jens Axboe <axboe@...nel.dk>, Jonathan Corbet <corbet@....net>,
Jérôme Glisse <jglisse@...hat.com>,
Magnus Karlsson <magnus.karlsson@...el.com>,
Mauro Carvalho Chehab <mchehab@...nel.org>,
Michael Ellerman <mpe@...erman.id.au>,
Michal Hocko <mhocko@...e.com>,
Mike Kravetz <mike.kravetz@...cle.com>,
Paul Mackerras <paulus@...ba.org>,
Shuah Khan <shuah@...nel.org>,
Vlastimil Babka <vbabka@...e.cz>, bpf@...r.kernel.org,
dri-devel@...ts.freedesktop.org, kvm@...r.kernel.org,
linux-block@...r.kernel.org, linux-doc@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-kselftest@...r.kernel.org,
linux-media@...r.kernel.org, linux-rdma@...r.kernel.org,
linuxppc-dev@...ts.ozlabs.org, netdev@...r.kernel.org,
linux-mm@...ck.org, LKML <linux-kernel@...r.kernel.org>,
Maor Gottlieb <maorg@...lanox.com>,
Ran Rozenstein <ranro@...lanox.com>
Subject: Re: [PATCH v11 00/25] mm/gup: track dma-pinned pages: FOLL_PIN
On Sat 28-12-19 20:33:32, John Hubbard wrote:
> On 12/27/19 1:56 PM, John Hubbard wrote:
> ...
> >> It is an ancient verification test (~10y), and making it understandable
> >> and standalone is not an easy task :).
> >>
> >
> > Is this the only test that fails, btw? No other test failures or hints of
> > problems?
> >
> > (Also, maybe hopeless, but can *anyone* on the RDMA list provide some
> > characterization of the test, such as how many pins per page, what page
> > sizes are used? I'm still hoping to write a test to trigger something
> > close to this...)
> >
> > I do have a couple more ideas for test runs:
> >
> > 1. Reduce GUP_PIN_COUNTING_BIAS to 1. That would turn the whole override of
> > page->_refcount into a no-op, and so if all is well (it may not be!) with the
> > rest of the patch, then we'd expect this problem to not reappear.
> >
> > 2. Activate /proc/vmstat *foll_pin* statistics unconditionally (just for these
> > tests, of course), so we can see if there is a get/put mismatch. However, that
> > will change the timing, and so it must be attempted independently of (1), in
> > order to see if it ends up hiding the repro.
> >
> > I've updated this branch to implement (1), but not (2), hoping you can give
> > this one a spin?
> >
> > git@...hub.com:johnhubbard/linux.git pin_user_pages_tracking_v11_with_diags
> >
> >
>
> Also, looking ahead:
>
> a) if the problem disappears with the latest above test, then we likely have
> a huge page refcount overflow, and there are a couple of different ways to
> fix it.
>
> b) if it still reproduces with the above, then it's some other random mistake,
> and in that case I'd be inclined to do a sort of guided (or classic, unguided)
> git bisect of the series. Because it could be any of several patches.
>
> If that's too much trouble, then I'd have to fall back to submitting a few
> patches at a time and working my way up to the tracking patch...
It could also be that an ordinary page reference is dropped with 'unpin',
thus underflowing the page refcount...
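
To make the scenario concrete, here is a rough userspace sketch (not kernel
code). All the model_* names and mismatched_release() are made up for
illustration, and the bias value of 1024 in the usage note is an assumption
about the series, not something stated in this thread. The model takes an
ordinary reference (+1) and then releases it through the unpin path (-bias):

```c
#include <assert.h>

/* Hypothetical userspace model of a page; stands in for page->_refcount. */
struct model_page {
	int refcount;
};

/* Ordinary reference: adds 1, as a plain get would. */
static void model_get_page(struct model_page *p)
{
	p->refcount += 1;
}

/* Pin release: subtracts the whole bias, as an unpin would. */
static void model_unpin_page(struct model_page *p, int bias)
{
	p->refcount -= bias;
}

/*
 * Take an ordinary reference, then wrongly drop it with unpin.
 * Returns the resulting refcount for the given bias.
 */
int mismatched_release(int bias)
{
	struct model_page page = { .refcount = 1 };	/* one owner reference */

	assert(bias > 0);
	model_get_page(&page);		/* refcount is now 2 */
	model_unpin_page(&page, bias);	/* should have been a plain put */
	return page.refcount;
}
```

In this model, mismatched_release(1024) goes negative, while
mismatched_release(1) stays balanced. So if the failure disappears with the
bias reduced to 1, that would be consistent with this kind of get/unpin
mismatch as well as with an overflow.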
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR