Message-ID: <20191107145748.GA3666@redhat.com>
Date: Thu, 7 Nov 2019 09:57:48 -0500
From: Jerome Glisse <jglisse@...hat.com>
To: Hillf Danton <hdanton@...a.com>
Cc: John Hubbard <jhubbard@...dia.com>, linux-mm <linux-mm@...ck.org>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Vlastimil Babka <vbabka@...e.cz>, Mel Gorman <mgorman@...e.de>,
Dan Williams <dan.j.williams@...el.com>,
Ira Weiny <ira.weiny@...el.com>,
Christoph Hellwig <hch@....de>,
Jonathan Corbet <corbet@....net>
Subject: Re: [RFC] mm: gup: add helper page_try_gup_pin(page)
On Thu, Nov 07, 2019 at 05:50:17PM +0800, Hillf Danton wrote:
>
> On Wed, 6 Nov 2019 10:46:29 -0500 Jerome Glisse wrote:
> >
> > On Wed, Nov 06, 2019 at 05:22:40PM +0800, Hillf Danton wrote:
> [...]
> > > >
> > > > Once driver has GUP it does not check and re-check the struct page
> > > > so there is no synchronization whatsoever after GUP happened. In
> > > > > fact for some drivers you cannot synchronize anything once the device
> > > > > has been programmed. Many devices are not just simple DMA engines you
> > > > can start and stop at will (network, GPUs, ...).
> > >
> > > Because "there is no synchronization whatsoever after GUP happened,"
> > > we need to take another close look at the reasoning for tracking
> > > multiple gupers if the chance of their mutual data corruptions exists
> > > in the wild. (If any sync mechanism sits between them to avoid data
> > > corruption, then it seems single pin is enough.)
> >
> > It does exist in the wild but the userspace application would be either
> > doing something stupid or something terribly clever. For instance you
> > can have 2 network interfaces writing to the same GUPed page but that is
> > because the application made the same request over two NICs and both
> > end up writing the same thing.
> >
> > You can also have 2 GUPers each writing to a different part of the page
> > and never stepping on each other.
> >
> > The point really is that from kernel point of view there is just no
> > way to know if the application is doing something wrong or if it is just
> > perfectly fine. This is exactly the same thing as CPU threads, you do
> > not ask the kernel to ascertain whether what application threads are
> > doing is wrong or right.
> >
> > So we have to live with the fact that we can have multiple GUPers and
> > that it is not our problems if that happens and we can do nothing
> > about it.
>
> Ok. Multiple gupers are a must-have, and perhaps their mutual data
> corruptions as well.
>
> > Note that we are removing GUP from some of those drivers, ones where
> > the device can abide by mmu notifier. But that is just something
> > orthogonal to all this.
> >
> >
> > > > So once a page is GUPed there is just no way to guarantee its stability
> > > > hence the best thing we can do is snapshot it to a bounce page.
> > >
> > > It becomes clearer OTOH that we are more likely than not moving in
> > > the incorrect direction, in cases like how to detect gupers and what
> > > to do for writeback if page is gup pinned, without a clear picture
> > > of the bounce page in the first place. Any plan to post a patch just
> > > for idea show?
> >
> > The agreement so far is that we need to be able to identify GUPed
> > pages and this is what John's patchset does. Once we have that piece
>
> Oh they are there, and barely trot away in case of long-lived pin.
>
> > then we can discuss what to do in respect of write-back. Which is
>
> Nobody seems to care what to do in the absence of gup pin.
I am not sure I follow. Today we cannot differentiate between GUP
and a regular get_page(). If you use certain combinations of filesystem
and hardware, you may hit a BUG_ON(), depending on how lucky or unlucky
you are. We cannot solve this without being able to differentiate
between GUP and a regular get_page(), which is why John's patchset is
the first step in the right direction.

If there is no GUP on a page, then regular writeback happens as it has
for years, so in the absence of GUP I do not see any issue.
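
To make the GUP versus get_page() distinction concrete, here is a toy
userspace model of one conceivable scheme: a GUP pin adds a large bias to
the reference count while a plain reference adds 1. Every name below
(toy_page, TOY_GUP_PIN_BIAS, the toy_* helpers) is hypothetical and invented
for illustration. It is not kernel code and is not claimed to be what John's
patchset does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bias: a GUP pin adds this much, a plain reference adds 1. */
#define TOY_GUP_PIN_BIAS 1024

/* Toy stand-in for a page descriptor; only the count matters here. */
struct toy_page {
    int refcount;
};

/* Plain reference, analogous in spirit to a regular get_page(). */
static void toy_get_page(struct toy_page *p)
{
    p->refcount += 1;
}

static void toy_put_page(struct toy_page *p)
{
    assert(p->refcount >= 1);
    p->refcount -= 1;
}

/* GUP-style pin: takes the reference in units of the bias. */
static void toy_gup_pin(struct toy_page *p)
{
    p->refcount += TOY_GUP_PIN_BIAS;
}

static void toy_gup_unpin(struct toy_page *p)
{
    assert(p->refcount >= TOY_GUP_PIN_BIAS);
    p->refcount -= TOY_GUP_PIN_BIAS;
}

/*
 * Heuristic check: true if the count is large enough to contain at least
 * one pin. It reports a false positive if TOY_GUP_PIN_BIAS or more plain
 * references are held, so callers must tolerate "maybe pinned".
 */
static bool toy_page_maybe_gup_pinned(const struct toy_page *p)
{
    return p->refcount >= TOY_GUP_PIN_BIAS;
}
```

In this model, a page holding only plain references reports "not pinned",
and the answer flips as soon as toy_gup_pin() is called, which is the piece
of information writeback is missing today.
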
> > still something where there is no agreement as far as i remember the
> > outcome of the last discussion we had. I expect this will be a topic
> > at next LSF/MM or maybe something we can flesh out before.
>
> These are the restraints we know
>
> A, multiple gup pins
> B, mutual data corruptions
> C, no break of existing use cases
> D, zero copy
What do you mean by zero copy?
> E, feel free to add
>
> then what is preventing an agreement like bounce page?
There are two sides (AFAIR):
  - Do not write back GUPed pages and wait until the GUP goes away to
    write them. But a GUP can last as long as the uptime, and we can
    lose data on power failure.
  - Use a bounce page so that there is a chance we have some data
    on power failure.
>
> Because page migrate and reclaim have been working for a while with
> gup pin taken into account, detecting it has no priority in any form
> over the agreement on how to make a writeback page stable.
Migrate just ignores GUPed pages, and thus there is no issue with migrate.
Writeback is a special case here because some filesystems need stable
page content, and we also need to inhibit some fs-specific things that
trigger BUG_ON() in set_page_dirty*().
> What seems more important, restriction B above makes C hard to meet
> in any feasible approach trying to keep a writeback page stable, and
> zero-copy makes it harder AFAICS.
Writeback can use a bounce page. Zero copy, i.e. not having to use a
bounce page, is not an issue: in some cases we already use bounce pages
(at the block device level).
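
As a rough illustration of the bounce page idea, here is a toy userspace
sketch: the page content is copied once, and writeback would then read only
the copy, so later device writes to the original page cannot change what
goes to disk. The names (toy_wb_page, toy_snapshot_for_writeback,
toy_device_write) are hypothetical and this is not how any kernel layer is
implemented.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Toy stand-in for a GUPed page that a device may write at any time. */
struct toy_wb_page {
    unsigned char data[TOY_PAGE_SIZE];
};

/*
 * Snapshot the page into a caller-provided bounce buffer of
 * TOY_PAGE_SIZE bytes. Writeback would use the bounce buffer only.
 */
static void toy_snapshot_for_writeback(const struct toy_wb_page *page,
                                       unsigned char *bounce)
{
    memcpy(bounce, page->data, TOY_PAGE_SIZE);
}

/* Stand-in for a device write that nothing in the kernel can hold off. */
static void toy_device_write(struct toy_wb_page *page, size_t off,
                             unsigned char value)
{
    assert(off < TOY_PAGE_SIZE);
    page->data[off] = value;
}
```

If toy_device_write() runs after the snapshot, the bounce buffer keeps the
old bytes while the page carries the new ones. The snapshot itself can still
race with a device write in progress, so the copy is only a best-effort
image, which matches the "chance we have some data" wording above.
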
>
> > In any case my opinion is bounce page is the best thing we can do,
> > from application and FS point of view it mimics the characteristics
> > of regular write-back just as if the write protection window of the
> > write-backed page was infinitely short.
>
> A 100-line patch tells more than a 200-line explanation can and helps
> to shorten the discussion prior to reaching an agreement.
It is not that trivial: you need to make sure every layer, from the fs
down to the block device driver, behaves properly when handed a bounce
page. We have such a mechanism for bio, but it is at the bio level; maybe
it can be bumped one level.
Cheers,
Jérôme