[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YyvG+Oih2A37Grcf@ZenIV>
Date: Thu, 22 Sep 2022 03:22:48 +0100
From: Al Viro <viro@...iv.linux.org.uk>
To: Jan Kara <jack@...e.cz>
Cc: Christoph Hellwig <hch@...radead.org>,
John Hubbard <jhubbard@...dia.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Jens Axboe <axboe@...nel.dk>,
Miklos Szeredi <miklos@...redi.hu>,
"Darrick J . Wong" <djwong@...nel.org>,
Trond Myklebust <trond.myklebust@...merspace.com>,
Anna Schumaker <anna@...nel.org>,
David Hildenbrand <david@...hat.com>,
Logan Gunthorpe <logang@...tatee.com>,
linux-block@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-xfs@...r.kernel.org, linux-nfs@...r.kernel.org,
linux-mm@...ck.org, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 4/7] iov_iter: new iov_iter_pin_pages*() routines
On Thu, Sep 15, 2022 at 10:16:25AM +0200, Jan Kara wrote:
> > How would that work? What protects the area where you want to avoid running
> > into pinned pages from previously acceptable page getting pinned? If "they
> > must have been successfully unmapped" is a part of what you are planning, we
> > really do have a problem...
>
> But this is a very good question. So far the idea was that we lock the
> page, unmap (or writeprotect) the page, and then check pincount == 0 and
> that is a reliable method for making sure page data is stable (until we
> unlock the page & release other locks blocking page faults and writes). But
> once suddently ordinary page references can be used to create pins this
> does not work anymore. Hrm.
>
> Just brainstorming ideas now: So we'd either need to obtain the pins early
> when we still have the virtual address (but I guess that is often not
> practical but should work e.g. for normal direct IO path) or we need some
> way to "simulate" the page fault when pinning the page, just don't map it
> into page tables in the end. This simulated page fault could be perhaps
> avoided if rmap walk shows that the page is already mapped somewhere with
> suitable permissions.
OK. As far as I can see, the rules are along the lines of
* creator of ITER_BVEC/ITER_XARRAY is responsible for pages being safe.
That includes
* page known to be locked by caller
* page being privately allocated and not visible to anyone else
* iterator being data source
* page coming from pin_user_pages(), possibly as the result of
iov_iter_pin_pages() on ITER_IOVEC/ITER_UBUF.
* ITER_PIPE pages are always safe
* pages found in ITER_BVEC/ITER_XARRAY are safe, since the iterator
had been created with such.
My preference would be to have iov_iter_get_pages() and friends pin if and
only if we have data-destination iov_iter that is user-backed. For
data-source user-backed we only need FOLL_GET, and for all other flavours
(ITER_BVEC, etc.) we only do get_page(), if we need to grab any references
at all.
What I'd like to have is the understanding of the places where we drop
the references acquired by iov_iter_get_pages(). How do we decide
whether to unpin? E.g. pipe_buffer carries a reference to page and no
way to tell whether it's a pinned one; results of iov_iter_get_pages()
on ITER_IOVEC *can* end up there, but thankfully only from data-source
(== WRITE, aka. ITER_SOURCE) iov_iter. So for those we don't care.
Then there's nfs_request; AFAICS, we do need to pin the references in
those if they are coming from nfs_direct_read_schedule_iovec(), but
not if they come from readpage_async_filler(). How do we deal with
coalescence, etc.? It's been a long time since I really looked at
that code... Christoph, could you give any comments on that one?
Note, BTW, that nfs_request coming from readpage_async_filler() have
pages locked by caller; the ones from nfs_direct_read_schedule_iovec()
do not, and that's where we want them pinned. Resulting page references
end up (after quite a trip through data structures) stuffed into struct
rpc_rqst ->rc_recv_buf.pages[] and when a response arrives from server,
they get picked by xs_read_bvec() and fed to iov_iter_bvec(). In one
case it's safe since the pages are locked; in another - since they would
come from pin_user_pages(). The call chain at the time they are used
has nothing to do with the originator - sunrpc is looking at the arrived
response to READ that matches an rpc_rqst that had been created by sender
of that request and safety is the sender's responsibility.
Powered by blists - more mailing lists