[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5c4160cc-6aec-f6a6-8bab-b0bf201a037c@redhat.com>
Date: Wed, 24 May 2023 09:06:36 +0200
From: David Hildenbrand <david@...hat.com>
To: David Howells <dhowells@...hat.com>,
Christoph Hellwig <hch@...radead.org>
Cc: Jens Axboe <axboe@...nel.dk>, Al Viro <viro@...iv.linux.org.uk>,
Matthew Wilcox <willy@...radead.org>, Jan Kara <jack@...e.cz>,
Jeff Layton <jlayton@...nel.org>,
Jason Gunthorpe <jgg@...dia.com>,
Logan Gunthorpe <logang@...tatee.com>,
Hillf Danton <hdanton@...a.com>,
Christian Brauner <brauner@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
linux-fsdevel@...r.kernel.org, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: Extending page pinning into fs/direct-io.c
On 23.05.23 22:16, David Howells wrote:
> Christoph Hellwig <hch@...radead.org> wrote:
>
>> But can you please also take care of the legacy direct I/O code? I'd really
>> hate to leave yet another unfinished transition around.
>
> I've been poking at it this afternoon, but it doesn't look like it's going to
> be straightforward, unfortunately. The mm folks have been withdrawing access
> to the pinning API behind the ramparts of the mm/ dir. Further, the dio code
> will (I think), under some circumstances, arbitrarily insert the zero_page
> into a list of things that are maybe pinned or maybe unpinned, but I can (I
> think) also be given a pinned zero_page from the GUP code if the page tables
> point to one and a DIO-write is requested - so just doing if page == zero_page
> isn't sufficient.
>
> What I'd like to do is to make the GUP code not take a ref on the zero_page
> if, say, FOLL_DONT_PIN_ZEROPAGE is passed in, and then make the bio cleanup
> code always ignore the zero_page.
We discussed doing that unconditionally in the context of vfio (below), but vfio
decided to add a workaround suitable for stable.
In case of FOLL_PIN it's simple: if we detect the zeropage, don't mess with the
refcount when pinning and don't mess with the refcount when unpinning (esp.
unpin_user_pages). FOLL_GET is a different story but we don't have to mess with
that.
So there shouldn't be need for a FOLL_DONT_PIN_ZEROPAGE, we could just do it
unconditionally.
>
> Alternatively, I can drop the pin immediately if I get given one on the
> zero_page - it's not going anywhere, after all.
That's what vfio did in
commit 873aefb376bbc0ed1dd2381ea1d6ec88106fdbd4
Author: Alex Williamson <alex.williamson@...hat.com>
Date: Mon Aug 29 21:05:40 2022 -0600
vfio/type1: Unpin zero pages
There's currently a reference count leak on the zero page. We increment
the reference via pin_user_pages_remote(), but the page is later handled
as an invalid/reserved page, therefore it's not accounted against the
user and not unpinned by our put_pfn().
Introducing special zero page handling in put_pfn() would resolve the
leak, but without accounting of the zero page, a single user could
still create enough mappings to generate a reference count overflow.
The zero page is always resident, so for our purposes there's no reason
to keep it pinned. Therefore, add a loop to walk pages returned from
pin_user_pages_remote() and unpin any zero pages.
For vfio that handling no longer required, because FOLL_LONGTERM will never pin
the shared zeropage.
--
Thanks,
David / dhildenb
Powered by blists - more mailing lists