Message-ID: <20230515110315.uqifqgqkzcrrrubv@box.shutemov.name>
Date: Mon, 15 May 2023 14:03:15 +0300
From: "Kirill A . Shutemov" <kirill@...temov.name>
To: Lorenzo Stoakes <lstoakes@...il.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Jason Gunthorpe <jgg@...pe.ca>, Jens Axboe <axboe@...nel.dk>,
Matthew Wilcox <willy@...radead.org>,
Dennis Dalessandro <dennis.dalessandro@...nelisnetworks.com>,
Leon Romanovsky <leon@...nel.org>,
Christian Benvenuti <benve@...co.com>,
Nelson Escobar <neescoba@...co.com>,
Bernard Metzler <bmt@...ich.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>,
Bjorn Topel <bjorn@...nel.org>,
Magnus Karlsson <magnus.karlsson@...el.com>,
Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
Jonathan Lemon <jonathan.lemon@...il.com>,
"David S . Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Christian Brauner <brauner@...nel.org>,
Richard Cochran <richardcochran@...il.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
linux-fsdevel@...r.kernel.org, linux-perf-users@...r.kernel.org,
netdev@...r.kernel.org, bpf@...r.kernel.org,
Oleg Nesterov <oleg@...hat.com>,
Jason Gunthorpe <jgg@...dia.com>,
John Hubbard <jhubbard@...dia.com>, Jan Kara <jack@...e.cz>,
Pavel Begunkov <asml.silence@...il.com>,
Mika Penttila <mpenttil@...hat.com>,
David Hildenbrand <david@...hat.com>,
Dave Chinner <david@...morbit.com>,
Theodore Ts'o <tytso@....edu>, Peter Xu <peterx@...hat.com>,
Matthew Rosato <mjrosato@...ux.ibm.com>,
"Paul E . McKenney" <paulmck@...nel.org>,
Christian Borntraeger <borntraeger@...ux.ibm.com>
Subject: Re: [PATCH v9 0/3] mm/gup: disallow GUP writing to file-backed
mappings by default
On Thu, May 04, 2023 at 10:27:50PM +0100, Lorenzo Stoakes wrote:
> Writing to file-backed mappings which require folio dirty tracking using
> GUP is a fundamentally broken operation, as kernel write access to GUP
> mappings does not adhere to the semantics expected by a file system.
>
> A GUP caller uses the direct mapping to access the folio, which does not
> cause write notify to trigger, nor does it enforce that the caller marks
> the folio dirty.
Okay, the problem is clear and the patchset looks good to me. But I'm worried
about breaking existing users.
Do we expect the change to be visible to real world users? If yes, are we
okay to break them?
One thing that came to mind is KVM with "qemu -object memory-backend-file,share=on..."
It is mostly used for pmem emulation.
Do we have a plan B?
Just a random/crazy/broken idea:
- Allow folio_mkclean() (and folio_clear_dirty_for_io()) to fail,
indicating that the page cannot be cleared because it is pinned;
- Introduce a new vm_operations_struct::mkclean() that would be called by
page_vma_mkclean_one() before clearing the range and can fail;
- On GUP, create an in-kernel fake VMA that represents the file, but with
custom vm_ops. The VMA is registered in rmap so that it gets notified on
folio_mkclean() and can fail it because of GUP.
- folio_clear_dirty_for_io() callers will handle the new failure as an
indication that the page can be written back but will stay dirty, and that
fs-specific data associated with the page writeback cannot be
freed.
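
To make the bullets above a bit more concrete, here is a rough userspace toy
model in C. It is not kernel code and does not use any real kernel API: every
toy_* name, the mkclean() hook signature, the -EBUSY return value and the
1/0/negative return convention are assumptions made up for illustration. It
only shows the control flow: a pinned folio vetoes the clean, and the caller
sees that the folio stays dirty.

```c
/* Userspace toy model of the idea; none of this is real kernel code. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_folio {
	bool dirty;
	int pin_count;		/* stands in for GUP pins on the folio */
};

struct toy_vma;

struct toy_vm_ops {
	/* Hypothetical hook: 0 allows cleaning, negative errno vetoes it. */
	int (*mkclean)(struct toy_vma *vma, struct toy_folio *folio);
};

struct toy_vma {
	const struct toy_vm_ops *ops;
};

/* The "fake VMA" hook: refuse to clean while the folio is pinned. */
static int toy_gup_mkclean(struct toy_vma *vma, struct toy_folio *folio)
{
	(void)vma;
	return folio->pin_count > 0 ? -EBUSY : 0;
}

static const struct toy_vm_ops toy_gup_vm_ops = {
	.mkclean = toy_gup_mkclean,
};

static void toy_pin(struct toy_folio *folio)
{
	folio->pin_count++;
}

static void toy_unpin(struct toy_folio *folio)
{
	assert(folio->pin_count > 0);
	folio->pin_count--;
}

/* Models the rmap walk: ask every mapping VMA, stop at the first veto. */
static int toy_folio_mkclean(struct toy_folio *folio,
			     struct toy_vma **vmas, size_t nr_vmas)
{
	for (size_t i = 0; i < nr_vmas; i++) {
		const struct toy_vm_ops *ops = vmas[i]->ops;

		if (ops && ops->mkclean) {
			int ret = ops->mkclean(vmas[i], folio);

			if (ret)
				return ret;
		}
	}
	return 0;
}

/*
 * Returns 1 if the folio was dirty and is now clean, 0 if it was already
 * clean, and a negative errno if it may be written back but stays dirty.
 */
static int toy_folio_clear_dirty_for_io(struct toy_folio *folio,
					struct toy_vma **vmas, size_t nr_vmas)
{
	int ret;

	if (!folio->dirty)
		return 0;

	ret = toy_folio_mkclean(folio, vmas, nr_vmas);
	if (ret)
		return ret;	/* dirty bit deliberately left set */

	folio->dirty = false;
	return 1;
}
```

In this model a writeback caller that gets a negative return would still write
the folio out, but would have to keep it dirty and keep its fs-specific state
around until the pin is dropped with toy_unpin() and a later attempt succeeds.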
I'm sure the idea is broken on many levels (I have never looked closely at
the writeback path). But maybe it is good enough as a conversation starter?
--
Kiryl Shutsemau / Kirill A. Shutemov