Message-ID: <aQVAVBGM7inQUa7z@fedora>
Date: Sat, 1 Nov 2025 07:03:48 +0800
From: Ming Lei <ming.lei@...hat.com>
To: Caleb Sander Mateos <csander@...estorage.com>
Cc: Jens Axboe <axboe@...nel.dk>, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ublk: use copy_{to,from}_iter() for user copy

On Fri, Oct 31, 2025 at 09:02:48AM -0700, Caleb Sander Mateos wrote:
> On Thu, Oct 30, 2025 at 8:45 PM Ming Lei <ming.lei@...hat.com> wrote:
> >
> > On Thu, Oct 30, 2025 at 07:05:21PM -0600, Caleb Sander Mateos wrote:
> > > ublk_copy_user_pages()/ublk_copy_io_pages() currently uses
> > > iov_iter_get_pages2() to extract the pages from the iov_iter and
> > > memcpy()s between the bvec_iter and the iov_iter's pages one at a time.
> > > Switch to using copy_to_iter()/copy_from_iter() instead. This avoids the
> > > user page reference count increments and decrements and needing to split
> > > the memcpy() at user page boundaries. It also simplifies the code
> > > considerably.
> > >
> > > Signed-off-by: Caleb Sander Mateos <csander@...estorage.com>
> > > ---
> > > drivers/block/ublk_drv.c | 62 +++++++++-------------------------------
> > > 1 file changed, 14 insertions(+), 48 deletions(-)
> > >
> > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > index 0c74a41a6753..852350e639d6 100644
> > > --- a/drivers/block/ublk_drv.c
> > > +++ b/drivers/block/ublk_drv.c
> > > @@ -912,58 +912,47 @@ static const struct block_device_operations ub_fops = {
> > > .open = ublk_open,
> > > .free_disk = ublk_free_disk,
> > > .report_zones = ublk_report_zones,
> > > };
> > >
> > > -#define UBLK_MAX_PIN_PAGES 32
> > > -
> > > struct ublk_io_iter {
> > > - struct page *pages[UBLK_MAX_PIN_PAGES];
> > > struct bio *bio;
> > > struct bvec_iter iter;
> > > };
> >
> > ->pages[] is actually for pinning user io pages in batch, so killing it may cause
> > perf drop.
>
> As far as I can tell, copy_to_iter()/copy_from_iter() avoids the page
> pinning entirely. It calls copy_to_user_iter() for each contiguous
> user address range:
>
> size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
> {
> 	if (WARN_ON_ONCE(i->data_source))
> 		return 0;
> 	if (user_backed_iter(i))
> 		might_fault();
> 	return iterate_and_advance(i, bytes, (void *)addr,
> 				   copy_to_user_iter, memcpy_to_iter);
> }
>
> Which just checks that the address range doesn't include any kernel
> addresses and then memcpy()s directly via the userspace virtual
> addresses:
>
> static __always_inline
> size_t copy_to_user_iter(void __user *iter_to, size_t progress,
> 			 size_t len, void *from, void *priv2)
> {
> 	if (should_fail_usercopy())
> 		return len;
> 	if (access_ok(iter_to, len)) {
> 		from += progress;
> 		instrument_copy_to_user(iter_to, from, len);
> 		len = raw_copy_to_user(iter_to, from, len);
> 	}
> 	return len;
> }
>
> static __always_inline __must_check unsigned long
> raw_copy_to_user(void __user *dst, const void *src, unsigned long size)
> {
> 	return copy_user_generic((__force void *)dst, src, size);
> }
>
> static __always_inline __must_check unsigned long
> copy_user_generic(void *to, const void *from, unsigned long len)
> {
> 	stac();
> 	/*
> 	 * If CPU has FSRM feature, use 'rep movs'.
> 	 * Otherwise, use rep_movs_alternative.
> 	 */
> 	asm volatile(
> 		"1:\n\t"
> 		ALTERNATIVE("rep movsb",
> 			    "call rep_movs_alternative",
> 			    ALT_NOT(X86_FEATURE_FSRM))
> 		"2:\n"
> 		_ASM_EXTABLE_UA(1b, 2b)
> 		:"+c" (len), "+D" (to), "+S" (from), ASM_CALL_CONSTRAINT
> 		: : "memory", "rax");
> 	clac();
> 	return len;
> }
>
> Am I missing something?

The page is allocated and mapped in the page fault handler.

However, in typical cases, pages in the io buffer shouldn't be swapped out
frequently, so this cleanup may be good. I will run some perf tests.

Also, copy_page_from_iter()/copy_page_to_iter() can be used to avoid
bvec_kmap_local(), and the two helpers can handle one whole bvec instead
of a single page.

Then rq_for_each_bvec() can be used directly, and `ublk_io_iter` may be
killed.
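
To make the "whole range instead of single page" point concrete, here is a
rough userspace model. It is not kernel code and does not use any kernel
API: MODEL_PAGE_SIZE, model_copy_per_page() and model_copy_whole() are
hypothetical names made up for this sketch. It only shows that a copy split
at page boundaries issues one memcpy() per page touched, while a copy over
a contiguous range issues one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical page size, used by this userspace model only. */
#define MODEL_PAGE_SIZE 4096u

/*
 * Page-at-a-time model: no single memcpy() is allowed to cross a page
 * boundary of the destination. Returns the number of memcpy() calls.
 */
static size_t model_copy_per_page(unsigned char *dst, size_t dst_off,
				  const unsigned char *src, size_t len)
{
	size_t calls = 0;

	assert(dst && src);
	while (len) {
		/* Bytes left before the next destination page boundary. */
		size_t room = MODEL_PAGE_SIZE - (dst_off % MODEL_PAGE_SIZE);
		size_t n = len < room ? len : room;

		memcpy(dst + dst_off, src, n);
		dst_off += n;
		src += n;
		len -= n;
		calls++;
	}
	return calls;
}

/* Whole-range model: one memcpy() covers the entire contiguous range. */
static size_t model_copy_whole(unsigned char *dst, size_t dst_off,
			       const unsigned char *src, size_t len)
{
	assert(dst && src);
	if (!len)
		return 0;
	memcpy(dst + dst_off, src, len);
	return 1;
}
```

With dst_off = 100 and len = 10000, the per-page variant issues three
memcpy() calls (3996 + 4096 + 1908 bytes) and the whole-range variant
issues one; the destination bytes are identical in both cases.
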
Thanks,
Ming