Message-ID: <87cdfaa6-68a4-4c99-8959-7610a879facc@kernel.dk>
Date: Wed, 5 Nov 2025 08:41:20 -0700
From: Jens Axboe <axboe@...nel.dk>
To: Caleb Sander Mateos <csander@...estorage.com>
Cc: Ming Lei <ming.lei@...hat.com>, linux-block@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ublk: use copy_{to,from}_iter() for user copy

On 11/5/25 8:37 AM, Caleb Sander Mateos wrote:
> On Wed, Nov 5, 2025 at 7:26 AM Jens Axboe <axboe@...nel.dk> wrote:
>>
>> On 11/4/25 6:48 PM, Ming Lei wrote:
>>> On Mon, Nov 03, 2025 at 08:40:30AM -0800, Caleb Sander Mateos wrote:
>>>> On Fri, Oct 31, 2025 at 4:04 PM Ming Lei <ming.lei@...hat.com> wrote:
>>>>>
>>>>> On Fri, Oct 31, 2025 at 09:02:48AM -0700, Caleb Sander Mateos wrote:
>>>>>> On Thu, Oct 30, 2025 at 8:45 PM Ming Lei <ming.lei@...hat.com> wrote:
>>>>>>>
>>>>>>> On Thu, Oct 30, 2025 at 07:05:21PM -0600, Caleb Sander Mateos wrote:
>>>>>>>> ublk_copy_user_pages()/ublk_copy_io_pages() currently uses
>>>>>>>> iov_iter_get_pages2() to extract the pages from the iov_iter and
>>>>>>>> memcpy()s between the bvec_iter and the iov_iter's pages one at a time.
>>>>>>>> Switch to using copy_to_iter()/copy_from_iter() instead. This avoids the
>>>>>>>> user page reference count increments and decrements and needing to split
>>>>>>>> the memcpy() at user page boundaries. It also simplifies the code
>>>>>>>> considerably.
>>>>>>>>
>>>>>>>> Signed-off-by: Caleb Sander Mateos <csander@...estorage.com>
>>>>>>>> ---
>>>>>>>>  drivers/block/ublk_drv.c | 62 +++++++++-------------------------------
>>>>>>>>  1 file changed, 14 insertions(+), 48 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
>>>>>>>> index 0c74a41a6753..852350e639d6 100644
>>>>>>>> --- a/drivers/block/ublk_drv.c
>>>>>>>> +++ b/drivers/block/ublk_drv.c
>>>>>>>> @@ -912,58 +912,47 @@ static const struct block_device_operations ub_fops = {
>>>>>>>>       .open =         ublk_open,
>>>>>>>>       .free_disk =    ublk_free_disk,
>>>>>>>>       .report_zones = ublk_report_zones,
>>>>>>>>  };
>>>>>>>>
>>>>>>>> -#define UBLK_MAX_PIN_PAGES   32
>>>>>>>> -
>>>>>>>>  struct ublk_io_iter {
>>>>>>>> -     struct page *pages[UBLK_MAX_PIN_PAGES];
>>>>>>>>       struct bio *bio;
>>>>>>>>       struct bvec_iter iter;
>>>>>>>>  };
>>>>>>>
>>>>>>> ->pages[] is actually for pinning user io pages in batch, so killing it may cause
>>>>>>> perf drop.
>>>>>>
>>>>>> As far as I can tell, copy_to_iter()/copy_from_iter() avoids the page
>>>>>> pinning entirely. It calls copy_to_user_iter() for each contiguous
>>>>>> user address range:
>>>>>>
>>>>>> size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
>>>>>> {
>>>>>>         if (WARN_ON_ONCE(i->data_source))
>>>>>>                 return 0;
>>>>>>         if (user_backed_iter(i))
>>>>>>                 might_fault();
>>>>>>         return iterate_and_advance(i, bytes, (void *)addr,
>>>>>>                                    copy_to_user_iter, memcpy_to_iter);
>>>>>> }
>>>>>>
>>>>>> Which just checks that the address range doesn't include any kernel
>>>>>> addresses and then memcpy()s directly via the userspace virtual
>>>>>> addresses:
>>>>>>
>>>>>> static __always_inline
>>>>>> size_t copy_to_user_iter(void __user *iter_to, size_t progress,
>>>>>>                          size_t len, void *from, void *priv2)
>>>>>> {
>>>>>>         if (should_fail_usercopy())
>>>>>>                 return len;
>>>>>>         if (access_ok(iter_to, len)) {
>>>>>>                 from += progress;
>>>>>>                 instrument_copy_to_user(iter_to, from, len);
>>>>>>                 len = raw_copy_to_user(iter_to, from, len);
>>>>>>         }
>>>>>>         return len;
>>>>>> }
>>>>>>
>>>>>> static __always_inline __must_check unsigned long
>>>>>> raw_copy_to_user(void __user *dst, const void *src, unsigned long size)
>>>>>> {
>>>>>>         return copy_user_generic((__force void *)dst, src, size);
>>>>>> }
>>>>>>
>>>>>> static __always_inline __must_check unsigned long
>>>>>> copy_user_generic(void *to, const void *from, unsigned long len)
>>>>>> {
>>>>>>         stac();
>>>>>>         /*
>>>>>>          * If CPU has FSRM feature, use 'rep movs'.
>>>>>>          * Otherwise, use rep_movs_alternative.
>>>>>>          */
>>>>>>         asm volatile(
>>>>>>                 "1:\n\t"
>>>>>>                 ALTERNATIVE("rep movsb",
>>>>>>                             "call rep_movs_alternative",
>>>>>>                             ALT_NOT(X86_FEATURE_FSRM))
>>>>>>                 "2:\n"
>>>>>>                 _ASM_EXTABLE_UA(1b, 2b)
>>>>>>                 :"+c" (len), "+D" (to), "+S" (from), ASM_CALL_CONSTRAINT
>>>>>>                 : : "memory", "rax");
>>>>>>         clac();
>>>>>>         return len;
>>>>>> }
>>>>>>
>>>>>> Am I missing something?
>>>>>
>>>>> page is allocated & mapped in page fault handler.
>>>>
>>>> Right, physical pages certainly need to be allocated for the virtual
>>>> address range being copied to/from. But that would have happened
>>>> previously in iov_iter_get_pages2(), so this isn't a new cost. And as
>>>> you point out, in the common case that the virtual pages are already
>>>> mapped to physical pages, the copy won't cause any page faults.
>>>>
>>>>>
>>>>> However, in typical cases, pages in io buffer shouldn't be swapped out
>>>>> frequently, so this cleanup may be good, I will run some perf test.
>>>>
>>>> Thanks for testing.
>>>
>>> `fio/t/io_uring` shows 40% improvement on `./kublk -t null -q 2` with this
>>> patch in my test VM, so looks very nice improvement.
>>>
>>> Also it works well by forcing to pass IOSQE_ASYNC on the ublk uring_cmd,
>>> and this change is correct because the copy is guaranteed to be done in ublk
>>> daemon context.
>>
>> We good to queue this up then?
> 
> Let me write a v2 implementing Ming's suggestions to use
> copy_page_{to,from}_iter() and get rid of the open-coded bvec
> iteration.

Sounds good.

-- 
Jens Axboe
