linux-kernel - Re: [PATCH v2 1/2] io_uring: Add new functions to handle user fault scenarios

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANHzP_uW4+-M1yTg-GPdPzYWAmvqP5vh6+s1uBhrMZ3eBusLug@mail.gmail.com>
Date: Wed, 23 Apr 2025 01:04:53 +0800
From: 姜智伟 <qq282012236@...il.com>
To: Jens Axboe <axboe@...nel.dk>
Cc: viro@...iv.linux.org.uk, brauner@...nel.org, jack@...e.cz, 
	akpm@...ux-foundation.org, peterx@...hat.com, asml.silence@...il.com, 
	linux-fsdevel@...r.kernel.org, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, io-uring@...r.kernel.org
Subject: Re: [PATCH v2 1/2] io_uring: Add new functions to handle user fault scenarios

On Wed, Apr 23, 2025 at 12:32 AM Jens Axboe <axboe@...nel.dk> wrote:
>
> On 4/22/25 10:29 AM, Zhiwei Jiang wrote:
> > diff --git a/io_uring/io-wq.h b/io_uring/io-wq.h
> > index d4fb2940e435..8567a9c819db 100644
> > --- a/io_uring/io-wq.h
> > +++ b/io_uring/io-wq.h
> > @@ -70,8 +70,10 @@ enum io_wq_cancel io_wq_cancel_cb(struct io_wq *wq, work_cancel_fn *cancel,
> >                                       void *data, bool cancel_all);
> >
> >  #if defined(CONFIG_IO_WQ)
> > -extern void io_wq_worker_sleeping(struct task_struct *);
> > -extern void io_wq_worker_running(struct task_struct *);
> > +extern void io_wq_worker_sleeping(struct task_struct *tsk);
> > +extern void io_wq_worker_running(struct task_struct *tsk);
> > +extern void set_userfault_flag_for_ioworker(void);
> > +extern void clear_userfault_flag_for_ioworker(void);
> >  #else
> >  static inline void io_wq_worker_sleeping(struct task_struct *tsk)
> >  {
> > @@ -79,6 +81,12 @@ static inline void io_wq_worker_sleeping(struct task_struct *tsk)
> >  static inline void io_wq_worker_running(struct task_struct *tsk)
> >  {
> >  }
> > +static inline void set_userfault_flag_for_ioworker(void)
> > +{
> > +}
> > +static inline void clear_userfault_flag_for_ioworker(void)
> > +{
> > +}
> >  #endif
> >
> >  static inline bool io_wq_current_is_worker(void)
>
> This should go in include/linux/io_uring.h and then userfaultfd would
> not have to include io_uring private headers.
>
> But that's beside the point, like I said we still need to get to the
> bottom of what is going on here first, rather than try and paper around
> it. So please don't post more versions of this before we have that
> understanding.
>
> See previous emails on 6.8 and other kernel versions.
>
> --
> Jens Axboe
The issue did not involve creating new worker processes. Instead, the
existing IOU worker kernel threads (about a dozen) associated with the VM
process were fully utilizing CPU without writing data, caused by a fault
while reading user data pages in the fault_in_iov_iter_readable function
when pulling user memory into kernel space.

This issue occurs like during VM snapshot loading (which uses
userfaultfd for on-demand memory loading), while the task in the guest is
writing data to disk.

Normally, the VM first triggers a user fault to fill the page table.
So in the IOU worker thread, the page tables are already filled,
fault no chance happens when faulting in memory pages
in fault_in_iov_iter_readable.

I suspect that during snapshot loading, a memory access in the
VM triggers an async page fault handled by the kernel thread,
while the IOU worker's async kernel thread is also running.
Maybe If the IOU worker's thread is scheduled first.
I’m going to bed now.