Message-ID: <CACycT3vYt0XNV2GdjKjDS1iyWieY_OV4h=W1qqk_AAAahRZowA@mail.gmail.com>
Date:   Tue, 4 Jan 2022 13:31:47 +0800
From:   Yongji Xie <xieyongji@...edance.com>
To:     Josef Bacik <josef@...icpanda.com>
Cc:     Christoph Hellwig <hch@...radead.org>,
        Jens Axboe <axboe@...nel.dk>,
        Bart Van Assche <bvanassche@....org>,
        linux-block@...r.kernel.org, nbd@...er.debian.org,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] nbd: Don't use workqueue to handle recv work

On Tue, Jan 4, 2022 at 12:10 AM Josef Bacik <josef@...icpanda.com> wrote:
>
> On Thu, Dec 30, 2021 at 12:01:23PM +0800, Yongji Xie wrote:
> > On Thu, Dec 30, 2021 at 1:35 AM Christoph Hellwig <hch@...radead.org> wrote:
> > >
> > > On Mon, Dec 27, 2021 at 05:12:41PM +0800, Xie Yongji wrote:
> > > > The rescuer thread might take over the works queued on
> > > > the workqueue when the worker thread creation timed out.
> > > > If this happens, we have no chance to create multiple
> > > > recv threads which causes I/O hung on this nbd device.
> > >
> > > If a workqueue is used there aren't really 'receive threads'.
> > > What is the deadlock here?
> >
> > We might have multiple recv works, and those recv works won't quit
> > unless the socket is closed. If the rescuer thread takes over those
> > works, only the first recv work can run. The I/O needed to be handled
> > in other recv works would be hung since no thread can handle them.
> >
>
> I'm not following this explanation.  What is the rescuer thread you're talking

https://www.kernel.org/doc/html/latest/core-api/workqueue.html#c.rescuer_thread

> about?  If there's an error we close the socket which will error out the recvmsg
> which will make the recv workqueue close down.

When would the socket be closed? The nbd daemon doesn't know what is
happening in the kernel.

>
> > In that case, we can see below stacks in rescuer thread:
> >
> > __schedule
> >   schedule
> >     schedule_timeout
> >       unix_stream_read_generic
> >         unix_stream_recvmsg
> >           sock_xmit
> >             nbd_read_stat
> >               recv_work
> >                 process_one_work
> >                   rescuer_thread
> >                     kthread
> >                       ret_from_fork
>
> This is just the thing hanging waiting for an incoming request, so this doesn't
> tell me anything.  Thanks,
>

The point is that *recv_work* is being handled in *rescuer_thread*.
Normally it should be handled in *worker_thread*, like:

__schedule
   schedule
     schedule_timeout
       unix_stream_read_generic
         unix_stream_recvmsg
           sock_xmit
             nbd_read_stat
               recv_work
                 process_one_work
                   *worker_thread*
                     kthread
                       ret_from_fork
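
To make the failure mode concrete, here is a minimal userspace sketch in
Python. It is only a model, not kernel code: it assumes the rescuer runs
the works it takes over one at a time, and that each recv work blocks
until the socket is closed. The names run_model, socket_closed and
NUM_CONNECTIONS are made up for illustration; recv_work and
rescuer_thread only borrow the kernel names.

```python
import queue
import threading

NUM_CONNECTIONS = 3  # hypothetical number of nbd connections / recv works


def run_model():
    """Model one rescuer thread taking over several blocking recv works."""
    socket_closed = threading.Event()  # stands in for the nbd socket closing
    started = [threading.Event() for _ in range(NUM_CONNECTIONS)]

    def recv_work(index):
        # A recv work does not return until the socket is closed.
        started[index].set()
        socket_closed.wait()

    def rescuer_thread(pending):
        # The single rescuer runs the queued works strictly one after another.
        while True:
            work = pending.get()
            if work is None:  # sentinel: no more works
                return
            work()

    pending = queue.Queue()
    for i in range(NUM_CONNECTIONS):
        pending.put(lambda i=i: recv_work(i))
    pending.put(None)

    rescuer = threading.Thread(target=rescuer_thread, args=(pending,))
    rescuer.start()

    # Once the first recv work is running, the others cannot start, because
    # the rescuer is stuck inside the first one.
    started[0].wait()
    before_close = [event.is_set() for event in started]

    # Only closing the socket lets the remaining works run.
    socket_closed.set()
    rescuer.join()
    after_close = [event.is_set() for event in started]
    return before_close, after_close


if __name__ == "__main__":
    print(run_model())
```

In this model, only the first recv work has started while the socket is
open, so I/O for the other connections would never be received. With one
thread per recv work, all of them would be waiting in recvmsg at the same
time.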

Thanks,
Yongji
