linux-kernel - Re: [RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting

Message-ID: <CADtkEef69h+Asg+J_EeOkZhmPBtnTnV2EaytfCifxjo41TW-=w@mail.gmail.com>
Date:   Mon, 12 Jun 2023 16:24:20 +0800
From:   许春光 <brookxu.cn@...il.com>
To:     Ming Lei <ming.lei@...hat.com>
Cc:     Sagi Grimberg <sagi@...mberg.me>, kbusch@...nel.org,
        axboe@...nel.dk, hch@....de, linux-nvme@...ts.infradead.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting

On Mon, Jun 12, 2023 at 09:33, Ming Lei <ming.lei@...hat.com> wrote:
>
> On Sun, Jun 11, 2023 at 11:11:06AM +0300, Sagi Grimberg wrote:
> >
> > > > Hi Ming:
> > > >
> > > > On Tue, Jun 6, 2023 at 23:15, Ming Lei <ming.lei@...hat.com> wrote:
> > > > >
> > > > > Hello Chunguang,
> > > > >
> > > > > On Mon, May 29, 2023 at 06:59:22PM +0800, brookxu.cn wrote:
> > > > > > From: Chunguang Xu <chunguang.xu@...pee.com>
> > > > > >
> > > > > > > We found that nvme_remove_namespaces() may hang in flush_work(&ctrl->scan_work)
> > > > > > > while removing ctrl. The root cause may be that the state of ctrl changes to
> > > > > > > NVME_CTRL_DELETING while removing ctrl, which interrupts nvme_tcp_error_recovery_work()/
> > > > > > > nvme_reset_ctrl_work()/nvme_tcp_reconnect_or_remove().  At this time, ctrl is
> > > > >
> > > > > > I didn't dig into the ctrl state checks in these error handlers yet, but error
> > > > > > handling is supposed to provide forward progress for any controller state.
> > > > >
> > > > > Can you explain a bit how switching to DELETING interrupts the above
> > > > > error handling and breaks the forward progress guarantee?
> > > >
> > > > Here we froze ctrl; if the ctrl state has changed to DELETING or
> > > > DELETING_NOIO (by nvme disconnect), we bail out and leave ctrl
> > > > frozen, so nvme_remove_namespaces() hangs.
> > > >
> > > > static void nvme_tcp_error_recovery_work(struct work_struct *work)
> > > > {
> > > >          ...
> > > >          if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) {
> > > >                  /* state change failure is ok if we started ctrl delete */
> > > >                  WARN_ON_ONCE(ctrl->state != NVME_CTRL_DELETING &&
> > > >                               ctrl->state != NVME_CTRL_DELETING_NOIO);
> > > >                  return;
> > > >          }
> > > >
> > > >          nvme_tcp_reconnect_or_remove(ctrl);
> > > > }
> > > >
> > > >
> > > > Another path: we check the ctrl state while reconnecting; if it has changed to
> > > > DELETING or DELETING_NOIO, we bail out and leave ctrl frozen and the
> > > > queues quiesced (through the reset path), and as a result the hang occurs.
> > > >
> > > > static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
> > > > {
> > > >          /* If we are resetting/deleting then do nothing */
> > > >          if (ctrl->state != NVME_CTRL_CONNECTING) {
> > > >                  WARN_ON_ONCE(ctrl->state == NVME_CTRL_NEW ||
> > > >                          ctrl->state == NVME_CTRL_LIVE);
> > > >                  return;
> > > >          }
> > > >          ...
> > > > }
> > > >
> > > > > > > frozen and the queue is quiesced. Since scan_work may continue to issue IOs to
> > > > > > > load the partition table, it gets blocked, which leads to nvme_tcp_error_recovery_work()
> > > > > > > hanging in flush_work(&ctrl->scan_work).
> > > > > >
> > > > > > > After analysis, we found that there are mainly two cases:
> > > > > > > 1. Since ctrl is frozen, scan_work hangs in __bio_queue_enter() while it issues
> > > > > > >     new IO to load the partition table.
> > > > >
> > > > > Yeah, nvme freeze usage is fragile, and I suggested to move
> > > > > nvme_start_freeze() from nvme_tcp_teardown_io_queues to
> > > > > nvme_tcp_configure_io_queues(), such as the posted change on rdma:
> > > > >
> > > > > https://lore.kernel.org/linux-block/CAHj4cs-4gQHnp5aiekvJmb6o8qAcb6nLV61uOGFiisCzM49_dg@mail.gmail.com/T/#ma0d6bbfaa0c8c1be79738ff86a2fdcf7582e06b0
> > > >
> > > > While the driver is reconnecting, I think we should freeze ctrl or quiesce the queues,
> > > > otherwise nvme_fail_nonready_command() may return BLK_STS_RESOURCE,
> > > > and the IOs may be retried frequently. So I think we had better freeze ctrl
> > > > when entering error_recovery/reconnect, but need to unfreeze it on exit.
> > >
> > > quiescing is always done in error handling, and freeze is actually
> > > not a must, and it is easier to cause race by calling freeze & unfreeze
> > > from different contexts.
> > >
> > > But yes, unquiesce should have been done after exiting error handling, or
> > > simply do it in nvme_unquiesce_io_queues().
> > >
> > > And the following patch should cover all these hangs:
> > >
> >
> > Ming, are you sending a formal patchset for this?
>
> Not yet, will do it.

Hi Ming:

Please cc me, thx.

> >
> > >
> > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > > index 3ec38e2b9173..83d3818fc60b 100644
> > > --- a/drivers/nvme/host/core.c
> > > +++ b/drivers/nvme/host/core.c
> > > @@ -4692,6 +4692,9 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
> > >      */
> > >     nvme_mpath_clear_ctrl_paths(ctrl);
> > > +   /* unquiesce io queues so scan work won't hang */
> > > +   nvme_unquiesce_io_queues(ctrl);
> >
> > What guarantees that the queues won't be quiesced right after this
> > by the transport?
>
> Please see nvme_change_ctrl_state(): if the controller state is
> DELETING, a new NVME_CTRL_RESETTING/NVME_CTRL_CONNECTING state can't be
> entered any more.
>
> >
> > I'm still unclear why this affects the scan_work?
>
> As Chunguang mentioned, if error recovery is terminated by nvme deletion,
> the controller can be left in the quiesced state, so in-queue IOs can't
> move on; meanwhile, new error recovery can't be started successfully because
> the controller state is NVME_CTRL_DELETING, so any pending IOs (including those
> from the scan context) can't be completed.
>
>
>
>
> Thanks,
> Ming
>
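
To make the sequence discussed above concrete, here is a minimal user-space
sketch in C. It is not kernel code: struct model_ctrl, the model_* functions
and the delete_races flag are hypothetical names made up for illustration, and
the transition table is only a simplified assumption about what
nvme_change_ctrl_state() permits. It models error recovery quiescing the
queues, deletion racing in before the switch to CONNECTING, and the pending
IOs (such as those from scan_work) staying stuck unless the remove path
unquiesces first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of controller states (assumption, not the kernel enum). */
enum ctrl_state { CTRL_NEW, CTRL_LIVE, CTRL_RESETTING, CTRL_CONNECTING, CTRL_DELETING };

struct model_ctrl {
	enum ctrl_state state;
	bool quiesced;   /* models "I/O queues are quiesced" */
	int pending_io;  /* models in-queue IOs, e.g. issued by scan work */
};

/* Simplified transition rule: once DELETING, RESETTING/CONNECTING are refused. */
bool model_change_state(struct model_ctrl *c, enum ctrl_state new_state)
{
	bool ok = false;

	switch (new_state) {
	case CTRL_LIVE:
		ok = c->state == CTRL_NEW || c->state == CTRL_RESETTING ||
		     c->state == CTRL_CONNECTING;
		break;
	case CTRL_RESETTING:
		ok = c->state == CTRL_NEW || c->state == CTRL_LIVE;
		break;
	case CTRL_CONNECTING:
		ok = c->state == CTRL_NEW || c->state == CTRL_RESETTING;
		break;
	case CTRL_DELETING:
		ok = c->state == CTRL_LIVE || c->state == CTRL_RESETTING ||
		     c->state == CTRL_CONNECTING;
		break;
	default:
		break;
	}
	if (ok)
		c->state = new_state;
	return ok;
}

/* Models error recovery: quiesce, then try to move to CONNECTING. */
void model_error_recovery(struct model_ctrl *c, bool delete_races)
{
	c->quiesced = true;                 /* teardown quiesces the queues */
	if (delete_races)
		model_change_state(c, CTRL_DELETING); /* "nvme disconnect" wins the race */
	if (!model_change_state(c, CTRL_CONNECTING))
		return;                     /* early return: queues stay quiesced */
	c->quiesced = false;                /* reconnect path resumes the queues */
	model_change_state(c, CTRL_LIVE);
}

/* Drains pending IOs (completed or failed) unless quiesced; returns IOs still stuck. */
int model_run_queue(struct model_ctrl *c)
{
	if (!c->quiesced)
		c->pending_io = 0;
	return c->pending_io;
}

/* Models the remove path; a nonzero result means a flush of scan work would block. */
int model_remove_namespaces(struct model_ctrl *c, bool unquiesce_first)
{
	if (unquiesce_first)
		c->quiesced = false;
	return model_run_queue(c);
}
```

In this model, a controller that starts in CTRL_RESETTING with pending IOs and
goes through model_error_recovery() with delete_races set ends up in
CTRL_DELETING with the queues still quiesced. model_remove_namespaces() then
reports stuck IOs unless unquiesce_first is true, which mirrors the intent of
the proposed nvme_unquiesce_io_queues() call in nvme_remove_namespaces().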