Message-ID: <CAPpK+O1g+Z=CYDfA1u6-iBx8L5tuhFOgipbyVvmj=oqW9UbLkg@mail.gmail.com>
Date: Tue, 30 Dec 2025 16:13:14 -0800
From: Randy Jennings <randyj@...estorage.com>
To: Sagi Grimberg <sagi@...mberg.me>
Cc: Mohamed Khalfella <mkhalfella@...estorage.com>, Chaitanya Kulkarni <kch@...dia.com>, 
	Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>, Keith Busch <kbusch@...nel.org>, 
	Aaron Dailey <adailey@...estorage.com>, John Meneghini <jmeneghi@...hat.com>, 
	Hannes Reinecke <hare@...e.de>, linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 10/14] nvme-tcp: Use CCR to recover controller that
 hits an error

On Sat, Dec 27, 2025 at 2:35 AM Sagi Grimberg <sagi@...mberg.me> wrote:
> On 26/11/2025 4:11, Mohamed Khalfella wrote:
...
> > +     dev_info(ctrl->device,
> > +              "CCR failed, switch to time-based recovery, timeout = %ums\n",
> > +              jiffies_to_msecs(rem));
> > +     set_bit(NVME_CTRL_RECOVERED, &ctrl->flags);
> > +     queue_delayed_work(nvme_reset_wq, &to_tcp_ctrl(ctrl)->err_work, rem);
> > +     return -EAGAIN;
>
> I don't think that reusing the same work to handle two completely
> different things is the right approach here.
>
> How about splitting to fence_work and err_work? That should eliminate
> some of the ctrl state inspections and simplify error recovery.

If the two pieces of work were independent and could run separately
(possibly in parallel), I could understand having separate work
structures.  But they are not independent; they have a definite
relationship.  Like Mohamed, I thought of them as different stages of the
same work.  An extra work item takes up more space (I would be concerned
about scaling to thousands or tens of thousands of associations, plus one
more order of magnitude for margin), and it also makes the connection
object (referenced during IO) span more cache lines.  Is it worth spending
that space when the separate work items would just be different, dependent
stages of the same process?
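
To put rough numbers on the space concern, here is a minimal userspace
sketch.  The mock_* structures are assumptions that only approximate the
kernel's work_struct, timer_list and delayed_work; the real layouts depend
on the kernel version and config (lockdep, for example, adds fields), so
treat the result as an estimate, not a measurement.

```c
/* Userspace sketch only: every mock_* type is a hypothetical stand-in. */
#include <stddef.h>

struct mock_list_head { void *next, *prev; };

struct mock_work {                      /* approximates struct work_struct */
	long data;
	struct mock_list_head entry;
	void (*func)(struct mock_work *);
};

struct mock_timer {                     /* approximates struct timer_list */
	void *entry[2];
	unsigned long expires;
	void (*function)(struct mock_timer *);
	unsigned int flags;
};

struct mock_delayed_work {              /* approximates struct delayed_work */
	struct mock_work work;
	struct mock_timer timer;
	void *wq;
	int cpu;
};

/* Extra bytes if every association carries one more delayed work item. */
static size_t extra_bytes(size_t associations)
{
	return associations * sizeof(struct mock_delayed_work);
}

/* 64-byte cache lines needed to hold one more delayed work item. */
static size_t extra_cache_lines(void)
{
	return (sizeof(struct mock_delayed_work) + 63) / 64;
}
```

Calling extra_bytes(100000) covers the "tens of thousands plus an order
of magnitude" case.  On a typical 64-bit build the mock structure is
roughly 88 bytes, so that is on the order of 8.8 MB in total, and
extra_cache_lines() shows that one more work item can add up to two cache
lines to each connection object.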

Sincerely,
Randy Jennings
