Message-ID: <Zk4ddPmottdOJND1@kbusch-mbp.dhcp.thefacebook.com>
Date: Wed, 22 May 2024 10:29:40 -0600
From: Keith Busch <kbusch@...nel.org>
To: John Meneghini <jmeneghi@...hat.com>
Cc: hch@....de, sagi@...mberg.me, emilne@...hat.com,
linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org,
jrani@...estorage.com, randyj@...estorage.com, hare@...nel.org
Subject: Re: [PATCH v4 1/1] nvme: multipath: Implemented new iopolicy
"queue-depth"
On Wed, May 22, 2024 at 12:23:51PM -0400, John Meneghini wrote:
> On 5/22/24 11:56, Keith Busch wrote:
> > On Wed, May 22, 2024 at 11:42:12AM -0400, John Meneghini wrote:
> > > +static void nvme_subsys_iopolicy_update(struct nvme_subsystem *subsys, int iopolicy)
> > > +{
> > > +        struct nvme_ctrl *ctrl;
> > > +        int old_iopolicy = READ_ONCE(subsys->iopolicy);
> > > +
> > > +        WRITE_ONCE(subsys->iopolicy, iopolicy);
> > > +
> > > +        /* iopolicy changes reset the counters and clear the mpath by design */
> > > +        mutex_lock(&nvme_subsystems_lock);
> > > +        list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
> > > +                atomic_set(&ctrl->nr_active, 0);
> >
> > Can you help me understand why this is a desirable feature? Unless you
> > quiesce everything at some point, you'll always have more unaccounted
> > requests on whichever path has higher latency. That sounds like it
> > defeats the goals of this io policy.
>
> This is true. And as a matter of practice I never change the IO policy when IOs are in flight. I always stop the IO first.
> But we can't stop any user from changing the IO policy again and again. So I'm not sure what to do.
>
> If you'd like I can add the 'if (old_iopolicy == iopolicy) return;' here, but
> that's not going to solve the problem of inaccurate counters when users
> start flipping io policies around with IO inflight. There is no
> synchronization between io submission across controllers and changing the
> policy, so I expect changing between round-robin and queue-depth with IO
> inflight suffers from the same problem... though not as badly.
>
> I'd rather take this patch now and figure out how to fix the problem with
> another patch in the future. Maybe we can check the io stats and refuse to
> change the policy if they are not zero....
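
(For reference, the guard you mention would just be an early return at the
top of that function; untested sketch, not part of the posted patch:)

static void nvme_subsys_iopolicy_update(struct nvme_subsystem *subsys,
                int iopolicy)
{
        int old_iopolicy = READ_ONCE(subsys->iopolicy);

        /* Nothing to do if the policy isn't actually changing. */
        if (old_iopolicy == iopolicy)
                return;
        ...
}

That only avoids a gratuitous reset, though; it doesn't address the
accounting race itself.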
The idea of tagging nvme_req()->flags on submission is that the completion
side's handling of the nr_active counter stays symmetric with the
submission side: you never need to reset nr_active because every request
is accounted for.
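
Roughly like this (untested sketch; the flag name, bit value, helper names,
and the NVME_IOPOLICY_QD enum value below are assumptions for illustration,
not taken verbatim from the posted patch):

/* Assumed spare bit in nvme_req()->flags, reserved for this accounting. */
#define NVME_MPATH_CNT_ACTIVE   (1 << 4)

void nvme_mpath_start_request(struct request *rq)
{
        struct nvme_ns *ns = rq->q->queuedata;

        if (READ_ONCE(ns->head->subsys->iopolicy) != NVME_IOPOLICY_QD)
                return;

        /* Count the request and remember that we counted it. */
        nvme_req(rq)->flags |= NVME_MPATH_CNT_ACTIVE;
        atomic_inc(&ns->ctrl->nr_active);
}

void nvme_mpath_end_request(struct request *rq)
{
        struct nvme_ns *ns = rq->q->queuedata;

        /* Only drop the counter if this request actually bumped it. */
        if (nvme_req(rq)->flags & NVME_MPATH_CNT_ACTIVE)
                atomic_dec(&ns->ctrl->nr_active);
}

With that, switching the policy mid-flight can't leave nr_active permanently
skewed; the worst case is a short transient while already-issued requests
complete.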