Message-ID: <fa2bc7e5088fd309d846a57edf06520dc83632ba.camel@linux.intel.com>
Date: Wed, 23 Oct 2024 16:26:21 +0300
From: Tero Kristo <tero.kristo@...ux.intel.com>
To: Jens Axboe <axboe@...nel.dk>
Cc: hch@....de, linux-block@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCHv2 2/2] blk-mq: add support for CPU latency limits

On Fri, 2024-10-18 at 08:21 -0600, Jens Axboe wrote:
> On 10/18/24 1:30 AM, Tero Kristo wrote:
> > @@ -2700,11 +2701,62 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug)
> >  static void __blk_mq_flush_plug_list(struct request_queue *q,
> >  				     struct blk_plug *plug)
> >  {
> > +	struct request *req, *next;
> > +	struct blk_mq_hw_ctx *hctx;
> > +	int cpu;
> > +
> >  	if (blk_queue_quiesced(q))
> >  		return;
> > +
> > +	rq_list_for_each_safe(&plug->mq_list, req, next) {
> > +		hctx = req->mq_hctx;
> > +
> > +		if (next && next->mq_hctx == hctx)
> > +			continue;
> > +
> > +		if (q->disk->cpu_lat_limit < 0)
> > +			continue;
> > +
> > +		hctx->last_active = jiffies + msecs_to_jiffies(q->disk->cpu_lat_timeout);
> > +
> > +		if (!hctx->cpu_lat_limit_active) {
> > +			hctx->cpu_lat_limit_active = true;
> > +			for_each_cpu(cpu, hctx->cpumask) {
> > +				struct dev_pm_qos_request *qos;
> > +
> > +				qos = per_cpu_ptr(hctx->cpu_lat_qos, cpu);
> > +				dev_pm_qos_add_request(get_cpu_device(cpu), qos,
> > +						       DEV_PM_QOS_RESUME_LATENCY,
> > +						       q->disk->cpu_lat_limit);
> > +			}
> > +			schedule_delayed_work(&hctx->cpu_latency_work,
> > +					      msecs_to_jiffies(q->disk->cpu_lat_timeout));
> > +		}
> > +	}
> > +
> 
> This is, quite literally, an insane amount of cycles to add to the
> hot issue path. You're iterating each request in the list, and then
> each CPU in the mask of the hardware context for each request.

Ok, I made some optimizations to the code and will send v3 shortly. In
v3, all the PM QoS handling and list iteration is moved to the
workqueue and happens in the background. The initial block requests
(until the workqueue fires) may run with higher latency, but that is
most likely an acceptable compromise.
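
For illustration only, a rough sketch of what "moved to the workqueue"
could look like; this is not the actual v3 patch, and it reuses the
field names from the v2 hunk quoted above (last_active,
cpu_lat_limit_active, cpu_lat_qos, cpu_latency_work, cpu_lat_limit,
cpu_lat_timeout). The idea is that the hot submit path only stamps
hctx->last_active, while the delayed work adds and removes the per-CPU
PM QoS requests in the background:

/* Hypothetical delayed-work handler; field names assumed from v2. */
static void blk_mq_cpu_latency_work(struct work_struct *work)
{
	struct blk_mq_hw_ctx *hctx =
		container_of(to_delayed_work(work),
			     struct blk_mq_hw_ctx, cpu_latency_work);
	struct request_queue *q = hctx->queue;
	int cpu;

	if (time_is_after_jiffies(hctx->last_active)) {
		/* Recently active: apply the limits if not done yet... */
		if (!hctx->cpu_lat_limit_active) {
			hctx->cpu_lat_limit_active = true;
			for_each_cpu(cpu, hctx->cpumask)
				dev_pm_qos_add_request(get_cpu_device(cpu),
					per_cpu_ptr(hctx->cpu_lat_qos, cpu),
					DEV_PM_QOS_RESUME_LATENCY,
					q->disk->cpu_lat_limit);
		}
		/* ...and re-check again after the timeout. */
		schedule_delayed_work(&hctx->cpu_latency_work,
			msecs_to_jiffies(q->disk->cpu_lat_timeout));
	} else if (hctx->cpu_lat_limit_active) {
		/* Idle past the timeout: drop the limits again. */
		hctx->cpu_lat_limit_active = false;
		for_each_cpu(cpu, hctx->cpumask)
			dev_pm_qos_remove_request(
				per_cpu_ptr(hctx->cpu_lat_qos, cpu));
	}
}

With this split, the per-request and per-CPU iteration that the review
objects to runs only in process context from the worker, never in the
plug flush path itself.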

PS: Please bear with me, my knowledge of the block layer and/or NVMe is
pretty limited. I am sorry if these patches make you frustrated, that
is not my intention.

-Tero

> 
> This just won't fly, not at all. Like the previous feedback, please
> figure out a way to make this cheaper. This means don't iterate a
> bunch
> of stuff.
> 
> Outside of that, lots of styling issues here too, but none of that
> really matters until the base mechanism is at least half way sane.
> 

