lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f8ff8b69-f4d0-40d4-a2c5-b5bfeb973c71@kernel.dk>
Date: Wed, 23 Oct 2024 07:48:35 -0600
From: Jens Axboe <axboe@...nel.dk>
To: Tero Kristo <tero.kristo@...ux.intel.com>
Cc: hch@....de, linux-block@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCHv2 2/2] blk-mq: add support for CPU latency limits

On 10/23/24 7:26 AM, Tero Kristo wrote:
> On Fri, 2024-10-18 at 08:21 -0600, Jens Axboe wrote:
>> On 10/18/24 1:30 AM, Tero Kristo wrote:
>>> @@ -2700,11 +2701,62 @@ static void blk_mq_plug_issue_direct(struct
>>> blk_plug *plug)
>>>  static void __blk_mq_flush_plug_list(struct request_queue *q,
>>>  				     struct blk_plug *plug)
>>>  {
>>> +	struct request *req, *next;
>>> +	struct blk_mq_hw_ctx *hctx;
>>> +	int cpu;
>>> +
>>>  	if (blk_queue_quiesced(q))
>>>  		return;
>>> +
>>> +	rq_list_for_each_safe(&plug->mq_list, req, next) {
>>> +		hctx = req->mq_hctx;
>>> +
>>> +		if (next && next->mq_hctx == hctx)
>>> +			continue;
>>> +
>>> +		if (q->disk->cpu_lat_limit < 0)
>>> +			continue;
>>> +
>>> +		hctx->last_active = jiffies + msecs_to_jiffies(q-
>>>> disk->cpu_lat_timeout);
>>> +
>>> +		if (!hctx->cpu_lat_limit_active) {
>>> +			hctx->cpu_lat_limit_active = true;
>>> +			for_each_cpu(cpu, hctx->cpumask) {
>>> +				struct dev_pm_qos_request *qos;
>>> +
>>> +				qos = per_cpu_ptr(hctx-
>>>> cpu_lat_qos, cpu);
>>> +				dev_pm_qos_add_request(get_cpu_dev
>>> ice(cpu), qos,
>>> +						      
>>> DEV_PM_QOS_RESUME_LATENCY,
>>> +						       q->disk-
>>>> cpu_lat_limit);
>>> +			}
>>> +			schedule_delayed_work(&hctx-
>>>> cpu_latency_work,
>>> +					      msecs_to_jiffies(q-
>>>> disk->cpu_lat_timeout));
>>> +		}
>>> +	}
>>> +
>>
>> This is, quite literally, and insane amount of cycles to add to the
>> hot
>> issue path. You're iterating each request in the list, and then each
>> CPU
>> in the mask of the hardware context for each request.
> 
> Ok, I made some optimizations to the code, sending v3 shortly. In this,
> all the PM QoS handling and iteration of lists is moved to the
> workqueue, and happens in the background. The initial block requests
> (until the workqueue fires) may run with higher latency, but that is
> most likely an okay compromise.
> 
> PS: Please bear with me, my knowledge of the block layer and/or NVMe is
> pretty limited. I am sorry if these patches make you frustrated, that
> is not my intention.

That's fine, but I'd much rather you ask for clarification if there's
something that you don't understand, rather than keep adding really
expensive code to the issue path. Pushing the iteration to the workqueue
indeed sounds like the much better approach.

-- 
Jens Axboe

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ