linux-kernel - Re: Regarding patch "block/blk-mq: Don't complete locally if capacities are different"

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <66912a22-540d-4b9a-bd06-cce55b9ad304@quicinc.com>
Date: Fri, 23 Aug 2024 13:27:54 +0530
From: MANISH PANDEY <quic_mapa@...cinc.com>
To: Bart Van Assche <bvanassche@....org>,
        Sandeep Dhavale
	<dhavale@...gle.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>
CC: Qais Yousef <qyousef@...alina.io>,
        Christian Loehle
	<christian.loehle@....com>, <axboe@...nel.dk>,
        <mingo@...nel.org>, <peterz@...radead.org>,
        <vincent.guittot@...aro.org>, <linux-block@...r.kernel.org>,
        <sudeep.holla@....com>, Jaegeuk Kim
	<jaegeuk@...nel.org>,
        Christoph Hellwig <hch@...radead.org>, <kailash@...gle.com>,
        <tkjos@...gle.com>, <bvanassche@...gle.com>,
        <quic_nitirawa@...cinc.com>, <quic_cang@...cinc.com>,
        <quic_rampraka@...cinc.com>, <quic_narepall@...cinc.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Regarding patch "block/blk-mq: Don't complete locally if
 capacities are different"

On 8/22/2024 7:54 PM, Bart Van Assche wrote:
> On 8/22/24 3:46 AM, MANISH PANDEY wrote:
>> On 8/21/2024 10:52 PM, Bart Van Assche wrote:
>> > What is the performance impact of the above change?
>  >
>> No impact at all
> Is this a good summary of this email thread?
> * The first email in this thread reports an important performance
>    regression.
> * In your previous email there is a candidate fix for the performance
>    regression.
> * Above I read that the proposed fix has no performance impact at all
>    on any setup.
> 
> Is this a good summary of this email thread? If so, do you agree that
> this must be confusing everyone who is following this email thread?
> 
> Thanks,
> 
> Bart.

Hi Bart,

Performance impact due to addition of cpu capacity check 
(https://lore.kernel.org/all/20240223155749.2958009-3-qyousef@layalina.io/) 
...[1]
is already mentioned in the first email.

But let me summarize it again:

We are not able to get advantage of affining the IRQ in different 
capacity CPU(s)/clusters and complete the request in higher cluster 
cpu(s), even though the LLC is shared between these clusters as it is 
causing the block completion to happen on SOFTIRQ context, if requester 
and completion clusters are different.

Below is the performance impact with the current patch [1]

1. For MCQ capable UFS host (paired with UFS 4.x), we are observing ~20% 
random R/W performance drop.

2. For single doorbell ufs hosts (paired with UFS 2.x/ UFS 3.x), we are 
observing ~7-10% random R/W performance drop.

Also in previous emails on this thread, below were few suggestions to 
add check for equal or greater capacity cpus by @Christian Loehle
https://lore.kernel.org/all/3feb5226-7872-432b-9781-29903979d34a@arm.com/

 > From: Christian Loehle @ 2024-08-02  9:03 UTC (permalink / raw)
 > [......]
 > diff --git a/block/blk-mq.c b/block/blk-mq.c
 > index e3c3c0c21b55..a4a2500c4ef6 100644
 > --- a/block/blk-mq.c
 > +++ b/block/blk-mq.c
 > @@ -1164,7 +1164,7 @@ static inline bool
 > blk_mq_complete_need_ipi(struct request *rq)
 >        if (cpu == rq->mq_ctx->cpu ||
 >            (!test_bit(QUEUE_FLAG_SAME_FORCE, &rq->q->queue_flags) &&
 >             cpus_share_cache(cpu, rq->mq_ctx->cpu) &&
 > -            cpus_equal_capacity(cpu, rq->mq_ctx->cpu)))
 > +            arch_scale_cpu_capacity(cpu) >=
 >              arch_scale_cpu_capacity(rq->mq_ctx->cpu)))
 >                return false;
 >
 >       /* don't try to IPI to an offline CPU */

There can be SoCs with different CPU cluster configurations and to have 
optimal IO load balancing or to avoid contention b/w submission path and 
completion path, we may need to complete IO request of large capacity 
CPU(s) on small cluster cpus. So the above propose solution may not be 
suffice to all the use cases.

Hence with below proposed solution, we are trying to propose a new rq 
flag QUEUE_FLAG_CPU_CAPACITY. The proposed solution will provide us a 
way such that users who are benefited with CPU capacity check [1] would 
be able to use the fix as it is, and if a user (including us) want to 
bypass cpu capacity fix [1], they can set rq_affinity to 3 and would be 
able to retain performance drop as mentioned in first email. This would 
give flexibility to user to choose what's the best for their system.

if (cpu == rq->mq_ctx->cpu ||
	    (!test_bit(QUEUE_FLAG_SAME_FORCE, &rq->q->queue_flags) &&
	     cpus_share_cache(cpu, rq->mq_ctx->cpu) &&
+	     (test_bit(QUEUE_FLAG_CPU_CAPACITY, &rq->q->queue_flags) ||
	      cpus_equal_capacity(cpu, rq->mq_ctx->cpu))))
		return false;

Regards
Manish