Message-ID: <20240801160537.ux4eg6p42disuqur@airbuntu>
Date: Thu, 1 Aug 2024 17:05:37 +0100
From: Qais Yousef <qyousef@...alina.io>
To: MANISH PANDEY <quic_mapa@...cinc.com>
Cc: axboe@...nel.dk, mingo@...nel.org, peterz@...radead.org,
	vincent.guittot@...aro.org, dietmar.eggemann@....com,
	linux-block@...r.kernel.org, sudeep.holla@....com,
	Jaegeuk Kim <jaegeuk@...nel.org>,
	Bart Van Assche <bvanassche@....org>,
	Christoph Hellwig <hch@...radead.org>, kailash@...gle.com,
	tkjos@...gle.com, dhavale@...gle.com, bvanassche@...gle.com,
	quic_nitirawa@...cinc.com, quic_cang@...cinc.com,
	quic_rampraka@...cinc.com, quic_narepall@...cinc.com,
	linux-kernel@...r.kernel.org
Subject: Re: Regarding patch "block/blk-mq: Don't complete locally if
 capacities are different"

On 08/01/24 14:55, MANISH PANDEY wrote:
> ++ adding linux-kernel group
> 
> On 7/31/2024 7:16 PM, MANISH PANDEY wrote:
> > Hi Qais Yousef,
> > Recently we observed that the patch below was merged:
> > https://lore.kernel.org/all/20240223155749.2958009-3-qyousef@layalina.io
> > 
> > This patch is causing a performance degradation of ~20% in random IO,
> > along with a significant drop in sequential IO performance. We would like
> > to revert this patch as it heavily impacts MCQ UFS devices, though non-MCQ
> > devices are also affected.

Could you provide more info about your systems' topology and irq setup please?

> > 
> > We have several concerns with the patch
> > 1. This patch takes away the device driver's flexibility to affine
> >   completions to the best possible CPUs, and limits the driver to the
> >   same group of CPUs.

I don't think it does. If rq_affinity is set to 1, the block layer matches the
performance of the requester when picking where to run the completion.
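
For reference, the check the patch changes lives in blk_mq_complete_need_ipi().
It looks roughly like this (paraphrasing block/blk-mq.c from memory, so check
the tree for the exact code):

	static inline bool blk_mq_complete_need_ipi(struct request *rq)
	{
		int cpu = raw_smp_processor_id();

		/*
		 * Complete locally if the completing CPU is the requesting
		 * CPU, or shares both its cache and its capacity with it
		 * (unless rq_affinity = 2 forces an exact CPU match).
		 */
		if (cpu == rq->mq_ctx->cpu ||
		    (!test_bit(QUEUE_FLAG_SAME_FORCE, &rq->q->queue_flags) &&
		     cpus_share_cache(cpu, rq->mq_ctx->cpu) &&
		     cpus_equal_capacity(cpu, rq->mq_ctx->cpu)))
			return false;

		/* Don't try to IPI to an offline CPU */
		return cpu_online(rq->mq_ctx->cpu);
	}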

> > 
> > 2. Why can't the device driver use irq affinity to pick the desired CPUs
> > to complete the IO request, instead of the block layer forcing it?

If you set the sysfs rq_affinity to 0, you should be able to distribute things
without the block layer trying to match anything, no?

> > 
> > 3. CPUs are already grouped based on LLC; why is a new categorization
> > required?

rq_affinity = 1 asks to match the performance of the requester/originator. On
an HMP system, this means grouping based on capacity, not just LLC.
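
The capacity comparison itself is a small scheduler helper. Roughly
(paraphrasing cpus_equal_capacity() from kernel/sched/core.c):

	bool cpus_equal_capacity(int this_cpu, int that_cpu)
	{
		/* Nothing to match on symmetric (non-HMP) systems */
		if (!sched_asym_cpucap_active())
			return true;

		if (this_cpu == that_cpu)
			return true;

		return arch_scale_cpu_capacity(this_cpu) ==
		       arch_scale_cpu_capacity(that_cpu);
	}

Note the early return: on systems where all CPUs have the same capacity this
is a nop, so the old LLC-only behaviour is unchanged there.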

> > 
> > > big performance impact if the IO request
> > > was done from a CPU with higher capacity but the interrupt is serviced
> > > on a lower capacity CPU.
> > 
> > This patch doesn't consider the issue of contention in the submission path

What is the issue of contention? Could you please explain it in more detail?

> > and the completion path. Also, what if we want to complete a request from
> > a smaller capacity CPU on a higher capacity CPU?

I assume you want to use rq_affinity = 1 to match based on LLC but not
capacity. I'm curious why you'd want to match on LLC only.

Is this on a 6.1 kernel or later? Arm systems should have a shared LLC based
on the DSU, which has been enforced in the topology code since 6.1. So I can't
see why you'd want to match based on LLC but not capacity. Could you provide
more info please?

If you don't want the block layer to apply any affinity, isn't it better to
set rq_affinity = 0?

> > Shouldn't the device driver take care of this and allow vendors to use
> > the best possible combination they want to use?
> > Does the patch consider MCQ devices and different SQ<->CQ mappings?

Why does this consideration matter when matching performance based on
capacity, but not when matching based on LLC?

> > 
> > > Without the patch I see the BLOCK softirq always running on little cores
> > > (where the hardirq is serviced). With it I can see it running on all
> > > cores.
> > 
> > Why can't we use echo 2 > rq_affinity to force completion on the same
> > group of CPUs where the request was initiated?

They are not the same. rq_affinity = 1 means match: if the requesting and
completing CPUs are already in the same LLC or capacity group, nothing is
forced. rq_affinity = 2 means always complete on the requesting CPU itself.
It's not the same thing, is it?
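
To spell out the difference, writes to the rq_affinity sysfs file map to queue
flags roughly like this (paraphrasing queue_rq_affinity_store() in
block/blk-sysfs.c):

	if (val == 2) {
		/* always complete on the requesting CPU itself */
		blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, q);
		blk_queue_flag_set(QUEUE_FLAG_SAME_FORCE, q);
	} else if (val == 1) {
		/* complete on any CPU that matches the requester */
		blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, q);
		blk_queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q);
	} else if (val == 0) {
		/* no matching; complete wherever the irq lands */
		blk_queue_flag_clear(QUEUE_FLAG_SAME_COMP, q);
		blk_queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q);
	}

QUEUE_FLAG_SAME_FORCE is what makes 2 pin the completion to the exact
submitting CPU; 1 only asks for a "good enough" match.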


Thanks

--
Qais Yousef

> > Also, why force vendors to always use SOFTIRQ for completion?
> > We should be flexible enough to complete the IO request via IPI, HARDIRQ
> > or SOFTIRQ.
> > 
> > 
> > An SoC can have various CPU configurations, and this patch forces a
> > restriction on the completion path. This problem is worse on MCQ devices,
> > as we can have different SQ<->CQ mappings.
> > 
> > So we would like to revert the patch. Please let us know if there are any
> > concerns.
> > 
> > Regards
> > Manish Pandey
