Message-ID: <8404e7ef-d929-bd39-a10d-f4053cc4bd3e@oracle.com>
Date: Tue, 18 Apr 2023 19:50:06 +0100
From: John Garry <john.g.garry@...cle.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Vasant Hegde <vasant.hegde@....com>,
Robin Murphy <robin.murphy@....com>, joro@...tes.org,
will@...nel.org, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>
Subject: Re: [PATCH v4] iommu: Optimise PCI SAC address trick
On 18/04/2023 18:36, Linus Torvalds wrote:
>> JFYI, since you are using NVMe, you could also alternatively try
>> something like what I did for some SCSI storage controller drivers to
>> limit the request_queue max_sectors soft limit, like:
> That patch is not only whitespace-damaged, it's randomly missing one
> '+' character
That was my copy-and-paste error.
> so it makes no sense even ignoring the whitespace
> problems. _and_ it has a nonsensical cast to 'unsigned int' which
> makes that 'min()' possibly do crazy and invalid things (ie imagine
> dma_opt_mapping_size() returning 4GB).
>
> You can't cast things to the smaller size just to get rid of a
> warning, for chrissake!
Yeah, sorry, I was just trying to show a very quick demo of how this can
be done.
Indeed, I could have mentioned that it would have been easier to test by
simply feeding a lower limit into /sys/block/<dev>/queue/max_sectors_kb.
>
> In fact, even without the cast, it seems entirely broken, since the
> fallback for dma_opt_mapping_size() is to return 0 (admittedly _that_
> case only happens with HAS_DMA=n).
>
> Finally, doing this inside the
>
> if (ctrl->max_hw_sectors) {
I think that this would be set for PCI NVMe controllers, which is what we
were interested in here. But, indeed, I could check for a better place to
set this.
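For reference, the sort of thing that quick demo was aiming at could be
written without the narrowing cast and without tripping over the zero
return, something like the untested sketch below (the helper name and its
placement are purely illustrative, not the patch I pasted):

#include <linux/dma-mapping.h>
#include <linux/minmax.h>

/* Illustrative only: clamp a driver's sectors limit by the IOMMU's
 * preferred mapping size, leaving it untouched when the helper reports
 * no preference (it returns 0 with HAS_DMA=n).
 */
static u32 clamp_sectors_to_dma_opt(struct device *dma_dev, u32 max_sectors)
{
	size_t opt = dma_opt_mapping_size(dma_dev);

	if (!opt)
		return max_sectors;

	/* Compare in 64 bits so a huge value (e.g. 4GB) cannot be
	 * truncated before min() is taken; the result is always
	 * <= max_sectors, so the u32 return type stays safe.
	 */
	return min_t(u64, max_sectors, opt >> 9);	/* bytes -> 512B sectors */
}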
>
> conditional seems entirely wrong, since any dma mapping limits would
> be entirely independent of any driver maximum hw size, and in fact
> *easier* to hit if the block device itself doesn't have any max
> limits.
>
> So please burn that patch in the darkest pits of hell and let's try to
> forget it ever existed. Ok?
Sure
>
> Also, shouldn't any possible dma mapping size affect not
> 'max_sectors', but 'max_segment_size'? At least the docs imply that
> dma_opt_mapping_size() is about the max size of a _single_ mapping,
> not of the whole thing?
It's meant to apply to the total mapping length and not to a single
segment, so the doc is misleading there.
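To illustrate the distinction, the limit is meant to land on the
request-length side of the queue limits rather than the per-segment side.
Roughly, and purely as an untested sketch (assuming a driver with a
request_queue and its DMA device at hand):

#include <linux/blkdev.h>
#include <linux/dma-mapping.h>
#include <linux/minmax.h>

/* Hypothetical helper, for illustration only */
static void apply_dma_opt_limit(struct request_queue *q, struct device *dma_dev)
{
	size_t opt = dma_opt_mapping_size(dma_dev);

	/* The total per-request length is what dma_opt_mapping_size()
	 * is meant to bound...
	 */
	if (opt)
		blk_queue_max_hw_sectors(q, min_t(u64, queue_max_hw_sectors(q),
						  opt >> SECTOR_SHIFT));

	/* ...whereas a per-scatterlist-element bound would instead go
	 * through blk_queue_max_segment_size(), which is not what is
	 * wanted here.
	 */
}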
>
> Anyway, if this is actually an issue, to the point that it's now being
> discussed for a _second_ block driver subsystem, then shouldn't the
> queue handling just do this all automatically, instead of adding
> random crap to random block driver architectures?
Other storage controllers may enjoy better performance with very large
DMA mappings (whose total length exceeds the IOVA caching limit), so it
was too risky to apply a performance-related change of this nature
across the board when that API was introduced.
So far there had only been a single controller where we were actually
seeing IOVA allocation give very (very) poor performance. However, as
far as I am aware, there was nothing special about that controller,
apart from the fact that it was often creating requests whose length
exceeded that IOVA caching limit, and it was also filling the 32b IOVA
space quickly - that may be because the system had lots of CPUs.
Since there are now reports of poor performance in other storage
controllers and also in networking adapters, I can only assume that
people are testing more often on IOMMU-enabled systems with lots of
CPUs. Having said that, I would still be cautious about applying that
limit everywhere.
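For context, the caching limit I keep referring to comes from the IOVA
rcache only serving allocations up to a fixed order, which is what
dma_opt_mapping_size() reports on the iommu-dma path. The figures below
are from memory, so treat the exact constant as an assumption rather
than gospel:

/* Rough illustration of where the IOVA caching limit comes from; the
 * constant is my recollection of IOVA_RANGE_CACHE_MAX_SIZE in
 * drivers/iommu/iova.c.
 */
#define EXAMPLE_PAGE_SIZE		4096UL
#define EXAMPLE_IOVA_RANGE_CACHE_MAX	6	/* number of cached orders */

static inline unsigned long example_iova_cache_limit(void)
{
	/* 4096 << 5 = 128KiB with 4K pages; longer mappings fall back
	 * to the rbtree allocator, which is where the slowdown shows up
	 * once the 32b IOVA space starts to fill.
	 */
	return EXAMPLE_PAGE_SIZE << (EXAMPLE_IOVA_RANGE_CACHE_MAX - 1);
}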
>
> And no, I don't know this code, so maybe I'm entirely missing
> something, but that patch just raised my hackles enough that I had to
> say something.
Sure.
Thanks,
John