Message-ID: <7f7daf42-8aff-b9ed-0f48-d4158896012e@huawei.com>
Date: Wed, 24 Nov 2021 17:21:50 +0000
From: John Garry <john.garry@...wei.com>
To: Robin Murphy <robin.murphy@....com>, <joro@...tes.org>,
<will@...nel.org>
CC: <iommu@...ts.linux-foundation.org>,
<suravee.suthikulpanit@....com>, <baolu.lu@...ux.intel.com>,
<willy@...radead.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/9] iommu: Refactor flush queues into iommu-dma
On 23/11/2021 14:10, Robin Murphy wrote:
> As promised, this series cleans up the flush queue code and streamlines
> it directly into iommu-dma. Since we no longer have per-driver DMA ops
> implementations, a lot of the abstraction is now no longer necessary, so
> there's a nice degree of simplification in the process. Un-abstracting
> the queued page freeing mechanism is also the perfect opportunity to
> revise which struct page fields we use so we can be better-behaved
> from the MM point of view, thanks to Matthew.
>
> These changes should also make it viable to start using the gather
> freelist in io-pgtable-arm, and eliminate some more synchronous
> invalidations from the normal flow there, but that is proving to need a
> bit more careful thought than I have time for in this cycle, so I've
> parked that again for now and will revisit it in the new year.
>
> For convenience, branch at:
> https://gitlab.arm.com/linux-arm/linux-rm/-/tree/iommu/iova
>
> I've build-tested for x86_64, and boot-tested arm64 to the point of
> confirming that put_pages_list() gets passed a valid empty list when
> flushing, while everything else still works.
My interest is in patches 2, 3, 7, 8, and 9, and they look ok to me. I did
a bit of testing in strict and non-strict mode on my arm64 system and saw
no problems.
Apart from this, I noticed one possible optimization: avoiding so many
reads of fq_flush_finish_cnt. We seem to have a pattern of
fq_flush_iotlb() doing atomic64_inc(fq_flush_finish_cnt), followed by a
separate read of fq_flush_finish_cnt in fq_ring_free(), so we could use
atomic64_inc_return(fq_flush_finish_cnt) and reuse the returned value. I
think that any racing in the fq_flush_finish_cnt accesses is already
latent, but maybe there is a flaw in this reasoning. Either way, I tried
something along these lines and got a 2.4% throughput gain for my storage
scenario.
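
Roughly what I tried, as a sketch only (the signatures below are
illustrative, based on the pre-series iova.c naming, with the actual
flush call and the body of the free loop elided):

/*
 * Have fq_flush_iotlb() return the updated finish count so that the
 * caller can pass it to fq_ring_free() instead of re-reading the atomic.
 */
static u64 fq_flush_iotlb(struct iova_domain *iovad)
{
	atomic64_inc(&iovad->fq_flush_start_cnt);
	/* ... issue the domain IOTLB flush as before ... */
	return atomic64_inc_return(&iovad->fq_flush_finish_cnt);
}

static void fq_ring_free(struct iova_domain *iovad, struct iova_fq *fq,
			 u64 counter)
{
	unsigned int idx;

	assert_spin_locked(&fq->lock);

	fq_ring_for_each(idx, fq) {
		if (fq->entries[idx].counter >= counter)
			break;
		/* ... free the pages and put back the IOVA as before ... */
	}
}

Callers that free entries without having just flushed would still need to
read the atomic themselves, so this only saves the read on the flush path.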
Thanks,
John