[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ecb3725c-27c4-944b-b42c-f4e293521f94@arm.com>
Date: Wed, 8 May 2019 19:31:18 +0100
From: Jean-Philippe Brucker <jean-philippe.brucker@....com>
To: Robin Murphy <robin.murphy@....com>,
Eric Auger <eric.auger@...hat.com>, eric.auger.pro@...il.com,
iommu@...ts.linux-foundation.org, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, kvmarm@...ts.cs.columbia.edu, joro@...tes.org,
alex.williamson@...hat.com, jacob.jun.pan@...ux.intel.com,
yi.l.liu@...el.com, will.deacon@....com
Cc: peter.maydell@...aro.org, kevin.tian@...el.com,
vincent.stehle@....com, ashok.raj@...el.com, marc.zyngier@....com,
christoffer.dall@....com
Subject: Re: [PATCH v7 11/23] iommu/arm-smmu-v3: Maintain a SID->device
structure
On 08/05/2019 15:05, Robin Murphy wrote:
> On 08/04/2019 13:18, Eric Auger wrote:
>> From: Jean-Philippe Brucker <jean-philippe.brucker@....com>
>>
>> When handling faults from the event or PRI queue, we need to find the
>> struct device associated to a SID. Add a rb_tree to keep track of SIDs.
>
> Out of curiosity, have you looked at whether an xarray might now be a
> more efficient option for this?
I hadn't looked into it yet, but it's a welcome distraction.
* Searching by SID will be more efficient with xarray (which still is a
radix tree, with a better API). Rather than O(log2(n)) we walk
O(log_c(n)) nodes in the worst case, with c = XA_CHUNK_SIZE = 64. We
don't care about insertion/deletion time.
* Memory consumption is worse than rb-tree, when the SID space is a
little sparse. For PCI devices the three LSBs (function number) might
not be in use, meaning that 88% of the leaf slots would be unused. And
it gets worse if the system has lots of bridges, as each bus number
requires its own xa slot, ie. 98% unused.
It's not too bad though, and in general I think the distribution of
SIDs would be good enough to justify using xarray. Plugging in more
devices would increase the memory consumption fast, but creating virtual
functions wouldn't. On one machine (TX2, a few discrete PCI cards) I
need 16 xa slots to store 42 device IDs. That's 16 * 576 bytes = 9 kB,
versus 42 * 40 bytes = 1.6 kB for the rb-tree. On another machine (x86,
lots of RC integrated endpoints) I need 18 slots to store 181 device
IDs, 10 kB vs. 7 kB with the rb-tree.
* Using xa would make this code a lot nicer.
Shame that we can't store the device pointer directly in the STE though,
there is already plenty of unused space in there.
Thanks,
Jean
Powered by blists - more mailing lists