Message-ID: <b9417479-4e45-8b31-bc32-d20739c97a10@huawei.com>
Date: Thu, 18 Mar 2021 16:07:11 +0000
From: John Garry <john.garry@...wei.com>
To: Robin Murphy <robin.murphy@....com>, <joro@...tes.org>
CC: <vjitta@...eaurora.org>, <iommu@...ts.linux-foundation.org>,
Linuxarm <linuxarm@...wei.com>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] iommu/iova: Improve restart logic
>
> Well yeah, in your particular case you're allocating from a heavily
> over-contended address space, so much of the time it is genuinely full.
> Plus you're primarily churning one or two sizes of IOVA, so there's a
> high chance that you will either allocate immediately from the cached
> node (after a previous free), or search the whole space and fail. In
> case it was missed, searching only some arbitrary subset of the space
> before giving up is not a good behaviour for an allocator to have in
> general.
>
>>> So since the retry means that we search through the complete pfn
>>> range most of the time (due to the poor success rate), we should be
>>> able to do a better job of maintaining an accurate max alloc size, by
>>> calculating it from the range search rather than relying on the
>>> max-alloc-failed heuristic or resetting it frequently. Hopefully that
>>> would mean that we're smarter about not attempting allocations which
>>> cannot succeed.
>>
>> So I tried that out and we seem to be able to claw back an
>> appreciable amount of performance. Maybe 80% of original, with
>> another change, below.
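
For reference, the bookkeeping I tried is roughly the following
(simplified, non-kernel types, untested; the idea is to record the
largest free gap seen during the full scan that a failed allocation
already does, so later requests bigger than that can fail fast without
touching the tree):

struct iova_range {
	unsigned long pfn_lo, pfn_hi;	/* inclusive allocated range */
	struct iova_range *next;	/* stand-in for rbtree ordering */
};

struct toy_domain {
	struct iova_range *head;	/* sorted by pfn, ascending */
	unsigned long start_pfn;	/* first usable pfn */
	unsigned long end_pfn;		/* one past the last usable pfn */
	unsigned long max_free;		/* largest gap on the last full scan */
};

static void recompute_max_free(struct toy_domain *d)
{
	unsigned long prev_hi = d->start_pfn, max = 0;
	struct iova_range *r;

	for (r = d->head; r; r = r->next) {
		if (r->pfn_lo > prev_hi && r->pfn_lo - prev_hi > max)
			max = r->pfn_lo - prev_hi;
		prev_hi = r->pfn_hi + 1;
	}
	if (d->end_pfn > prev_hi && d->end_pfn - prev_hi > max)
		max = d->end_pfn - prev_hi;

	d->max_free = max;	/* alloc(size > max_free) can fail fast */
}
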
>
> TBH if you really want to make allocation more efficient I think there
> are more radical changes that would be worth experimenting with: using
> some form of augmented rbtree to also encode the amount of free space
> under each branch, representing the free space in its own parallel
> tree, or seeing whether some other structure entirely might be a
> better bet these days.
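
To make sure I follow the augmented rbtree idea - something like the
subtree gap tracking which mm does? A rough, untested sketch of how I
read it (all names mine; the per-node "gap" would be the free space
immediately below that node):

#include <linux/rbtree_augmented.h>

struct iova_gap_node {
	struct rb_node rb;
	unsigned long pfn_lo, pfn_hi;
	unsigned long gap;		/* free space below this node */
	unsigned long subtree_max_gap;	/* max gap in this subtree */
};

static unsigned long iova_gap(struct iova_gap_node *node)
{
	return node->gap;
}

RB_DECLARE_CALLBACKS_MAX(static, iova_gap_callbacks,
			 struct iova_gap_node, rb,
			 unsigned long, subtree_max_gap, iova_gap);

/* Only descend into subtrees whose subtree_max_gap can satisfy size. */
static struct iova_gap_node *find_fit(struct rb_root *root,
				      unsigned long size)
{
	struct rb_node *rbn = root->rb_node;

	while (rbn) {
		struct iova_gap_node *n =
			rb_entry(rbn, struct iova_gap_node, rb);

		if (rbn->rb_left) {
			struct iova_gap_node *l =
				rb_entry(rbn->rb_left,
					 struct iova_gap_node, rb);
			if (l->subtree_max_gap >= size) {
				rbn = rbn->rb_left;
				continue;
			}
		}
		if (n->gap >= size)
			return n;
		rbn = rbn->rb_right;
	}
	return NULL;
}

Insertions/deletions would then go through rb_insert_augmented() and
rb_erase_augmented() with iova_gap_callbacks to keep subtree_max_gap
up to date.
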
>
> And if you just want to make your thing acceptably fast, now I'm going
> to say stick a quirk somewhere to force the "forcedac" option on your
> platform ;)
>
Easier said than done :)
But still, I'd like to be able to cache all IOVA sizes for my DMA
engine, so that we rarely have to go near the RB tree at all.
I have put together a series that allows the upper limit of the rcache
range to be increased per domain. Naturally that gives better
performance than we had originally.
I don't want to prejudice the solution by saying what I think of it now,
so will send it out...
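
The core idea, stripped of the implementation details (names below are
illustrative only, not the actual patches), is just to turn the
compile-time rcache size limit into a per-domain ceiling:

#include <linux/log2.h>

#define IOVA_RANGE_CACHE_DEFAULT_SIZE	6	/* log2(pages) + 1, as today */

struct toy_iova_domain {
	unsigned long rcache_max_size;	/* was a global compile-time limit */
	/* ... rcaches allocated/indexed up to rcache_max_size ... */
};

static bool iova_size_cacheable(struct toy_iova_domain *iovad,
				unsigned long size_pages)
{
	/* order-based index, as in the existing rcache lookup */
	return order_base_2(size_pages) < iovad->rcache_max_size;
}

With that, a driver which knows it churns large IOVAs can raise the
ceiling for its domain and keep hitting the caches.
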
> [...]
>>>>> @@ -219,7 +256,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>>>>>  	if (low_pfn == iovad->start_pfn && retry_pfn < limit_pfn) {
>>>>>  		high_pfn = limit_pfn;
>>>>>  		low_pfn = retry_pfn;
>>>>> -		curr = &iovad->anchor.node;
>>>>> +		curr = iova_find_limit(iovad, limit_pfn);
>>
>>
>> I see that it is now applied. However, could we alternatively just add
>> a zero-length 32b boundary marker node for the 32b pfn restart point?
>
> That would need special cases all over the place to prevent the marker
> getting merged into reservations or hit by lookups, and at worst break
> the ordering of the tree if a legitimate node straddles the boundary. I
> did consider having the insert/delete routines keep track of yet another
> cached node for whatever's currently the first thing above the 32-bit
> boundary, but I was worried that might be a bit too invasive.
Yeah, I did think of that. I don't think that it would have too much
overhead.
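
To put a shape on the overhead, the bookkeeping I'd expect is roughly
this (simplified types, untested; a node straddling the boundary is of
course still the awkward case you mention):

#include <linux/rbtree.h>

struct toy_iova {
	struct rb_node node;
	unsigned long pfn_lo, pfn_hi;
};

struct toy_domain {
	struct rb_root rbroot;
	unsigned long dma_32bit_pfn;
	struct rb_node *cached32_restart;  /* first node above the boundary */
};

static void toy_update_on_insert(struct toy_domain *d, struct toy_iova *new)
{
	struct toy_iova *cur;

	if (new->pfn_lo < d->dma_32bit_pfn)
		return;	/* below (or straddling) the boundary: no change */

	if (!d->cached32_restart) {
		d->cached32_restart = &new->node;
		return;
	}

	cur = rb_entry(d->cached32_restart, struct toy_iova, node);
	if (new->pfn_lo < cur->pfn_lo)
		d->cached32_restart = &new->node;
}

static void toy_update_on_delete(struct toy_domain *d, struct toy_iova *victim)
{
	/* If the cached node itself goes, its successor in the
	 * pfn-ordered tree becomes the first node above the boundary. */
	if (d->cached32_restart == &victim->node)
		d->cached32_restart = rb_next(&victim->node);
}
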
>
> FWIW I'm currently planning to come back to this again when I have a bit
> more time, since the optimum thing to do (modulo replacing the entire
> algorithm...) is actually to make the second part of the search
> *upwards* from the cached node to the limit. Furthermore, to revive my
> arch/arm conversion I think we're realistically going to need a
> compatibility option for bottom-up allocation to avoid too many nasty
> surprises, so I'd like to generalise things to tackle both concerns at
> once.
>
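My reading of the "upwards" second pass, for my own notes (untested
sketch reusing the toy_iova type from above; the real thing would of
course work on struct iova):

static unsigned long toy_search_up(struct rb_node *cached,
				   unsigned long size,
				   unsigned long limit_pfn)
{
	struct rb_node *rbn;

	for (rbn = cached; rbn; rbn = rb_next(rbn)) {
		struct toy_iova *cur = rb_entry(rbn, struct toy_iova, node);
		struct rb_node *nxt = rb_next(rbn);
		struct toy_iova *next = nxt ?
			rb_entry(nxt, struct toy_iova, node) : NULL;
		unsigned long gap_lo = cur->pfn_hi + 1;
		unsigned long gap_hi = next ? next->pfn_lo : limit_pfn + 1;

		if (gap_hi > limit_pfn + 1)
			gap_hi = limit_pfn + 1;
		if (gap_lo + size <= gap_hi)
			return gap_lo;	/* bottom-up fit in this gap */
	}
	return 0;	/* no fit below limit_pfn */
}

Allocating bottom-up within the gap like that would presumably also fall
out naturally for the arch/arm compatibility option you mention.
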
Thanks,
John