Message-ID: <24ba4cd6-0a28-3c22-c5b5-dadaa67600cf@arm.com>
Date: Wed, 16 Aug 2023 17:52:24 +0100
From: Robin Murphy <robin.murphy@....com>
To: "zhangzekun (A)" <zhangzekun11@...wei.com>
Cc: will@...nel.org, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org, john.g.garry@...cle.com,
joro@...tes.org
Subject: Re: [PATCH 2/2] iommu/iova: Manage the depot list size
On 15/08/2023 3:11 pm, zhangzekun (A) wrote:
>
>
> On 2023/8/15 1:53, Robin Murphy wrote:
>> Automatically scaling the depot up to suit the peak capacity of a
>> workload is all well and good, but it would be nice to have a way to
>> scale it back down again if the workload changes. To that end, add
>> automatic reclaim that will gradually free unused magazines if the
>> depot size remains above a reasonable threshold for long enough.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@....com>
>> ---
>> drivers/iommu/iova.c | 29 +++++++++++++++++++++++++++++
>> 1 file changed, 29 insertions(+)
>>
>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>> index d2de6fb0e9f4..76a7d694708e 100644
>> --- a/drivers/iommu/iova.c
>> +++ b/drivers/iommu/iova.c
>> @@ -11,6 +11,7 @@
>> #include <linux/smp.h>
>> #include <linux/bitops.h>
>> #include <linux/cpu.h>
>> +#include <linux/workqueue.h>
>> /* The anchor node sits above the top of the usable address space */
>> #define IOVA_ANCHOR ~0UL
>> @@ -626,6 +627,8 @@ EXPORT_SYMBOL_GPL(reserve_iova);
>> */
>> #define IOVA_MAG_SIZE 127
>> +#define IOVA_DEPOT_DELAY msecs_to_jiffies(100)
>> +
>> struct iova_magazine {
>> /*
>> * Only full magazines are inserted into the depot, so we can avoid
>> @@ -646,8 +649,11 @@ struct iova_cpu_rcache {
>> struct iova_rcache {
>> spinlock_t lock;
>> + unsigned int depot_size;
>> struct iova_magazine *depot;
>> struct iova_cpu_rcache __percpu *cpu_rcaches;
>> + struct iova_domain *iovad;
>> + struct delayed_work work;
>> };
>> static struct iova_magazine *iova_magazine_alloc(gfp_t flags)
>> @@ -728,6 +734,7 @@ static struct iova_magazine *iova_depot_pop(struct iova_rcache *rcache)
>> rcache->depot = mag->next;
>> mag->size = IOVA_MAG_SIZE;
>> + rcache->depot_size--;
>> return mag;
>> }
>> @@ -735,6 +742,24 @@ static void iova_depot_push(struct iova_rcache *rcache, struct iova_magazine *ma
>> {
>> mag->next = rcache->depot;
>> rcache->depot = mag;
>> + rcache->depot_size++;
>> +}
>> +
>> +static void iova_depot_work_func(struct work_struct *work)
>> +{
>> + struct iova_rcache *rcache = container_of(work, typeof(*rcache), work.work);
>> + struct iova_magazine *mag = NULL;
>> +
>> + spin_lock(&rcache->lock);
>> + if (rcache->depot_size > num_online_cpus())
>> + mag = iova_depot_pop(rcache);
>> + spin_unlock(&rcache->lock);
>> +
>> + if (mag) {
>> + iova_magazine_free_pfns(mag, rcache->iovad);
>> + iova_magazine_free(mag);
>> + schedule_delayed_work(&rcache->work, msecs_to_jiffies(IOVA_DEPOT_DELAY));
> Hi, Robin,
>
> I am a little confused why IOVA_DEPOT_DELAY needs to be passed through
> msecs_to_jiffies() again in iova_depot_work_func(), as it is already
> defined as "msecs_to_jiffies(100)".
Oof, not sure how I managed to leave a mere 3-line refactoring
half-finished... yeah, this msecs_to_jiffies() just shouldn't be here :)
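Concretely, the fixup would presumably just drop the duplicate conversion:

```diff
-	schedule_delayed_work(&rcache->work,
-			      msecs_to_jiffies(IOVA_DEPOT_DELAY));
+	schedule_delayed_work(&rcache->work, IOVA_DEPOT_DELAY);
```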
> Besides, do we really need to invoke a
> delayed_work in iova_depot_work_func()? Each time we put an iova
> magazine into the depot, a delayed_work is already invoked, which is
> responsible for freeing an iova magazine from the depot if the depot
> size is greater than num_online_cpus().
The idea is to free excess magazines one at a time at a relatively low
rate, so as not to interfere too much with "bursty" workloads which
might release a large number of IOVAs at once, but then want to
reallocate them again relatively soon. I'm hoping that the overhead of
scheduling the reclaim work unconditionally whenever the depot grows is
sufficiently negligible to avoid having to check the threshold in
multiple places, as that's the part which I anticipate might grow more
complex in future. As far as I could see it should be pretty minimal if
the work is already scheduled, which I'd expect to be the case most of
the time while the depot is busy. The reason the work also reschedules
itself is to handle the opposite situation, and make sure it can run to
completion after the depot goes idle.
Thanks,
Robin.