Date:   Wed, 16 Aug 2023 17:52:24 +0100
From:   Robin Murphy <robin.murphy@....com>
To:     "zhangzekun (A)" <zhangzekun11@...wei.com>
Cc:     will@...nel.org, iommu@...ts.linux.dev,
        linux-kernel@...r.kernel.org, john.g.garry@...cle.com,
        joro@...tes.org
Subject: Re: [PATCH 2/2] iommu/iova: Manage the depot list size

On 15/08/2023 3:11 pm, zhangzekun (A) wrote:
> 
> 
> On 2023/8/15 1:53, Robin Murphy wrote:
>> Automatically scaling the depot up to suit the peak capacity of a
>> workload is all well and good, but it would be nice to have a way to
>> scale it back down again if the workload changes. To that end, add
>> automatic reclaim that will gradually free unused magazines if the
>> depot size remains above a reasonable threshold for long enough.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@....com>
>> ---
>>   drivers/iommu/iova.c | 29 +++++++++++++++++++++++++++++
>>   1 file changed, 29 insertions(+)
>>
>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>> index d2de6fb0e9f4..76a7d694708e 100644
>> --- a/drivers/iommu/iova.c
>> +++ b/drivers/iommu/iova.c
>> @@ -11,6 +11,7 @@
>>   #include <linux/smp.h>
>>   #include <linux/bitops.h>
>>   #include <linux/cpu.h>
>> +#include <linux/workqueue.h>
>>   /* The anchor node sits above the top of the usable address space */
>>   #define IOVA_ANCHOR    ~0UL
>> @@ -626,6 +627,8 @@ EXPORT_SYMBOL_GPL(reserve_iova);
>>    */
>>   #define IOVA_MAG_SIZE 127
>> +#define IOVA_DEPOT_DELAY msecs_to_jiffies(100)
>> +
>>   struct iova_magazine {
>>       /*
>>        * Only full magazines are inserted into the depot, so we can avoid
>> @@ -646,8 +649,11 @@ struct iova_cpu_rcache {
>>   struct iova_rcache {
>>       spinlock_t lock;
>> +    unsigned int depot_size;
>>       struct iova_magazine *depot;
>>       struct iova_cpu_rcache __percpu *cpu_rcaches;
>> +    struct iova_domain *iovad;
>> +    struct delayed_work work;
>>   };
>>   static struct iova_magazine *iova_magazine_alloc(gfp_t flags)
>> @@ -728,6 +734,7 @@ static struct iova_magazine *iova_depot_pop(struct iova_rcache *rcache)
>>       rcache->depot = mag->next;
>>       mag->size = IOVA_MAG_SIZE;
>> +    rcache->depot_size--;
>>       return mag;
>>   }
>> @@ -735,6 +742,24 @@ static void iova_depot_push(struct iova_rcache *rcache, struct iova_magazine *ma
>>   {
>>       mag->next = rcache->depot;
>>       rcache->depot = mag;
>> +    rcache->depot_size++;
>> +}
>> +
>> +static void iova_depot_work_func(struct work_struct *work)
>> +{
>> +    struct iova_rcache *rcache = container_of(work, typeof(*rcache), work.work);
>> +    struct iova_magazine *mag = NULL;
>> +
>> +    spin_lock(&rcache->lock);
>> +    if (rcache->depot_size > num_online_cpus())
>> +        mag = iova_depot_pop(rcache);
>> +    spin_unlock(&rcache->lock);
>> +
>> +    if (mag) {
>> +        iova_magazine_free_pfns(mag, rcache->iovad);
>> +        iova_magazine_free(mag);
>> +        schedule_delayed_work(&rcache->work, msecs_to_jiffies(IOVA_DEPOT_DELAY));
> Hi, Robin,
> 
> I am a little confused why IOVA_DEPOT_DELAY needs to be converted twice 
> in iova_depot_work_func(), as it already equals 
> "msecs_to_jiffies(100)".

Oof, not sure how I managed to leave a mere 3-line refactoring 
half-finished... yeah, this msecs_to_jiffies() just shouldn't be here :)

> Besides, do we really need to reschedule the 
> delayed_work from within iova_depot_work_func()? Each time we put an 
> IOVA magazine into the depot, a delayed_work is already scheduled which 
> is responsible for freeing an IOVA magazine from the depot if the depot 
> size is greater than num_online_cpus().

The idea is to free excess magazines one at a time at a relatively low 
rate, so as not to interfere too much with "bursty" workloads which 
might release a large number of IOVAs at once, but then want to 
reallocate them again relatively soon. I'm hoping that the overhead of 
scheduling the reclaim work unconditionally whenever the depot grows is 
sufficiently negligible to avoid having to check the threshold in 
multiple places, as that's the part which I anticipate might grow more 
complex in future. As far as I could see it should be pretty minimal if 
the work is already scheduled, which I'd expect to be the case most of 
the time while the depot is busy.

The reason the work also reschedules 
itself is to handle the opposite situation, and make sure it can run to 
completion after the depot goes idle.

Thanks,
Robin.
