Message-ID: <a1e46385-0712-a416-56e6-e77a18604242@arm.com>
Date:   Fri, 23 Sep 2022 13:27:20 +0100
From:   Robin Murphy <robin.murphy@....com>
To:     "brookxu.cn@...il.com" <brookxu.cn@...il.com>, joro@...tes.org,
        will@...nel.org
Cc:     iommu@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] iommu/iova: using separate rcache for SAC and DAC

On 2022-09-19 04:02, brookxu.cn@...il.com wrote:
> 
> 
> On 2022/9/17 1:03, Robin Murphy wrote:
>> On 2022-09-16 16:46, brookxu.cn wrote:
>>> From: Chunguang Xu <chunguang.xu@...pee.com>
>>>
>>> While iommu_dma_forcedac is disabled, for PCI devices the
>>> kernel tries a SAC address first and falls back to DAC if
>>> that fails. Since the rcache currently does not distinguish
>>> SAC from DAC, if every PFN in the CPU's loaded cache is
>>> larger than the SAC max PFN while the SAC address space is
>>> still sufficient, the non-empty loaded cache still forces
>>> the kernel into alloc_iova() to allocate the IOVA. Because
>>> PCI devices mostly allocate SAC addresses, the loaded cache
>>> may stay useless for SAC allocations for a long time, so the
>>> kernel enters the alloc_iova() slow path frequently and
>>> performance degrades. To circumvent this problem, it may be
>>> better for SAC and DAC to use separate caches.
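
For context, the SAC-first/DAC-fallback order described above is implemented
by iommu_dma_alloc_iova() in drivers/iommu/dma-iommu.c. A trimmed, paraphrased
sketch of that logic (not the verbatim mainline source, so details may differ):

    /* Try to get PCI devices a SAC (32-bit) address first, unless
     * forcedac is set; only retry with the device's full mask (DAC)
     * if the SAC attempt fails. */
    iova = 0;
    if (dma_limit > DMA_BIT_MASK(32) && !iommu_dma_forcedac && dev_is_pci(dev))
        iova = alloc_iova_fast(iovad, iova_len,
                               DMA_BIT_MASK(32) >> shift, false);
    if (!iova)
        iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift, true);

Both attempts go through the same rcaches, which is why a cache polluted with
DAC PFNs hurts the far more common SAC attempt.
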
>>
>> I really dislike the idea of doubling the already significantly large 
>> footprint of rcaches purely to optimise for the stupid case. If you've 
>> got to the point of hitting contention in the SAC path, you presumably 
>> don't have broken hardware/drivers so you're infinitely better off 
>> using forcedac and avoiding it entirely to begin with. And frankly 
>> even if you *do* have broken hardware, if you care about performance 
>> you're still better off fixing the relevant driver(s) to set correct 
>> DMA masks so you can use forcedac.
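
For reference, the "forcedac" mentioned here is the iommu.forcedac= boot
option; for example, adding

    iommu.forcedac=1

to the kernel command line makes dma-iommu allocate straight from the
device's full DMA mask instead of trying 32-bit SAC addresses first.
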
> 
> Some of our NICs show poor performance after a burst of heavy
> traffic. We found that it may be caused by a large number of
> order-1/order-2 SAC IOVA allocations fragmenting the SAC address
> space during the traffic burst, which in turn causes some order-3
> SAC IOVA allocations to fail. The kernel then tries DAC addresses
> and later releases them into the rcache. After the traffic subsides
> and the SAC address space recovers from fragmentation, the few DAC
> PFNs left in a given CPU's cache still break rcache lookups for SAC
> allocations. As a result, the kernel enters the alloc_iova() slow
> path frequently and performance stays poor. So I think this is not
> really a case of broken hardware or drivers, but perhaps they can
> also be optimized.
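
The lookup behaviour described here lives in __iova_rcache_get() and
iova_magazine_pop() in drivers/iommu/iova.c: as long as the per-CPU "loaded"
magazine is not empty, the lookup only pops from that magazine, and the pop
rejects every cached PFN above limit_pfn. A paraphrased sketch of the pop
(not the verbatim mainline source):

    /* Only called on a non-empty magazine. Scan it for an entry that fits
     * under limit_pfn; if every cached PFN is above the limit (e.g. only
     * DAC PFNs are cached but a SAC address is wanted), return 0, which
     * ultimately sends the allocation to the alloc_iova() rbtree slow path. */
    static unsigned long iova_magazine_pop(struct iova_magazine *mag,
                                           unsigned long limit_pfn)
    {
        int i;
        unsigned long pfn;

        for (i = mag->size - 1; mag->pfns[i] > limit_pfn; i--)
            if (i == 0)
                return 0;

        pfn = mag->pfns[i];
        mag->pfns[i] = mag->pfns[--mag->size];
        return pfn;
    }
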

My point is that your system *isn't* broken, because the DAC addresses 
work fine, so there's really no need for it to spend any time messing 
about with SAC constraints at all. Please try my patch[1].

Thanks,
Robin.

[1] https://lore.kernel.org/linux-iommu/2b0ca6254dd0102bf559b2a73e9b51da089afbe3.1663764627.git.robin.murphy@arm.com/

> 
>>
>> Since this mechanism now officially serves as a bandage for broken 
>> hardware/drivers and little else, I had an idea to make it a lot 
>> better, guess it's time to have a go at writing that patch up...
>>
>> Thanks,
>> Robin.
>>
>>> Signed-off-by: Chunguang Xu <chunguang.xu@...pee.com>
>>> ---
>>>   drivers/iommu/iova.c | 28 +++++++++++++++++++++-------
>>>   1 file changed, 21 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>>> index 47d1983dfa2a..d5775719a143 100644
>>> --- a/drivers/iommu/iova.c
>>> +++ b/drivers/iommu/iova.c
>>> @@ -16,6 +16,7 @@
>>>   #define IOVA_ANCHOR    ~0UL
>>>   #define IOVA_RANGE_CACHE_MAX_SIZE 6    /* log of max cached IOVA range size (in pages) */
>>> +#define IOVA_RANGE_CACHE_ARRAY_SIZE (2 * IOVA_RANGE_CACHE_MAX_SIZE)
>>>   static bool iova_rcache_insert(struct iova_domain *iovad,
>>>                      unsigned long pfn,
>>> @@ -723,13 +724,13 @@ int iova_domain_init_rcaches(struct iova_domain *iovad)
>>>       unsigned int cpu;
>>>       int i, ret;
>>> -    iovad->rcaches = kcalloc(IOVA_RANGE_CACHE_MAX_SIZE,
>>> +    iovad->rcaches = kcalloc(IOVA_RANGE_CACHE_ARRAY_SIZE,
>>>                    sizeof(struct iova_rcache),
>>>                    GFP_KERNEL);
>>>       if (!iovad->rcaches)
>>>           return -ENOMEM;
>>> -    for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
>>> +    for (i = 0; i < IOVA_RANGE_CACHE_ARRAY_SIZE; ++i) {
>>>           struct iova_cpu_rcache *cpu_rcache;
>>>           struct iova_rcache *rcache;
>>> @@ -825,11 +826,15 @@ static bool iova_rcache_insert(struct iova_domain *iovad, unsigned long pfn,
>>>                      unsigned long size)
>>>   {
>>>       unsigned int log_size = order_base_2(size);
>>> +    unsigned int index = log_size;
>>>       if (log_size >= IOVA_RANGE_CACHE_MAX_SIZE)
>>>           return false;
>>> -    return __iova_rcache_insert(iovad, &iovad->rcaches[log_size], pfn);
>>> +    if (pfn > DMA_BIT_MASK(32))
>>> +        index = log_size + IOVA_RANGE_CACHE_MAX_SIZE;
>>> +
>>> +    return __iova_rcache_insert(iovad, &iovad->rcaches[index], pfn);
>>>   }
>>>   /*
>>> @@ -881,11 +886,20 @@ static unsigned long iova_rcache_get(struct iova_domain *iovad,
>>>                        unsigned long limit_pfn)
>>>   {
>>>       unsigned int log_size = order_base_2(size);
>>> +    unsigned long iova_pfn;
>>> +    unsigned int index;
>>>       if (log_size >= IOVA_RANGE_CACHE_MAX_SIZE || !iovad->rcaches)
>>>           return 0;
>>> -    return __iova_rcache_get(&iovad->rcaches[log_size], limit_pfn - size);
>>> +    iova_pfn = __iova_rcache_get(&iovad->rcaches[log_size], limit_pfn - size);
>>> +
>>> +    if (!iova_pfn && limit_pfn > DMA_BIT_MASK(32)) {
>>> +        index = log_size + IOVA_RANGE_CACHE_MAX_SIZE;
>>> +        iova_pfn = __iova_rcache_get(&iovad->rcaches[index], limit_pfn - size);
>>> +    }
>>> +
>>> +    return iova_pfn;
>>>   }
>>>   /*
>>> @@ -898,7 +912,7 @@ static void free_iova_rcaches(struct iova_domain *iovad)
>>>       unsigned int cpu;
>>>       int i, j;
>>> -    for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
>>> +    for (i = 0; i < IOVA_RANGE_CACHE_ARRAY_SIZE; ++i) {
>>>           rcache = &iovad->rcaches[i];
>>>           if (!rcache->cpu_rcaches)
>>>               break;
>>> @@ -926,7 +940,7 @@ static void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad)
>>>       unsigned long flags;
>>>       int i;
>>> -    for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
>>> +    for (i = 0; i < IOVA_RANGE_CACHE_ARRAY_SIZE; ++i) {
>>>           rcache = &iovad->rcaches[i];
>>>           cpu_rcache = per_cpu_ptr(rcache->cpu_rcaches, cpu);
>>>           spin_lock_irqsave(&cpu_rcache->lock, flags);
>>> @@ -945,7 +959,7 @@ static void free_global_cached_iovas(struct iova_domain *iovad)
>>>       unsigned long flags;
>>>       int i, j;
>>> -    for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
>>> +    for (i = 0; i < IOVA_RANGE_CACHE_ARRAY_SIZE; ++i) {
>>>           rcache = &iovad->rcaches[i];
>>>           spin_lock_irqsave(&rcache->lock, flags);
>>>           for (j = 0; j < rcache->depot_size; ++j) {
