lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 2 Aug 2021 17:09:35 +0100
From:   Robin Murphy <robin.murphy@....com>
To:     John Garry <john.garry@...wei.com>, Will Deacon <will@...nel.org>
Cc:     linux-kernel@...r.kernel.org, sakari.ailus@...ux.intel.com,
        mst@...hat.com, airlied@...ux.ie, gregkh@...uxfoundation.org,
        linuxarm@...wei.com, jonathanh@...dia.com,
        iommu@...ts.linux-foundation.org, thierry.reding@...il.com,
        daniel@...ll.ch, bingbu.cao@...el.com, digetx@...il.com,
        mchehab@...nel.org, jasowang@...hat.com, tian.shu.qiu@...el.com
Subject: Re: [PATCH v4 2/6] iova: Allow rcache range upper limit to be
 flexible

On 2021-08-02 16:23, John Garry wrote:
> On 02/08/2021 16:01, Will Deacon wrote:
>> On Wed, Jul 14, 2021 at 06:36:39PM +0800, John Garry wrote:
>>> Some LLDs may request DMA mappings whose IOVA length exceeds that of the
>>> current rcache upper limit.
>>
>> What's an LLD?
>>
> 
> low-level driver
> 
> maybe I'll stick with simply "drivers"
> 
>>> This means that allocations for those IOVAs will never be cached, and
>>> always must be allocated and freed from the RB tree per DMA mapping 
>>> cycle.
>>> This has a significant effect on performance, more so since commit
>>> 4e89dce72521 ("iommu/iova: Retry from last rb tree node if iova search
>>> fails"), as discussed at [0].
>>>
>>> As a first step towards allowing the rcache range upper limit be
>>> configured, hold this value in the IOVA rcache structure, and allocate
>>> the rcaches separately.
>>>
>>> [0] 
>>> https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leizhen@huawei.com/ 
>>>
>>>
>>> Signed-off-by: John Garry <john.garry@...wei.com>
>>> ---
>>>   drivers/iommu/dma-iommu.c |  2 +-
>>>   drivers/iommu/iova.c      | 23 +++++++++++++++++------
>>>   include/linux/iova.h      |  4 ++--
>>>   3 files changed, 20 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>> index 98ba927aee1a..4772278aa5da 100644
>>> --- a/drivers/iommu/dma-iommu.c
>>> +++ b/drivers/iommu/dma-iommu.c
>>> @@ -434,7 +434,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct 
>>> iommu_domain *domain,
>>>        * rounding up anything cacheable to make sure that can't 
>>> happen. The
>>>        * order of the unadjusted size will still match upon freeing.
>>>        */
>>> -    if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
>>> +    if (iova_len < (1 << (iovad->rcache_max_size - 1)))
>>>           iova_len = roundup_pow_of_two(iova_len);
>>>       dma_limit = min_not_zero(dma_limit, dev->bus_dma_limit);
>>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>>> index b6cf5f16123b..07ce73fdd8c1 100644
>>> --- a/drivers/iommu/iova.c
>>> +++ b/drivers/iommu/iova.c
>>> @@ -15,6 +15,8 @@
>>>   /* The anchor node sits above the top of the usable address space */
>>>   #define IOVA_ANCHOR    ~0UL
>>> +#define IOVA_RANGE_CACHE_MAX_SIZE 6    /* log of max cached IOVA 
>>> range size (in pages) */
>>
>> Is that the same as an 'order'? i.e. IOVA_RANGE_CACHE_MAX_ORDER?
> 
> Yeah, that may be better. I was just using the same name as before.
> 
>>
>>> +
>>>   static bool iova_rcache_insert(struct iova_domain *iovad,
>>>                      unsigned long pfn,
>>>                      unsigned long size);
>>> @@ -881,7 +883,14 @@ static void init_iova_rcaches(struct iova_domain 
>>> *iovad)
>>>       unsigned int cpu;
>>>       int i;
>>> -    for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
>>> +    iovad->rcache_max_size = IOVA_RANGE_CACHE_MAX_SIZE;
>>> +
>>> +    iovad->rcaches = kcalloc(iovad->rcache_max_size,
>>> +                 sizeof(*iovad->rcaches), GFP_KERNEL);
>>> +    if (!iovad->rcaches)
>>> +        return;
>>
>> Returning quietly here doesn't seem like the right thing to do. At 
>> least, I
>> don't think the rest of the functions here are checking rcaches against
>> NULL.
>>
> 
> For sure, but that is what other code which can fail here already does, 
> like:
> 
> static void init_iova_rcaches(struct iova_domain *iovad)
> {
>      ...
> 
>      for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
>          ...
> 
>          rcache->cpu_rcaches = __alloc_percpu(sizeof(*cpu_rcache), 
> cache_line_size());
>          if (WARN_ON(!rcache->cpu_rcaches))
>              continue;
> }
> 
> and that is not safe either.

Yeah, along with flush queues, historically this has all been 
super-dodgy in terms of failure handling (or lack of).

> This issue was raised a while ago. I don't mind trying to fix it - a 
> slightly painful part is that it touches a few subsystems.

Maybe pull the rcache init out of iova_domain_init() entirely? Only 
iommu-dma uses {alloc,free}_iova_fast(), so TBH it's only a great big 
waste of memory for all the other IOVA domain users anyway.

The other week I started pondering how much of iommu-dma only needs to 
be exposed to the IOMMU core rather than the whole kernel now; I suppose 
there's probably an equal argument to be made for some of these bits of 
the IOVA API, and this might pave the way towards some more logical 
separation, but let's get the functional side dealt with before we worry 
too much about splitting headers.

Robin.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ