Message-id: <582AC24E.8090909@samsung.com>
Date:   Tue, 15 Nov 2016 17:07:42 +0900
From:   Jaewon Kim <jaewon31.kim@...sung.com>
To:     Brian Starkey <brian.starkey@....com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        jaewon31.kim@...sung.com
Subject: Re: [PATCH] [RFC] drivers: dma-coherent: use MEMREMAP_WB instead of
 MEMREMAP_WC


Hi Brian, please take a look at my new approach.
I may need to change other parts to use this patch, but
first I would like your comments on dma_alloc_from_coherent.

[PATCH] [RFC] drivers: dma-coherent: pass struct dma_attrs to
 dma_alloc_from_coherent

dma_alloc_from_coherent does not receive struct dma_attrs information.
If dma_attrs information is passed to dma_alloc_from_coherent,
it can do more work according to that information.
As an example, I added DMA_ATTR_SKIP_ZEROING to skip zeroing. Depending
on the driver implementation, zeroing could be skipped or done later.

Signed-off-by: Jaewon Kim <jaewon31.kim@...sung.com>
---
 drivers/base/dma-coherent.c | 6 +++++-
 include/linux/dma-mapping.h | 7 ++++---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/base/dma-coherent.c b/drivers/base/dma-coherent.c
index 640a7e6..428eced 100644
--- a/drivers/base/dma-coherent.c
+++ b/drivers/base/dma-coherent.c
@@ -151,6 +151,7 @@ void *dma_mark_declared_memory_occupied(struct device *dev,
  * @dma_handle:    This will be filled with the correct dma handle
  * @ret:    This pointer will be filled with the virtual address
  *        to allocated area.
+ * @attrs:    dma_attrs to pass additional information
  *
  * This function should be only called from per-arch dma_alloc_coherent()
  * to support allocation from per-device coherent memory pools.
@@ -159,7 +160,8 @@ void *dma_mark_declared_memory_occupied(struct device *dev,
  * generic memory areas, or !0 if dma_alloc_coherent should return @ret.
  */
 int dma_alloc_from_coherent(struct device *dev, ssize_t size,
-                       dma_addr_t *dma_handle, void **ret)
+                       dma_addr_t *dma_handle, void **ret,
+                       struct dma_attrs *attrs)
 {
     struct dma_coherent_mem *mem;
     int order = get_order(size);
@@ -190,6 +192,8 @@ int dma_alloc_from_coherent(struct device *dev, ssize_t size,
     *ret = mem->virt_base + (pageno << PAGE_SHIFT);
     dma_memory_map = (mem->flags & DMA_MEMORY_MAP);
     spin_unlock_irqrestore(&mem->spinlock, flags);
+    if (dma_get_attr(DMA_ATTR_SKIP_ZEROING, attrs))
+        return 1;
     if (dma_memory_map)
         memset(*ret, 0, size);
     else
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 08528af..737fd71 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -151,13 +151,14 @@ static inline int is_device_dma_capable(struct device *dev)
  * Don't use them in device drivers.
  */
 int dma_alloc_from_coherent(struct device *dev, ssize_t size,
-                       dma_addr_t *dma_handle, void **ret);
+                       dma_addr_t *dma_handle, void **ret,
+                       struct dma_attrs *attrs);
 int dma_release_from_coherent(struct device *dev, int order, void *vaddr);
 
 int dma_mmap_from_coherent(struct device *dev, struct vm_area_struct *vma,
                 void *cpu_addr, size_t size, int *ret);
 #else
-#define dma_alloc_from_coherent(dev, size, handle, ret) (0)
+#define dma_alloc_from_coherent(dev, size, handle, ret, attrs) (0)
 #define dma_release_from_coherent(dev, order, vaddr) (0)
 #define dma_mmap_from_coherent(dev, vma, vaddr, order, ret) (0)
 #endif /* CONFIG_HAVE_GENERIC_DMA_COHERENT */
@@ -456,7 +457,7 @@ static inline void *dma_alloc_attrs(struct device *dev, size_t size,
 
     BUG_ON(!ops);
 
-    if (dma_alloc_from_coherent(dev, size, dma_handle, &cpu_addr))
+    if (dma_alloc_from_coherent(dev, size, dma_handle, &cpu_addr, attrs))
         return cpu_addr;
 
     if (!arch_dma_alloc_attrs(&dev, &flag))

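For reference, here is a minimal caller-side sketch of how a driver could
request a non-zeroed allocation with this patch applied. It assumes the older
struct dma_attrs API that this patch uses (DEFINE_DMA_ATTRS()/dma_set_attr(),
as in pre-4.8 kernels) and that DMA_ATTR_SKIP_ZEROING is defined as the patch
expects; dev and the 4MB size are illustrative only:

	/* e.g. inside the driver's probe(); error handling abbreviated */
	DEFINE_DMA_ATTRS(attrs);
	dma_addr_t handle;
	void *vaddr;

	/* skip the allocator's memset; the driver must then clear or
	 * fully overwrite the buffer itself before exposing it */
	dma_set_attr(DMA_ATTR_SKIP_ZEROING, &attrs);

	vaddr = dma_alloc_attrs(dev, SZ_4M, &handle, GFP_KERNEL, &attrs);
	if (!vaddr)
		return -ENOMEM;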
Thank you
Jaewon Kim

On 2016-11-11 18:53, Jaewon Kim wrote:
> Hi
>
> On 2016-11-10 18:51, Brian Starkey wrote:
>> Hi Jaewon,
>>
>> On Thu, Nov 10, 2016 at 10:41:43AM +0900, Jaewon Kim wrote:
>>> Hi
>>>
>>> On 2016-11-09 19:23, Brian Starkey wrote:
>>>> Hi,
>>>>
>>>> On Wed, Nov 09, 2016 at 06:47:26PM +0900, Jaewon Kim wrote:
>>>>>
>>>>> On 2016-11-09 18:27, Brian Starkey wrote:
>>>>>> Hi Jaewon,
>>>>>>
>>>>>> On Wed, Nov 09, 2016 at 06:10:09PM +0900, Jaewon Kim wrote:
>>>>>>> Commit 6b03ae0d42bf (drivers: dma-coherent: use MEMREMAP_WC for DMA_MEMORY_MAP)
>>>>>>> added MEMREMAP_WC for DMA_MEMORY_MAP. If, however, the CPU cache can be used
>>>>>>> with DMA_MEMORY_MAP, I think MEMREMAP_WC can be changed to MEMREMAP_WB. On my
>>>>>>> local ARM device, the memset in dma_alloc_from_coherent sometimes takes much
>>>>>>> longer with MEMREMAP_WC than with MEMREMAP_WB.
>>>>>>>
>>>>>>> Test results on AArch64, allocating 4MB with trace_printk placed right
>>>>>>> before and after the memset:
>>>>>>>     MEMREMAP_WC : 11.0ms, 5.7ms, 4.2ms, 4.9ms, 5.4ms, 4.3ms, 3.5ms
>>>>>>>     MEMREMAP_WB : 0.7ms, 0.6ms, 0.6ms, 0.6ms, 0.6ms, 0.5ms, 0.4ms
>>>>>>>
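(For reference, the timing above came from debug instrumentation along these
lines: a trace_printk() pair around the memset in dma_alloc_from_coherent,
with the delta read from the ftrace buffer. A minimal sketch, not the exact
debug patch:)

	/* temporary timing instrumentation, not for merging */
	trace_printk("memset start: size=%zu\n", (size_t)size);
	memset(*ret, 0, size);
	trace_printk("memset end\n");
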
>>>>>> This doesn't look like a good idea to me. The point of coherent memory
>>>>>> is to have it non-cached; WB, however, will make writes hit the cache.
>>>>>>
>>>>>> Writing to the cache is of course faster than writing to RAM, but
>>>>>> that's not what we want to do here.
>>>>>>
>>>>>> -Brian
>>>>>>
>>>>> Hi Brian
>>>>>
>>>>> Thank you for your comment.
>>>>> If the allocated memory will be used by the TZ side, however, I think a
>>>>> cacheable mapping could also be used to speed up the memset in dma_alloc_from_coherent.
>>>> Are you trying to share the buffer between the secure and non-secure
>>>> worlds on the CPU? In that case, I don't think caching really helps
>>>> you. I'm not a TZ expert, but I believe the two worlds can never
>>>> share cached data.
>>> I do not want memory sharing between the secure and non-secure worlds.
>>> I just want faster allocation.
>> So now I'm confused. Why did you mention TZ?
>>
>> Could you explain what your use-case for the buffer you are allocating
>> is?
> I wanted to pass physically contiguous memory to the TZapp side at Linux runtime.
> During this discussion I realized I need to consider further whether dma-coherent is the proper approach when only a TZapp is involved.
> But if another DMA master joins, I think we can revisit this issue,
> e.g. a secure HW device gets memory from dma-coherent and passes it to the TZapp.
>>> I am not a TZ expert either; I also think they cannot share cached data.
>>> As far as I know, the secure world can decide its cache policy via its own
>>> page tables, regardless of the non-secure world.
>>>> If you want the secure world to see the non-secure world's data, as
>>>> far as I know you will need to clean the cache in the non-secure world
>>>> to make sure the secure world can see it (and vice-versa). I'd expect
>>>> this to remove most of the speed advantage of using WB in the first
>>>> place, except for some possible speedup from more efficient bursting.
>>> Yes, I also think the non-secure world needs to clean the cache before the
>>> secure world accesses the memory region, to avoid stale-data issues. But if
>>> other software, such as a Linux driver or a hypervisor, does the cache
>>> cleaning, or an engineer confirms it is safe, then we may be able to use
>>> MEMREMAP_WB or simply skip the memset for faster memory allocation in
>>> dma_alloc_from_coherent.
>> Skipping the memset doesn't sound like a good plan: you'd potentially
>> be leaking information to whatever device receives the memory. And
>> adding WB mappings to an API which is intended to be used for sharing
>> buffers with DMA masters doesn't sound like a good move either.
>>
>>>> If you're sharing the buffer with other DMA masters, regardless of
>>>> secure/non-secure you're not going to want WB mappings.
>>>>
>>> If there is a scenario where another DMA master works on this memory, then
>>> I think the engineer needs to consider cache cleaning if he/she uses WB.
>> The whole point of dma-coherent memory is for use by DMA masters.
>>
>>>>> How do you think to add another flag to distinguish this case?
>>>> You could look into the streaming DMA API. It will depend on the exact
>>>> implementation, but at some point you're still going to have to pay
>>>> the penalty of syncing the CPU and device.
>>>>
>>>> -Brian
>>>>
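(For context, a minimal sketch of the streaming DMA API mentioned above:
ordinary cacheable memory is mapped per transfer, and the CPU/device sync
cost is paid at map/unmap time. dev, size and the direction are illustrative;
error paths are abbreviated:)

	/* uses <linux/dma-mapping.h> and <linux/slab.h> */
	void *buf = kmalloc(size, GFP_KERNEL);	/* cacheable CPU memory */
	dma_addr_t dma;

	if (!buf)
		return -ENOMEM;
	dma = dma_map_single(dev, buf, size, DMA_TO_DEVICE);	/* sync cost here */
	if (dma_mapping_error(dev, dma)) {
		kfree(buf);
		return -ENOMEM;
	}
	/* ... device performs DMA to/from buf ... */
	dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);	/* and here */
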
>>> I cannot find a DMA API or flag for WB, so I am considering an additional
>>> flag to meet my requirement. In my opinion the flag could be either WB or non-zeroing.
>> To me, it sounds like dma-coherent isn't the right tool to achieve
>> what you want, but I'm not clear on exactly what it is you're trying
>> to do (I know you want faster allocations, the point is what for?)
>>
> I am actually looking into enum dma_attr, which has DMA_ATTR_SKIP_ZEROING.
> If dma_alloc_attrs in arch/arm64/include/asm/dma-mapping.h passes struct dma_attrs *attrs to dma_alloc_from_coherent,
> I think I can do what I want.
>
> Thank you for your comment.
>> -Brian
>>
>>> For case #1 - DMA_MEMORY_MAP_WB
>>> --- a/drivers/base/dma-coherent.c
>>> +++ b/drivers/base/dma-coherent.c
>>> @@ -32,7 +32,9 @@ static bool dma_init_coherent_memory(
>>>        if (!size)
>>>                goto out;
>>>
>>> -       if (flags & DMA_MEMORY_MAP)
>>> +       if (flags & DMA_MEMORY_MAP_WB)
>>> +               mem_base = memremap(phys_addr, size, MEMREMAP_WB);
>>> +       else if (flags & DMA_MEMORY_MAP)
>>>                mem_base = memremap(phys_addr, size, MEMREMAP_WC);
>>>        else
>>>                mem_base = ioremap(phys_addr, size);
>>>
>>> For case #2 - DMA_MEMORY_MAP_NOZEROING
>>> --- a/drivers/base/dma-coherent.c
>>> +++ b/drivers/base/dma-coherent.c
>>> @@ -190,6 +190,8 @@ int dma_alloc_from_coherent(struct device *dev, ssize_t size,
>>>        *ret = mem->virt_base + (pageno << PAGE_SHIFT);
>>>        dma_memory_map = (mem->flags & DMA_MEMORY_MAP);
>>>        spin_unlock_irqrestore(&mem->spinlock, flags);
>>> +       if (mem->flags & DMA_MEMORY_MAP_NOZEROING)
>>> +               return 1;
>>>        if (dma_memory_map)
>>>                memset(*ret, 0, size);
>>>        else
>>>
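(For illustration, how a platform might opt in to case #2 when declaring its
coherent region. DMA_MEMORY_MAP_NOZEROING is only the flag proposed above,
not an upstream flag, and dev/phys_addr/device_addr/size are placeholders:)

	int ret;

	/* declare the region as mapped coherent memory, asking the
	 * allocator to skip its initial memset */
	ret = dma_declare_coherent_memory(dev, phys_addr, device_addr, size,
					  DMA_MEMORY_MAP | DMA_MEMORY_MAP_NOZEROING);
	if (!ret)
		dev_err(dev, "failed to declare coherent memory\n");
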
>>> Can I get your comment?
>>>
>>> Thank you
>>> Jaewon Kim
>>>
>>>>>>> Signed-off-by: Jaewon Kim <jaewon31.kim@...sung.com>
>>>>>>> ---
>>>>>>> drivers/base/dma-coherent.c | 2 +-
>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/base/dma-coherent.c b/drivers/base/dma-coherent.c
>>>>>>> index 640a7e6..0512a1d 100644
>>>>>>> --- a/drivers/base/dma-coherent.c
>>>>>>> +++ b/drivers/base/dma-coherent.c
>>>>>>> @@ -33,7 +33,7 @@ static bool dma_init_coherent_memory(
>>>>>>>         goto out;
>>>>>>>
>>>>>>>     if (flags & DMA_MEMORY_MAP)
>>>>>>> -        mem_base = memremap(phys_addr, size, MEMREMAP_WC);
>>>>>>> +        mem_base = memremap(phys_addr, size, MEMREMAP_WB);
>>>>>>>     else
>>>>>>>         mem_base = ioremap(phys_addr, size);
>>>>>>>     if (!mem_base)
>>>>>>> -- 
>>>>>>> 1.9.1
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>
