linux-kernel - Re: [Linux-c6x-dev] [PATCH 3/9] c6x: Provide dma_mmap_coherent() and dma_get

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-id: <5100216E.1050104@samsung.com>
Date:	Wed, 23 Jan 2013 18:44:14 +0100
From:	Marek Szyprowski <m.szyprowski@...sung.com>
To:	James Bottomley <James.Bottomley@...senPartnership.com>
Cc:	Geert Uytterhoeven <geert@...ux-m68k.org>,
	Mark Salter <msalter@...hat.com>,
	Vineet Gupta <Vineet.Gupta1@...opsys.com>,
	linux-arch@...r.kernel.org, linux-c6x-dev@...ux-c6x.org,
	linux-kernel@...r.kernel.org
Subject: Re: [Linux-c6x-dev] [PATCH 3/9] c6x: Provide dma_mmap_coherent() and
 dma_get_sgtable()


On 1/23/2013 11:29 AM, James Bottomley wrote:
> On Wed, 2013-01-23 at 10:47 +0100, Marek Szyprowski wrote:
> > On 1/22/2013 11:13 AM, James Bottomley wrote:
> > > There might be a simple solution:  just replace void *cpu_addr with void
> > > **cpu_addr in the API.  This is a bit nasty since compilers think that
> > > void ** to void * conversion is quite legal, so it would be hard to pick
> > > up misuse of this (uint8_t ** would be better).  That way VIPT could
> > > remap the kernel pages to a coherent address.  This would probably have
> > > to change in the dma_mmap_attr() and dma_ops structures.
> > >
> > > All consumers would have to expect cpu_addr might change, but that seems
> > > doable.
> >
> > I still don't get how this can help having a coherent buffer between DMA
> > (devices) and CPU (kernel and user space mappings). The main purpose of
> > the dma_mmap_coherent() call is to provide a common API for mapping a
> > coherent buffer between DMA (device) and userspace. It is not about
> > creating a coherent mapping for sharing memory between userspace and
> > kernel space.
>
> OK, so I assume you don't understand how VIPT architectures work?
> On a VIPT architecture, the CPU cache is indexed by the virtual address
> but tagged by the physical address.  This means that when an address
> comes into the CPU, we can do simultaneous lookups in the cache and the
> TLB (by the virtual address).  The cache doesn't have the full address
> bits of the index, so it usually only looks up the lowest n bits.  The
> value of n gives the congruency of the cache (sometimes called the
> colour of the cache lines).  The cache usually produces a number of
> possible lines depending on its associativity and the TLB lookup
> produces the physical address.  We can now sweep through the cache lines
> and if a physical address tag matches, return the answer from the cache
> instead of having to consult main memory.  This gives a speed advantage
> over PIPT (Physically Indexed Physically Tagged) caches because on PIPT
> the cache lookup can only occur *after* the TLB lookup instead of
> concurrently with.
>
> As an aside, in practise every PIPT CPU actually has a VIPT cache,
> purely because of the speed angle.  The trick to making it all work is
> to shrink n so that n <= PAGE_SIZE_BITS and increase the associativity.
> This means you can get VIPT speed without producing the aliasing
> effects.
>
> Coherence is achieved in VIPT CPUs when two mapping addresses for the
> same physical page, say v1 and v2 are congruent i.e. (v1 & ((1<<n)-1))
> == (v2 & ((1<<n)-1)).  For such mappings, the same cache line is used.
> This is why we have an architecture override for
> arch_get_unmapped_area(). we use this to ensure all shareable mappings
> for all userspace process are at congruent addresses, so we don't get
> any aliasing problems in shared VMAs.
>
> Aliasing happens when v1 & ((1<<n)-1)) != (v2 & ((1<<n)-1)) where v1 and
> v2 are virtual addresses of the same physical page.  Now that page has
> *two* cache copies and you have to be very careful to resolve the
> aliases.  This, unfortunately, is the default scenario for kernel
> virtual and user virtual addresses because all kernel pages are offset
> mapped which is why we have to be annoyingly careful about flushing
> kernel pages before we hand control back to userspace.
>
> As you can see, we can only share a mapping between the kernel and user
> space without all the aliasing problems if the address is congruent.
>
> dma_mmap_coherent() seeks to be coherent from the device to the kernel
> to userspace.  On PARISC, we fix device coherency by loading a coherence
> index into our IOMMU (it effectively tells the IOMMU the virtual alias
> for the physical memory and allows the CPU cache to be managed as part
> of the I/O transaction).  The only way to prevent aliasing between the
> kernel and user space is to make the two addresses congruent.  We
> already have a fixed user address (given to us by the vma), so our only
> choice is to remap the page in the kernel to a congruent address and
> bingo we have a buffer shared between device/kernel/user space with no
> aliasing problems.
>
> The reason for making the cpu_addr mutable is to allow the
> implementation to remap the page and make full congruency happen.  We
> can't allow the kernel and userspace addresses to be non-congruent
> because kmap will fail horribly (it's required by the dma_buf_ops) and
> we also run the risk of getting a machine check if the cache discovers
> two dirty aliases of the same page.

I know what is VIPT, but I didn't know how device coherency is achieved on
PARISC and I wasn't aware that it is not possible to use non-cached 
mappings.
Now I have a complete image of this platform, thanks for you very detailed
description.

Now we need to find some working solution. This case shows that the current
DMA mapping API is quite limited and one day it will need complete rewrite.

Probably the best approach would be to change the API completely and
introduce some kind of DMA object with the following usage pattern:

dma_obj *buf = dma_alloc(dev, size, DMA_COHERENT | DMA_USERSPACE | 
DMA_KERNELSPACE, gfp);
if (!buf) {
     dma_addr_t dma_addr = dma_get_device_addr(buf);
     void *virt_addr = dma_get_kernel_addr(buf);
     ...

     dma_mmap_buf(buf, vma);
}

This way one can specify if kernel mapping or userspace mapping is required
for the given buffer or not.

This however requires a lot of work and update in all existing drivers. 
I doubt
that this can be done quickly without some period of transition and 
emulation of
the old api on top of the new one. It will also need a lot of discussion 
between
core kernel developers and a good proposal. I cannot promise that I will 
handle
this in any reasonable time frame now.

In meantime we need to live with the current design of dma_alloc_coherent do
something with dma_mmap_coherent. You have mentioned cpu_addr update in
dma_mmap_coherent, but frankly I don't like the '**cpu_addr' approach, as it
looks very hacky and it will end in various misusage in the drivers. API 
should
be clear and simple, but a function which updates the point clearly doesn't
meet this criteria.

I wonder if there will be any clients for the dma_mmap_* call on PARISC
(and other architectures which currently lack good way for mapping coherent
buffers to userspace). Currently there is no such driver. Maybe we can stick
with dummy implementation, which returns -EINVAL for now?

I think that the complete API redesign will be required one day. There 
are too
many limitations of the current API, which even now results in hacks and
workarounds in the drivers. We tried to avoid hacks in the drivers and 
update
the dma-mapping API, but it looks that this was only a top of the iceberg...

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/