Message-ID: <CAA9_cmfT1R2+r9_8g0giMJz+wvrNce6T6GPDpvUTtcA-UyAZNQ@mail.gmail.com>
Date: Tue, 4 Sep 2012 18:19:45 -0700
From: Dan Williams <djbw@...com>
To: Liu Qiang-B32616 <B32616@...escale.com>
Cc: "linux-crypto@...r.kernel.org" <linux-crypto@...r.kernel.org>,
"herbert@...dor.apana.org.au" <herbert@...dor.apana.org.au>,
"davem@...emloft.net" <davem@...emloft.net>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
Li Yang-R58472 <r58472@...escale.com>,
Phillips Kim-R1AAHA <R1AAHA@...escale.com>,
"vinod.koul@...el.com" <vinod.koul@...el.com>,
"arnd@...db.de" <arnd@...db.de>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
Dave Jiang <dave.jiang@...il.com>
Subject: Re: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload
On Tue, Sep 4, 2012 at 5:28 AM, Liu Qiang-B32616 <B32616@...escale.com> wrote:
>> Will this engine be coordinating with another to handle memory copies?
>> The dma mapping code for async_tx/raid is broken when dma mapping
>> requests overlap or cross dma device boundaries [1].
>>
>> [1]: http://marc.info/?l=linux-arm-kernel&m=129407269402930&w=2
> Yes, it needs fsl-dma to handle memcpy copies.
> I read your link. The unmap address is stored in the talitos hw descriptor, and
> it will be unmapped when async_tx acks that descriptor. I know fsl-dma does not
> wait for this ack flag in the current kernel, so I fix that in fsl-dma patch 5/8.
> Is that what you mean?
Unfortunately no. I'm open to other suggestions, but as far as I can
see it requires deeper changes to rip out the dma mapping that happens
in async_tx and the automatic unmapping done by drivers. It should
all be pushed to the client (md).
Currently async_tx hides hardware details from md such that it doesn't
even care if the operation is offloaded to hardware at all, but that
takes things too far. In the worst case a copy->xor chain handled by
multiple channels results in:
1/ dma_map(copy_chan...)
2/ dma_map(xor_chan...)
3/ <exec copy>
4/ dma_unmap(copy_chan...)
5/ <exec xor> <---initiated by the copy_chan
6/ dma_unmap(xor_chan...)
Step 2 violates the dma api since the buffers belong to the xor_chan
until unmap. Step 5 also causes the random completion context of the
copy channel to bleed into the submission context of the xor channel,
which is problematic. So the order needs to be:
1/ dma_map(copy_chan...)
2/ <exec copy>
3/ dma_unmap(copy_chan...)
4/ dma_map(xor_chan...)
5/ <exec xor> <--initiated by md in a static context
6/ dma_unmap(xor_chan...)
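To make those six steps concrete, here is a rough sketch of what a
client-driven copy->xor chain could look like against the raw dmaengine
interfaces. The function name, the single src/dst/parity pages, and the
synchronous dma_sync_wait() calls are illustrative simplifications (error
handling and mapping/descriptor failure checks omitted), not a proposed
md implementation:

#include <linux/dmaengine.h>
#include <linux/dma-mapping.h>

/*
 * Illustrative sketch only: the client owns the mappings and keeps each
 * map/exec/unmap triplet on a single channel, matching steps 1-6 above.
 * dma_mapping_error()/NULL-descriptor checks are omitted for brevity.
 */
static int client_copy_then_xor(struct dma_chan *copy_chan,
                                struct dma_chan *xor_chan,
                                struct page *src, struct page *dst,
                                struct page *parity, size_t len)
{
        struct device *copy_dev = copy_chan->device->dev;
        struct device *xor_dev = xor_chan->device->dev;
        struct dma_async_tx_descriptor *tx;
        dma_addr_t src_dma, dst_dma, xor_srcs[2];
        dma_cookie_t cookie;

        /* 1/ dma_map(copy_chan...) */
        src_dma = dma_map_page(copy_dev, src, 0, len, DMA_TO_DEVICE);
        dst_dma = dma_map_page(copy_dev, dst, 0, len, DMA_FROM_DEVICE);

        /* 2/ <exec copy>, waited on by the client, not chained in hw */
        tx = copy_chan->device->device_prep_dma_memcpy(copy_chan, dst_dma,
                                                       src_dma, len,
                                                       DMA_CTRL_ACK);
        cookie = dmaengine_submit(tx);
        dma_async_issue_pending(copy_chan);
        dma_sync_wait(copy_chan, cookie);

        /* 3/ dma_unmap(copy_chan...) before any other channel sees the pages */
        dma_unmap_page(copy_dev, src_dma, len, DMA_TO_DEVICE);
        dma_unmap_page(copy_dev, dst_dma, len, DMA_FROM_DEVICE);

        /* 4/ dma_map(xor_chan...): the buffers now belong to xor_chan */
        xor_srcs[0] = dma_map_page(xor_dev, dst, 0, len, DMA_TO_DEVICE);
        xor_srcs[1] = dma_map_page(xor_dev, parity, 0, len, DMA_BIDIRECTIONAL);

        /*
         * 5/ <exec xor>, initiated here by md in a static context
         * (in-place parity update; assumes the engine accepts the dest
         * address in its source list).
         */
        tx = xor_chan->device->device_prep_dma_xor(xor_chan, xor_srcs[1],
                                                   xor_srcs, 2, len,
                                                   DMA_CTRL_ACK);
        cookie = dmaengine_submit(tx);
        dma_async_issue_pending(xor_chan);
        dma_sync_wait(xor_chan, cookie);

        /* 6/ dma_unmap(xor_chan...) */
        dma_unmap_page(xor_dev, xor_srcs[0], len, DMA_TO_DEVICE);
        dma_unmap_page(xor_dev, xor_srcs[1], len, DMA_BIDIRECTIONAL);

        return 0;
}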
Also, if xor_chan and copy_chan lie within the same dma mapping domain
(iommu or parent device), then we can map the stripe once and skip the
extra maintenance for the duration of the chain of operations. This
dumps a lot of hardware details on md, but I think it is the only way
to get consistent semantics when arbitrary offload devices are
involved.
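As a sketch of that check, using the same dmaengine structures as above
(again illustrative; a real implementation would likely want to compare
iommu domains rather than just dma devices and their parents):

static bool chans_share_mapping_domain(struct dma_chan *a,
                                       struct dma_chan *b)
{
        /*
         * Same dma device, or channels hanging off the same parent
         * (e.g. one iommu-backed offload engine): the stripe can stay
         * mapped for the whole chain of operations.
         */
        return a->device->dev == b->device->dev ||
               (a->device->dev->parent &&
                a->device->dev->parent == b->device->dev->parent);
}

When that check passes, md could map the stripe once against the shared
device and skip the intermediate unmap/remap between the copy and the xor.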
--
Dan