linux-kernel - Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant

Message-ID: <Yl5qMXCpwmxho/FN@Red>
Date:   Tue, 19 Apr 2022 09:52:17 +0200
From:   Corentin Labbe <clabbe.montjoie@...il.com>
To:     Philipp Tomsich <philipp.tomsich@...ll.eu>
Cc:     Guo Ren <guoren@...nel.org>, Samuel Holland <samuel@...lland.org>,
        Heiko Stuebner <heiko@...ech.de>,
        Palmer Dabbelt <palmer@...belt.com>,
        Paul Walmsley <paul.walmsley@...ive.com>,
        linux-riscv <linux-riscv@...ts.infradead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Wei Fu <wefu@...hat.com>, Atish Patra <atishp@...shpatra.org>,
        Anup Patel <anup@...infault.org>,
        Nick Kossifidis <mick@....forth.gr>,
        Christoph Muellner <cmuellner@...ux.com>,
        Herbert Xu <herbert@...dor.apana.org.au>,
        linux-crypto@...r.kernel.org
Subject: Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the
 t-head variant

On Mon, Apr 18, 2022 at 05:29:10PM +0200, Philipp Tomsich wrote:
> On Sun, 17 Apr 2022 at 19:35, Corentin Labbe <clabbe.montjoie@...il.com> wrote:
> >
> > On Sun, Apr 17, 2022 at 04:49:34PM +0800, Guo Ren wrote:
> > > On Sun, Apr 17, 2022 at 4:45 PM Corentin Labbe
> > > <clabbe.montjoie@...il.com> wrote:
> > > >
> > > > On Sun, Apr 17, 2022 at 10:17:34AM +0800, Guo Ren wrote:
> > > > > On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe
> > > > > <clabbe.montjoie@...il.com> wrote:
> > > > > >
> > > > > > On Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland wrote:
> > > > > > > On 4/16/22 2:35 AM, Corentin Labbe wrote:
> > > > > > > > On Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland wrote:
> > > > > > > >> On 4/15/22 6:26 AM, Corentin Labbe wrote:
> > > > > > > > >>> On Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner wrote:
> > > > > > > >>>> This series is based on the alternatives changes done in my svpbmt series
> > > > > > > >>>> and thus also depends on Atish's isa-extension parsing series.
> > > > > > > >>>>
> > > > > > > > >>>> It uses the cache-management instructions from the Zicbom
> > > > > > > > >>>> extension to handle cache flushes and similar actions on platforms that need them.
> > > > > > > > >>>>
> > > > > > > > >>>> SoCs using CPU cores from T-Head, like the Allwinner D1, implement a
> > > > > > > > >>>> different set of cache instructions. While the instructions differ,
> > > > > > > > >>>> they provide the same functionality, so a variant can
> > > > > > > > >>>> easily hook into the existing alternatives mechanism on those.
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>> Hello
> > > > > > > >>>
> > > > > > > > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contains this series.
> > > > > > > > >>>
> > > > > > > > >>> I am hitting a buffer corruption problem with DMA.
> > > > > > > > >>> The sun8i-ce crypto driver fails its self-tests due to "device overran destination buffer".
> > > > > > > > >>> In fact the buffer is not overrun by the device but by the dma_map_single() operation.
> > > > > > > > >>>
> > > > > > > > >>> The following small piece of code shows the problem:
> > > > > > > >>>
> > > > > > > >>> dma_addr_t dma;
> > > > > > > >>> u8 *buf;
> > > > > > > >>> #define BSIZE 2048
> > > > > > > >>> #define DMASIZE 16
> > > > > > > >>>
> > > > > > > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> > > > > > > >>> for (i = 0; i < BSIZE; i++)
> > > > > > > >>>     buf[i] = 0xFE;
> > > > > > > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> > > > > > > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
> > > > > > > >>
> > > > > > > >> This function (through dma_direct_map_page()) ends up calling
> > > > > > > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
> > > > > > > >> cache. This is the same thing other architectures do (at least arm, arm64,
> > > > > > > >> openrisc, and powerpc). So this appears to be working as intended.
> > > > > > > >
> > > > > > > > > This behavior is not present on ARM and ARM64, at least.
> > > > > > > > > The sample code I provided does not corrupt the buffer on them.
> > > > > > >
> > > > > > > That can be explained by the 0xFE bytes having been flushed to DRAM already in
> > > > > > > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in
> > > > > > > a dirty cache line. The cache topology and implementation is totally different
> > > > > > > across the SoCs, so this is not too surprising.
> > > > > > >
> > > > > > > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a
> > > > > > > unidirectional DMA transfer from the device into that buffer. So the contents of
> > > > > > > the buffer are "undefined" until the DMA transfer completes. If you are also
> > > > > > > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Samuel
> > > > > >
> > > > > > +CC crypto mailing list + maintainer
> > > > > >
> > > > > > My problem is that the crypto selftest, for each buffer on which I need to do a cipher operation,
> > > > > > appends a poison buffer to check that the device does not write beyond the buffer.
> > > > > >
> > > > > > But dma_map_sg(FROM_DEVICE) corrupts this poison buffer, and the crypto selftests fail, thinking my device did a buffer overrun.
> > > > > >
> > > > > > So you mean that on the D1 SoC, this crypto API check strategy is impossible?
> > > > >
> > > > > I think you could try to replace all CLEAN & INVAL ops with FLUSH ops
> > > > > for the testing. (All cache block-aligned data from the device for the
> > > > > CPU should be invalidated.)
> > > > >
> > > >
> > > > With:
> > > > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> > > > index 2c124bcc1932..608483522e05 100644
> > > > --- a/arch/riscv/mm/dma-noncoherent.c
> > > > +++ b/arch/riscv/mm/dma-noncoherent.c
> > > > @@ -21,7 +21,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_dire
> > > >                 ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
> > > >                 break;
> > > >         case DMA_FROM_DEVICE:
> > > > -               ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
> > > > +               ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> > > >                 break;
> > > >         case DMA_BIDIRECTIONAL:
> > > >                 ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> > > >
> > > >
> > > > With this change, the crypto self-test passes and I get no more buffer corruption.
> > > No, no ... that is not a solution. It means your driver has a problem.
> > > For DMA_FROM_DEVICE, INVAL alone is enough.
> > >
> >
> > As far as I can tell, my driver works fine and the problem comes from dma_map_sg(). I probably did not explain it well, so let me start again.
> >
> > Example:
> > The crypto self-test sends my driver an AES cipher operation of 16 bytes inside an SG, but the original buffer is larger (say 32 bytes for the example).
> > So the first 16 bytes are used by the SG and the last 16 bytes are a poisoned buffer (with value 0xFE) to check that the driver does not write beyond the normal operation of 16 bytes (and beyond the SG length).
> >
> > Doing the dma_map_sg(FROM_DEVICE) on the SG corrupts the whole buffer.
> 
> Doesn't the DMA_FROM_DEVICE indicate that there are no expected writes
> from the CPU to the buffer (and that any modifications to the
> underlying cache line can be dropped via an invalidation)?
> In other words: does the behavior change when mapping as
> DMA_BIDIRECTIONAL — and: should a map/unmap sequence be used where it
> is first mapped as DMA_TO_DEVICE when poisoning the buffer and later
> as DMA_FROM_DEVICE when in normal operation?
> 

There are no CPU writes after the dma_map(FROM_DEVICE).
The buffer is initialized by the crypto API beforehand.
Furthermore, the corrupted buffer is the one next to the buffer being mapped.
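
To make the "next to the buffer being mapped" point concrete, here is a minimal sketch of the address arithmetic involved. It assumes 64-byte cache lines (the line size is not stated anywhere in this thread), and the helper names are hypothetical, not kernel APIs. It only shows that a 16-byte mapping covers part of a cache line, so a cache operation on that mapping also covers the bytes that follow it.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: 64-byte cache lines; the thread does not state the line size. */
#define ASSUMED_CACHE_LINE 64u

/* Start of the first cache line touched by a range beginning at addr. */
static uintptr_t line_span_start(uintptr_t addr)
{
	return addr & ~((uintptr_t)ASSUMED_CACHE_LINE - 1);
}

/* End (exclusive) of the last cache line touched by [addr, addr + len). */
static uintptr_t line_span_end(uintptr_t addr, size_t len)
{
	return (addr + len + ASSUMED_CACHE_LINE - 1) &
	       ~((uintptr_t)ASSUMED_CACHE_LINE - 1);
}

/* Nonzero if the bytes right after the range share a cache line with it. */
static int tail_shares_line(uintptr_t addr, size_t len)
{
	return line_span_end(addr, len) > addr + len;
}
```

Under that assumption, a 16-byte mapping at a line-aligned address leaves 48 bytes of the same line outside the mapping, which is where the 16 poison bytes of the example would sit.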

I verified the size of dma_map_sg() via some debug:
sun8i-ce 3040000.crypto: sun8i_ce_cipher_prepare ecb(aes) cryptlen=16
dma_direct_map_sg:483 SG0 len=16   <- dma_map TO_DEVICE
dma_direct_map_sg:483 SG0 len=16   <- dma_map FROM_DEVICE
need:a47ca9dd e0df4c86 a070af6e 91710dec 
have:a47ca9dd e0df4c86 a070af6e 91710dec
dump whole buffer:
over:a47ca9dd e0df4c86 a070af6e 91710dec
over:ec05e6f2 d542fb77 128b2059 5bf06986 < here we should have 0xFE
alg: skcipher: ecb-aes-sun8i-ce encryption overran dst buffer on test vector 1, cfg="random: use_finup src_divs=[<reimport>100.0%@...04]"


Note that I tried the following patch:
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 4948201065cc..c5b945974441 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -19,6 +19,7 @@
 #include <crypto/aead.h>
 #include <crypto/hash.h>
 #include <crypto/skcipher.h>
+#include <linux/cacheflush.h>
 #include <linux/err.h>
 #include <linux/fips.h>
 #include <linux/module.h>
@@ -205,6 +206,7 @@ static void testmgr_free_buf(char *buf[XBUFSIZE])
 static inline void testmgr_poison(void *addr, size_t len)
 {
        memset(addr, TESTMGR_POISON_BYTE, len);
+       flush_icache_range(addr, addr + len);
 }
 
 /* Is the memory region still fully poisoned? */

This patch fixes the problem, but I am not sure it is the right way.
A DMA mapping operation that corrupts adjacent buffers does not seem right.
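
For illustration, here is a toy model of one possible reading of the behavior discussed above: the poison bytes sit in a dirty cache line that has not yet been written back, and an invalidate on that line discards them, whereas a flush writes them back first. This is only a sketch under stated assumptions: a single 64-byte write-back cache line, with every name (toy_cache, cpu_write, run_scenario, ...) hypothetical. It models neither the D1 hardware nor the kernel's ALT_CMO_OP implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_LINE   64    /* ASSUMPTION: one 64-byte cache line */
#define TOY_POISON 0xFE  /* same poison value as in the example above */
#define TOY_STALE  0xAA  /* arbitrary old DRAM content */

struct toy_cache {
	uint8_t dram[TOY_LINE]; /* backing memory for the line */
	uint8_t line[TOY_LINE]; /* cached copy of the line */
	int valid, dirty;
};

/* Fill the line from DRAM if it is not cached yet. */
static void toy_fill(struct toy_cache *c)
{
	if (!c->valid) {
		memcpy(c->line, c->dram, TOY_LINE);
		c->valid = 1;
		c->dirty = 0;
	}
}

/* CPU store: lands in the cache only (write-back), marking the line dirty. */
static void cpu_write(struct toy_cache *c, size_t off, uint8_t v, size_t n)
{
	toy_fill(c);
	memset(c->line + off, v, n);
	c->dirty = 1;
}

static uint8_t cpu_read(struct toy_cache *c, size_t off)
{
	toy_fill(c);
	return c->line[off];
}

/* Invalidate: drop the line, including any dirty data. */
static void cache_inval(struct toy_cache *c)
{
	c->valid = 0;
	c->dirty = 0;
}

/* Flush: write dirty data back to DRAM, then drop the line. */
static void cache_flush(struct toy_cache *c)
{
	if (c->valid && c->dirty)
		memcpy(c->dram, c->line, TOY_LINE);
	c->valid = 0;
	c->dirty = 0;
}

/* Device DMA write: goes straight to DRAM, bypassing the cache. */
static void device_write(struct toy_cache *c, size_t off, uint8_t v, size_t n)
{
	memset(c->dram + off, v, n);
}

/* Returns what the CPU reads at the first poison byte after the DMA. */
static uint8_t run_scenario(int use_flush)
{
	struct toy_cache c;

	memset(&c, 0, sizeof(c));
	memset(c.dram, TOY_STALE, TOY_LINE);
	cpu_write(&c, 16, TOY_POISON, 16);  /* poison bytes 16..31 */
	if (use_flush)
		cache_flush(&c);
	else
		cache_inval(&c);
	device_write(&c, 0, 0x11, 16);      /* device fills bytes 0..15 */
	return cpu_read(&c, 16);
}
```

In this model, run_scenario(0) returns the stale DRAM byte instead of the poison, while run_scenario(1) returns the poison. That matches the symptoms reported above, but it does not by itself settle where the fix belongs.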