lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 15 May 2019 21:37:59 +0200
From:   Arnd Bergmann <>
To:     Alex Elder <>
Cc:     David Miller <>,
        Bjorn Andersson <>,
        Ilias Apalodimas <>,,,, Ben Chan <>,
        Eric Caruso <>,,
        Linux Kernel Mailing List <>
Subject: Re: [PATCH 08/18] soc: qcom: ipa: the generic software interface

On Sun, May 12, 2019 at 3:25 AM Alex Elder <> wrote:

> +static int gsi_ring_alloc(struct gsi *gsi, struct gsi_ring *ring, u32 count)
> +{
> +       size_t size = roundup_pow_of_two(count * sizeof(struct gsi_tre));
> +       dma_addr_t addr;
> +
> +       /* Hardware requires a power-of-2 ring size (and alignment) */
> +       ring->virt = dma_alloc_coherent(gsi->dev, size, &addr, GFP_KERNEL);
> +       if (!ring->virt)
> +               return -ENOMEM;
> +       ring->addr = addr;
> +       ring->base = addr & GENMASK(31, 0);
> +       ring->size = size;
> +       ring->end = ring->base + size;
> +       spin_lock_init(&ring->spinlock);
> +
> +       return 0;
> +}

Another comment for this patch: dma_alloc_coherent() does not guarantee
alignment of the requested buffer as implied by the comment. In many
configurations, it /is/ naturally aligned because the buffer comes from
alloc_pages(), but you can't really be sure.

I suspect it's actually only broken when the buffer spans a 4GB boundary
(and updating the lower 32 bit in the register gives a wrong pointer), which
is unlikely but will happen at some point according to Murphy's law.
If you just need the dma_addr_t to not cross a 4GB boundary, the
easiest solution would be to use GFP_DMA32, which gives you a
buffer that is mapped to the first 4GB bus address space (not necessarily
the first 4GB of RAM if you have an iommu).

If you manually align the ring buffer, it should be fine too, though I have
to say that the way the driver does pointer arithmetic on 32-bit integers
seems rather fragile as well.

A nicer way to deal with ring buffers in general is to only ever use a
32-bit index number stored in an atomic_t, use atomic_inc_return()
to advance the index and then mask the number when turning it into
an index. With that, you should also be able to avoid the shared
spinlock. Moving the rp and wp into separate cache lines further
reduces the coherency traffic by avoiding concurrent writes on the
same line.


Powered by blists - more mailing lists