Message-ID: <CAKv+Gu_UZmq=6juE98ceXgJ1KCgvzncYbzYiC2Uz+6mhf6yMpw@mail.gmail.com>
Date:   Mon, 6 Aug 2018 14:53:07 +0200
From:   Ard Biesheuvel <ard.biesheuvel@...aro.org>
To:     Robin Murphy <robin.murphy@....com>
Cc:     Mikulas Patocka <mpatocka@...hat.com>,
        Thomas Petazzoni <thomas.petazzoni@...e-electrons.com>,
        Joao Pinto <Joao.Pinto@...opsys.com>,
        linux-pci <linux-pci@...r.kernel.org>,
        Jingoo Han <jingoohan1@...il.com>,
        Will Deacon <will.deacon@....com>,
        Russell King <linux@...linux.org.uk>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Matt Sealey <neko@...uhatsu.net>,
        Catalin Marinas <catalin.marinas@....com>,
        linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>
Subject: Re: framebuffer corruption due to overlapping stp instructions on arm64

On 6 August 2018 at 14:42, Robin Murphy <robin.murphy@....com> wrote:
> On 06/08/18 11:25, Mikulas Patocka wrote:
> [...]
>>>
>>> None of this explains why some transactions fail to make it across
>>> entirely. The overlapping writes in question write the same data to
>>> the memory locations that are covered by both, and so the ordering in
>>> which the transactions are received should not affect the outcome.
>>
>>
>> You're right that the corruption couldn't be explained just by reordering
>> writes. My hypothesis is that the PCIe controller tries to disambiguate
>> the overlapping writes, but the disambiguation logic was not tested and it
>> is buggy. If there's a barrier between the overlapping writes, the PCIe
>> controller won't see any overlapping writes, so it won't trigger the
>> faulty disambiguation logic and it works.
>>
>> Could the ARM engineers look if there's some chicken bit in Cortex-A72
>> that could insert barriers between non-cached writes automatically?
>
>
> I don't think there is, and even if there was I imagine it would have a
> pretty hideous effect on non-coherent DMA buffers and the various other
> places in which we have Normal-NC mappings of actual system RAM.
>
>> I observe these kinds of corruptions:
>> - failing to write a few bytes
>
>
> That could potentially be explained by the reordering/atomicity issues Matt
> mentioned, i.e. the load is observing part of the store, before the store
> has fully completed.
>

OK, so that means the unaligned transaction gets split, and the
subtransactions are reordered with the aligned transaction so that the
sub-writes contain stale values from the sub-reads?

>> - writing a few bytes that were written 16 bytes before
>> - writing a few bytes that were written 16 bytes after
>
>
> Those sound more like the interconnect or root complex ignoring the byte
> strobes on an unaligned burst, of which I think the simplistic view would be
> "it's broken".
>
> FWIW I stuck my old Nvidia 7600GT card in my Arm Juno r2 board (2x
> Cortex-A72), built your test program natively with GCC 8.1.1 at -O2, and
> it's still happily flickering pixels in the corner of the console after
> nearly an hour (in parallel with some iperf3 just to ensure plenty of PCIe
> traffic). I would strongly suspect this issue is particular to Armada 8k, so
> it's probably one for the Marvell folks to take a closer look at - I believe
> some previous interconnect issues on those SoCs were actually fixable in
> firmware.
>

IIRC that was DVM dropping a few VA bits at the top, and a single MMIO
control bit to put it back into 'non-broken' mode.
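
For reference, the point made earlier in the thread (that the overlapping
writes carry the same data in the overlapped bytes, so their order should
not matter) can be modelled with a minimal C sketch. The function name
copy_16_to_32() and its tail_first flag are hypothetical, chosen for
illustration only; it is assumed here that the copy routine loads both
16-byte chunks before storing either, and plain memcpy() calls stand in
for the actual stp instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative model only: copy_16_to_32() is a hypothetical name, not
 * taken from any memcpy implementation discussed in this thread.
 *
 * Copy n bytes (16 <= n <= 32) as two 16-byte chunks: one starting at
 * offset 0 and one ending at the last byte. For n < 32 the two chunks
 * overlap by 32 - n bytes, and both chunks hold identical source data
 * in the overlapped region.
 */
static void copy_16_to_32(unsigned char *dst, const unsigned char *src,
                          size_t n, int tail_first)
{
    unsigned char head[16], tail[16];

    assert(n >= 16 && n <= 32);

    /* Assumption: both loads complete before either store is issued. */
    memcpy(head, src, 16);
    memcpy(tail, src + n - 16, 16);

    /* Issue the two overlapping stores in either order. */
    if (tail_first) {
        memcpy(dst + n - 16, tail, 16);
        memcpy(dst, head, 16);
    } else {
        memcpy(dst, head, 16);
        memcpy(dst + n - 16, tail, 16);
    }
}
```

On a correctly behaving memory system, both store orders leave the same
n bytes in dst, which is why reordering alone would not account for the
corruption reported above.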
