linux-kernel - Re: [PATCH 4/7] RISC-V: arch/riscv/include

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <mhng-ecf6d8c3-20c8-4196-bb9c-0c7409918ec6@palmer-si-x1c4>
Date:   Fri, 02 Jun 2017 19:00:00 -0700 (PDT)
From:   Palmer Dabbelt <palmer@...belt.com>
To:     benh@...nel.crashing.org
CC:     Arnd Bergmann <arnd@...db.de>
Subject:     Re: [PATCH 4/7] RISC-V: arch/riscv/include

On Tue, 23 May 2017 14:23:50 PDT (-0700), benh@...nel.crashing.org wrote:
> On Tue, 2017-05-23 at 14:55 +0200, Arnd Bergmann wrote:
>> > +
>> > +#include <asm-generic/io.h>
>>
>> I would recommend providing your own {read,write}{b,w,l,q}{,_relaxed}
>> helpers using inline assembly, to prevent the compiler for breaking
>> up accesses into byte accesses.
>>
>> Also, most architectures require to some synchronization after a
>> non-relaxed readl() to prevent prefetching of DMA buffers, and
>> before a writel() to flush write buffers when a DMA gets triggered.
>
> Right, I was about to comment on that one.

Well, you're both correct: what was there just isn't correct.  Our
implementations were safe because they don't have aggressive MMIO systems, but
that won't remain true for long.

I've gone ahead and added a proper IO implementation patterned on arm64.  It'll
be part of the v2 patch set.  Here's the bulk of the patch, if you're curious

  https://github.com/riscv/riscv-linux/commit/e200fa29a69451ef4d575076e4d2af6b7877b1fa

> The question Palmer is about the ordering semantics of non-cached
> storage.
>
> What kind of ordering is provided architecturally ? Especially
> between cachable and non-cachable loads and stores ?

The memory model on RISC-V is pretty weak.  Without fences there are no
ordering constraints.  We provide 2 "ordering spaces" (an odd name I just made
up, there's one address space): the IO space and the regular memory space.  The
base RISC-V ordering primitive is a fence, which takes a predecessor set and
successor set.  Fences can look like

  fence IORW,IORW

where the left side is the predecessor set and the right side is the successor
set.  The fence enforces ordering between any operations in the two sets: all
operations in the predecessor set must be globally visible before any operation
in the successor set becomes visible anywhere.

For example, if you're emitting a DMA transaction you'd have to do something
like

  build_message_in_memory()
  fence w,o
  set_control_register()

with the fence ensuring all the memory writes are visible before the control
register write.

More information can be found in the ISA manuals

  https://github.com/riscv/riscv-isa-manual/releases/download/riscv-user-2.2/riscv-spec-v2.2.pdf
  https://github.com/riscv/riscv-isa-manual/releases/download/riscv-priv-1.10/riscv-privileged-v1.10.pdf

> Also, you have PCIe right ? What is the behaviour of MSIs ?
>
> Does your HW provide a guarantee that in the case of a series of DMA
> writes to memory by a device followed by an MSI, the CPU getting the
> MSI will only get it after all the previous DMA writes have reached
> coherency ? (Unlike LSIs where the driver is required to do an MMIO
> read from the device, MSIs are expected to be ordered with data).

We do have PCIe, but I'm not particularly familiar with it as I haven't spent
any time hacking on our PCIe hardware or driver.  My understanding here is that
PCIe defines that MSIs must not be reordered before the DMA writes, so the
implementation is required to enforce this ordering.  Thus it's a problem for
the PCIe controller implementation (which isn't covered by RISC-V) and therefor
doesn't need any ordering enforced by the driver.

If I'm correct in the assumption that the hardware is required to enforce these
ordering constraints then I think this isn't a RISC-V issue.  I'll go bug our
PCIe guys to make sure everything is kosher in that case, but it's an ordering
constraint that is possible to enforce in our coherence protocol so it's not a
fundamental problem (and just an implementation one at that).

If the hardware isn't required to enforce the ordering, then we'll need a fence
before handling the data.

> Another things with the read*() accessors. It's not uncommon for
> a driver to do:
>
> 	writel(1, reset_reg);
> 	readl(reset_reg); /* flush posted writes */
> 	udelay(10);
> 	writel(0, reset_reg);
>
> Now, in the above case, what can typically happen if you aren't careful
> is that the readl which is intended to "push" the previous writel, will
> not actually do its job because the return value hasn't been "consumed"
> by the processor. Thus, the CPU will stick that on some kind of load
> queue and won't actually wait for the return value before hitting the
> delay loop.
>
> Thus you might end up in a situation where the writel of 1 to the
> device is itself reaching the device way after you started the delay
> loop, and thus end up violating the delay requirement of the HW.
>
> On powerpc we solve that by using a special instruction construct
> inside the read* accessors that prevents the CPU from executing
> subsequent instructions until the read value has been returned.
>
> You may want to consider something similar.

Ooh, that's a fun one :).   I bugged Andrew (the ISA wizard), and this might
require some clarification in the ISA manual.  The code we emit will look
something like

  fence io,o
  st 1, RESET_REG
  ld RESET_REG
  fence o,io
loop:
  rdtime
  blt loop
  fence io,o
  st 0, RESET_REG

Since the fences just enforce ordering between the loads and stores, there's
nothing that prevents the processor from releasing the store before the delay
loop completes.  I think we might be safe here because you're not allowed to
make speculative writes visible, but arguably that's not a speculative write
because you can predict the timer will keep increasing.  The distinction is
somewhat academic, though: I'm not sure what purpose that very specific sort of
predictor would have aside from breaking existing code.

This issue of "how is time ordered" has come up a handful of times, most of
which don't have the loop.  The idea of adding the result rdtime instruction to
the IO space.  I've opened a spec bug

  https://github.com/riscv/riscv-isa-manual/issues/78

Thanks!