Message-ID: <CAJF2gTSLAL8v+hQ2Lmu4jLj_kx4vTJ1oh==EBQuFR+L6hx=E2Q@mail.gmail.com>
Date:   Fri, 20 Sep 2019 08:13:02 +0800
From:   Guo Ren <guoren@...nel.org>
To:     Andrew Waterman <andrew@...ive.com>
Cc:     Anup Patel <Anup.Patel@....com>, Atish Patra <Atish.Patra@....com>,
        Guo Ren <ren_guo@...ky.com>, benh@...nel.crashing.org,
        Catalin Marinas <catalin.marinas@....com>, dwmw2@...radead.org,
        feiteng_li@...ky.com, gary@...yguo.net,
        iommu@...ts.linux-foundation.org, jacob.jun.pan@...ux.intel.com,
        Jean-Philippe Brucker <jean-philippe@...aro.org>,
        Julien Grall <julien.grall@....com>, julien.thierry@....com,
        kvmarm@...ts.cs.columbia.edu, linux-arm-kernel@...ts.infradead.org,
        linux-csky@...r.kernel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-riscv@...ts.infradead.org,
        Marc Zyngier <marc.zyngier@....com>,
        Palmer Dabbelt <palmer@...ive.com>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        tech-privileged@...ts.riscv.org, Will Deacon <will.deacon@....com>
Subject: Re: [tech-privileged] [RFC PATCH V1] riscv-privileged: Add broadcast
 mode to sfence.vma

Hi,

On Fri, Sep 20, 2019 at 12:10 AM Andrew Waterman <andrew@...ive.com> wrote:
>
> This needs to be discussed and debated at length; proposing edits to the spec at this stage is putting the cart before the horse!
Agree :)

>
> We shouldn’t change the definition of the existing SFENCE.VMA instruction to accomplish this. It’s also not abundantly clear to me that this should be an instruction:
If you implement sfence.vma as currently defined, it would still work with
the new mechanism; the two are compatible.

> TLB shootdown looks more like MMIO.
Per-CPU MMIO? In the proposal, every hart only takes care of its own requests.




>
> On Thu, Sep 19, 2019 at 5:36 AM Guo Ren <guoren@...nel.org> wrote:
>>
>> From: Guo Ren <ren_guo@...ky.com>
>>
>> The patch is for https://github.com/riscv/riscv-isa-manual
>>
>> The proposal was discussed at the LPC 2019 RISC-V MC (ref [1]). Here is the
>> formal patch.
>>
>> Introduction
>> ============
>>
>> Using a hardware TLB broadcast invalidation instruction to maintain the
>> system TLBs is a good choice, and it simplifies system software design.
>> This proposal adds a broadcast mode to sfence.vma in the
>> riscv-privileged specification. To support the sfence.vma broadcast mode,
>> two modifications are introduced below:
>>
>>  1) Add PGD.PPN (root page table's PPN) as the unique identifier of the
>>     address space in addition to asid/vmid. Compared to the dynamically
>>     changed asid/vmid, PGD.PPN is fixed throughout the address space life
>>     cycle. This feature enables uniform address space identification
>>     between different TLB systems (actually, it's difficult to unify the
>>     asid/vmid between the CPU system and the IOMMU system, because their
>>     mechanisms are different)
>>
>>  2) Modify the definition of the sfence.vma instruction from synchronous
>>     mode to asynchronous mode, which means that the completion of the TLB
>>     operation is not guaranteed when the sfence.vma instruction retires.
>>     Completion must be confirmed by checking a flag bit on the hart. The
>>     hardware can also notify software that the sfence.vma requests have
>>     finished by generating an interrupt. This feature alleviates the large
>>     delay of TLB invalidation in the PCI ATS system.
>>
>> Add S1/S2.PGD.PPN for ASID/VMID
>> ===============================
>>
>> PGD is the page global directory (a Linux term) and PPN is the physical page
>> number (defined in the RISC-V spec). PGD.PPN corresponds to the root page
>> table pointer of the address space, i.e. mm->pgd (a Linux concept).
>>
>> In CPU/IOMMU TLBs, we use asid/vmid to distinguish the address space of a
>> process or virtual machine. Because the id encoding is limited, it can
>> only represent a part (window) of the address spaces. S1/S2.PGD.PPN are the
>> root page tables' PPNs of the address spaces, and they serve as the
>> unique identifiers of the address spaces.
>>
>> For the CPU SMP system, you can use context switch to perform the necessary
>> software mechanism to ensure that the asid/vmid on all harts is consistent
>> (please refer to the arm64 asid mechanism). In this way, the TLB broadcast
>> invalidation instruction can determine the address space processed on all
>> harts by asid/vmid.
>>
>> Different from the CPU SMP system, there is no context switch for the
>> DMA-IOMMU system, so the unification with the CPU asid/vmid cannot be
>> guaranteed. So we need a unique identifier for the address space to
>> establish a communication bridge between the TLBs of different systems.
>>
>> That is PGD.PPN (for virtualization scenarios: S1/S2.PGD.PPN)
>>
>> current:
>>  sfence.vma  rs1 = vaddr, rs2 = asid
>>  hfence.vvma rs1 = vaddr, rs2 = asid
>>  hfence.gvma rs1 = gaddr, rs2 = vmid
>>
>> proposed:
>>  sfence.vma  rs1 = vaddr, rs2 = mode:ppn:asid
>>  hfence.vvma rs1 = vaddr, rs2 = mode:ppn:asid
>>  hfence.gvma rs1 = gaddr, rs2 = mode:ppn:vmid
>>
>>  mode      - broadcast | local
>>  ppn       - the PPN of the address space of the root page table
>>  vmid/asid - the window identifier of the address space
>>
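
A minimal sketch of how software might build the proposed rs2 operand, assuming the RV64 layout from the rs2 format figure in this patch (MODE in bits 63:60 with only bit 63 meaningful, PPN in bits 59:16, ASID in bits 15:0). The macro and function names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions follow the RV64 rs2 figure in this patch. */
#define RS2_MODE_BROADCAST (UINT64_C(1) << 63)        /* bit 63: 1 = broadcast, 0 = local */
#define RS2_PPN_SHIFT      16                         /* PPN occupies bits 59:16 */
#define RS2_PPN_MASK       ((UINT64_C(1) << 44) - 1)  /* 44-bit PPN */
#define RS2_ASID_MASK      UINT64_C(0xffff)           /* ASID occupies bits 15:0 */

/* Hypothetical helper: pack mode:ppn:asid into the rs2 value for sfence.vma. */
static uint64_t sfence_rs2_pack(int broadcast, uint64_t root_ppn, uint64_t asid)
{
    assert(root_ppn <= RS2_PPN_MASK);
    assert(asid <= RS2_ASID_MASK);
    return (broadcast ? RS2_MODE_BROADCAST : 0)
         | (root_ppn << RS2_PPN_SHIFT)
         | asid;
}

/* Hypothetical accessors, e.g. for a receiver that only cares about the PPN. */
static uint64_t sfence_rs2_ppn(uint64_t rs2)
{
    return (rs2 >> RS2_PPN_SHIFT) & RS2_PPN_MASK;
}

static uint64_t sfence_rs2_asid(uint64_t rs2)
{
    return rs2 & RS2_ASID_MASK;
}

static int sfence_rs2_is_broadcast(uint64_t rs2)
{
    return (rs2 & RS2_MODE_BROADCAST) != 0;
}
```

A hart would consume the ASID field, while a remote TLB such as an IOMMU would match on the PPN field.
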
>> At the Linux Plumbers Conference 2019 RISC-V MC, ref [1], we showed two
>> IOMMU examples to explain how it works with hardware.
>>
>> 1) In a lightweight IOMMU system (up to 64 address spaces), the hardware
>>    could directly convert PGD.PPN into DID (IOMMU ASID)
>>
>> 2) For the PCI ATS scenario, its IO ASID/VMID encoding space can support
>>    a very large number of address spaces. We use two reverse mapping
>>    tables to let the hardware translate S1/S2.PGD.PPN into IO ASID/VMID.
>>
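
As a rough software model of the lightweight case (example 1), the PGD.PPN to DID conversion could be thought of as a small associative table. This is only a sketch: the patch describes a hardware conversion, and the table layout, names, and first-free allocation policy here are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DID_SLOTS 64    /* "up to 64 address spaces" in the lightweight case */
#define DID_NONE  (-1)

/* Hypothetical model: slot index is the DID, content is the root-table PPN. */
struct did_table {
    uint64_t ppn[DID_SLOTS];
    uint8_t  valid[DID_SLOTS];
};

/* Bind a root-table PPN to a DID; returns the DID, or DID_NONE if full. */
static int did_bind(struct did_table *t, uint64_t root_ppn)
{
    int free_slot = DID_NONE;

    assert(t);
    for (int i = 0; i < DID_SLOTS; i++) {
        if (t->valid[i] && t->ppn[i] == root_ppn)
            return i;                       /* already bound */
        if (!t->valid[i] && free_slot == DID_NONE)
            free_slot = i;                  /* remember first free slot */
    }
    if (free_slot != DID_NONE) {
        t->ppn[free_slot] = root_ppn;
        t->valid[free_slot] = 1;
    }
    return free_slot;
}

/* What a receiver of a broadcast invalidation would do: map PPN to DID. */
static int did_lookup(const struct did_table *t, uint64_t root_ppn)
{
    assert(t);
    for (int i = 0; i < DID_SLOTS; i++)
        if (t->valid[i] && t->ppn[i] == root_ppn)
            return i;
    return DID_NONE;
}
```

A broadcast carrying a PPN that has no entry would simply be ignored by that IOMMU in this model.
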
>> ASYNC BROADCAST SFENCE.VMA
>> ===========================
>>
>> To support the high latency broadcast sfence.vma operation in the PCI ATS
>> usage scenario, we modify sfence.vma from synchronous mode to
>> asynchronous mode. (A simpler implementation may keep the hardware
>> synchronous; software written for asynchronous mode still works.)
>>
>> To implement the asynchronous mode, 3 features are added:
>>  1) sstatus:TLBI
>>     A "status bit - TLBI" is added to the sstatus register. The TLBI status
>>     bit indicates if there are still outstanding sfence.vma requests on the
>>     current hart.
>>     Value:
>>       1: sfence.vma requests are not completed.
>>       0: all sfence.vma requests completed, request queue is empty.
>>
>>  2) sstatus:TLBIC
>>     A "control bit - TLBIC" is added to the sstatus register. The TLBIC
>>     control bit is written by software.
>>     "Write 1" makes the current hart check whether there are still
>>     outstanding sfence.vma requests. If there are unfinished requests, an
>>     interrupt will be generated when they complete, notifying the
>>     software that all of the current sfence.vma requests have been completed.
>>     "Write 0" has no effect.
>>
>>  3) supervisor interrupt register (sip & sie):TLBI finish interrupt
>>     A per-hart interrupt is added to the supervisor interrupt registers.
>>     When all sfence.vma requests are completed and sstatus:TLBIC has been
>>     triggered, the hart will receive a TLBI finish interrupt. It is defined
>>     in sip & sie in the same way as the timer, software and external interrupts.
>>
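
The interaction of the three features can be sketched as a small software model of one hart. Everything here is hypothetical naming. In particular, treating "write 1 to TLBIC while the queue is already empty" as a no-op is my reading of the text above, not something the patch states explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/* Software model of one hart; all names are hypothetical. */
struct hart_model {
    unsigned outstanding;  /* sfence.vma requests not yet completed */
    bool     tlbic_armed;  /* software wrote 1 to sstatus:TLBIC */
    bool     stlbip;       /* TLBI finish interrupt pending in sip */
};

/* sstatus:TLBI reads 1 while any request is outstanding. */
static bool read_tlbi(const struct hart_model *h)
{
    return h->outstanding != 0;
}

/* Retiring an sfence.vma only queues the request (asynchronous mode). */
static void issue_sfence(struct hart_model *h)
{
    h->outstanding++;
}

/* Writing 0 has no effect; writing 1 arms the finish interrupt. */
static void write_tlbic(struct hart_model *h, bool value)
{
    if (!value)
        return;
    if (h->outstanding != 0)       /* assumption: nothing to arm if idle */
        h->tlbic_armed = true;
}

/* One request completes; raise the interrupt when the queue drains. */
static void complete_one(struct hart_model *h)
{
    assert(h->outstanding > 0);
    if (--h->outstanding == 0 && h->tlbic_armed) {
        h->tlbic_armed = false;
        h->stlbip = true;
    }
}
```

Without a prior write of 1 to TLBIC, the queue draining only clears TLBI in this model and no interrupt becomes pending.
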
>> Pseudocode:
>>
>> flush_tlb_page(vma, addr) {
>>     asid = cpu_asid(vma->vm_mm);
>>     ppn = PFN_DOWN(vma->vm_mm->pgd);
>>
>>     sfence.vma (addr, 1|PPN_OFFSET(ppn)|asid); //1. start request
>>
>>     while(sstatus:TLBI) if (time_out() > 1ms) break; //2. loop check
>>
>>     while (sstatus:TLBI) {
>>         ...
>>         set sstatus:TLBIC;
>>         wait_TLBI_finish_interrupt(); //3. wait irq, io_schedule
>>     }
>> }
>>
>> Here we give a two-level check:
>>  1) Poll sstatus:TLBI in a loop; the CPU can still respond to interrupts.
>>  2) Set sstatus:TLBIC and wait for the irq; the CPU schedules out to run other tasks.
>>
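
The two-level check can be exercised outside a kernel with a sketch like the following. The CSR accesses and the scheduler are replaced by hypothetical callbacks, and the 1ms timeout from the pseudocode is replaced by a simple poll budget; the toy backend exists only to make the control flow testable.

```c
#include <assert.h>
#include <stdbool.h>

enum wait_result { DONE_BY_POLLING, DONE_BY_INTERRUPT };

/* Hypothetical hooks standing in for CSR accesses and the scheduler. */
struct tlbi_ops {
    bool (*tlbi_pending)(void *ctx);     /* read sstatus:TLBI */
    void (*arm_tlbic)(void *ctx);        /* write 1 to sstatus:TLBIC */
    void (*sleep_until_irq)(void *ctx);  /* block until TLBI finish irq */
};

static enum wait_result wait_tlbi(const struct tlbi_ops *ops, void *ctx,
                                  unsigned poll_budget)
{
    bool slept = false;

    /* Level 1: bounded polling of TLBI. */
    while (poll_budget-- > 0) {
        if (!ops->tlbi_pending(ctx))
            return DONE_BY_POLLING;
    }
    /* Level 2: arm the interrupt and yield until it fires. */
    while (ops->tlbi_pending(ctx)) {
        ops->arm_tlbic(ctx);
        ops->sleep_until_irq(ctx);
        slept = true;
    }
    return slept ? DONE_BY_INTERRUPT : DONE_BY_POLLING;
}

/* Toy backend: TLBI reads as pending for `remaining` more polls. */
struct toy { unsigned remaining, armed, sleeps; };

static bool toy_pending(void *c)
{
    struct toy *t = c;
    if (t->remaining > 0) {
        t->remaining--;
        return true;
    }
    return false;
}

static void toy_arm(void *c) { ((struct toy *)c)->armed++; }

static void toy_sleep(void *c)
{
    struct toy *t = c;
    t->remaining = 0;   /* the interrupt implies all requests finished */
    t->sleeps++;
}

static const struct tlbi_ops toy_ops = { toy_pending, toy_arm, toy_sleep };
```

A short invalidation finishes within the poll budget and never sleeps; a long one (the PCI ATS case) falls through to the interrupt path.
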
>> ACE-DVM Example
>> ===============
>>
>> Honestly, "broadcasting addr, asid, vmid, S1/S2.PGD.PPN to interconnects"
>> and "ASYNC SFENCE.VMA" could be implemented by ACE-DVM protocol ref [2].
>>
>> There are 3 types of transactions in DVM:
>>
>>  - DVM operation
>>    Send all information to the interconnect, including addr, asid,
>>    S1.PGD.PPN, vmid, S2.PGD.PPN.
>>
>>  - DVM synchronization
>>    Check that all DVM operations have been completed. If not, a state
>>    machine waits for the outstanding DVM complete transactions.
>>
>>  - DVM complete
>>    Return transaction from components, eg: IOMMU. If hart has received all
>>    DVM completes which are triggered by sfence.vma instructions and
>>    "sstatus:TLBIC" has been set, a TLBI finish interrupt is triggered.
>>
>> (Actually, we do not need to implement the above functions strictly
>>  according to the ACE specification :P )
>>
>>  1: https://www.linuxplumbersconf.org/event/4/contributions/307/
>>  2: AMBA AXI and ACE Protocol Specification - "Distributed Virtual Memory
>>     Transactions"
>>
>> Signed-off-by: Guo Ren <ren_guo@...ky.com>
>> Reviewed-by: Li Feiteng <feiteng_li@...ky.com>
>> ---
>>  src/hypervisor.tex |  43 ++++++++-------
>>  src/supervisor.tex | 155 +++++++++++++++++++++++++++++++++++++++++------------
>>  2 files changed, 143 insertions(+), 55 deletions(-)
>>
>> diff --git a/src/hypervisor.tex b/src/hypervisor.tex
>> index 47b90b2..3718819 100644
>> --- a/src/hypervisor.tex
>> +++ b/src/hypervisor.tex
>> @@ -1094,15 +1094,15 @@ The hypervisor extension adds two new privileged fence instructions.
>>  \multicolumn{1}{c|}{opcode} \\
>>  \hline
>>  7 & 5 & 5 & 3 & 5 & 7 \\
>> -HFENCE.GVMA & vmid & gaddr & PRIV & 0 & SYSTEM \\
>> -HFENCE.VVMA & asid & vaddr & PRIV & 0 & SYSTEM \\
>> +HFENCE.GVMA & mode:ppn:vmid & gaddr & PRIV & 0 & SYSTEM \\
>> +HFENCE.VVMA & mode:ppn:asid & vaddr & PRIV & 0 & SYSTEM \\
>>  \end{tabular}
>>  \end{center}
>>
>>  The hypervisor memory-management fence instructions, HFENCE.GVMA and
>>  HFENCE.VVMA, are valid only in HS-mode when {\tt mstatus}.TVM=0, or in M-mode
>>  (irrespective of {\tt mstatus}.TVM).
>> -These instructions perform a function similar to SFENCE.VMA
>> +These instructions perform a function similar to SFENCE.VMA (broadcast/local)
>>  (Section~\ref{sec:sfence.vma}), except applying to the guest-physical
>>  memory-management data structures controlled by CSR {\tt hgatp} (HFENCE.GVMA)
>>  or the VS-level memory-management data structures controlled by CSR {\tt vsatp}
>> @@ -1136,11 +1136,10 @@ An HFENCE.VVMA instruction applies only to a single virtual machine, identified
>>  by the setting of {\tt hgatp}.VMID when HFENCE.VVMA executes.
>>  \end{commentary}
>>
>> -When {\em rs2}$\neq${\tt x0}, bits XLEN-1:ASIDMAX of the value held in {\em
>> -rs2} are reserved for future use and should be zeroed by software and ignored
>> -by current implementations.
>> -Furthermore, if ASIDLEN~$<$~ASIDMAX, the implementation shall ignore bits
>> -ASIDMAX-1:ASIDLEN of the value held in {\em rs2}.
>> +When {\em rs2}$\neq${\tt x0}, its bits encode three fields: mode, ppn, and asid.
>> +1) mode controls whether HFENCE.VVMA is broadcast or local.
>> +2) ppn is the root page table's PPN of the asid address space.
>> +3) asid is the identifier of a process in the virtual machine.
>>
>>  \begin{commentary}
>>  Simpler implementations of HFENCE.VVMA can ignore the guest virtual address in
>> @@ -1168,11 +1167,10 @@ physical addresses in PMP address registers (Section~\ref{sec:pmp}) and in page
>>  table entries (Sections \ref{sec:sv32}, \ref{sec:sv39}, and~\ref{sec:sv48}).
>>  \end{commentary}
>>
>> -When {\em rs2}$\neq${\tt x0}, bits XLEN-1:VMIDMAX of the value held in {\em
>> -rs2} are reserved for future use and should be zeroed by software and ignored
>> -by current implementations.
>> -Furthermore, if VMIDLEN~$<$~VMIDMAX, the implementation shall ignore bits
>> -VMIDMAX-1:VMIDLEN of the value held in {\em rs2}.
>> +When {\em rs2}$\neq${\tt x0}, its bits encode three fields: mode, ppn, and vmid.
>> +1) mode controls whether HFENCE.GVMA is broadcast or local.
>> +2) ppn is the root page table's PPN of the vmid address space.
>> +3) vmid is the identifier of the virtual machine.
>>
>>  \begin{commentary}
>>  Simpler implementations of HFENCE.GVMA can ignore the guest physical address in
>> @@ -1567,21 +1565,22 @@ register.
>>  \subsection{Memory-Management Fences}
>>
>>  The behavior of the SFENCE.VMA instruction is affected by the current
>> -virtualization mode V.  When V=0, the virtual-address argument is an HS-level
>> -virtual address, and the ASID argument is an HS-level ASID.
>> +virtualization mode V.  When V=0, the rs1 argument is an HS-level
>> +virtual address, and the rs2 argument is an HS-level ASID and root page table's PPN.
>>  The instruction orders stores only to HS-level address-translation structures
>>  with subsequent HS-level address translations.
>>
>> -When V=1, the virtual-address argument to SFENCE.VMA is a guest virtual
>> -address within the current virtual machine, and the ASID argument is a VS-level
>> -ASID within the current virtual machine.
>> +When V=1, the rs1 argument to SFENCE.VMA is a guest virtual
>> +address within the current virtual machine, and the rs2 argument is a VS-level
>> +ASID and root page table's PPN within the current virtual machine.
>>  The current virtual machine is identified by the VMID field of CSR {\tt hgatp},
>> -and the effective ASID can be considered to be the combination of this VMID
>> -with the VS-level ASID.
>> +and the effective ASID and root page table's PPN can be considered to be the
>> +combination of this VMID and root page table's PPN with the VS-level ASID and
>> +root page table's PPN.
>>  The SFENCE.VMA instruction orders stores only to the VS-level
>>  address-translation structures with subsequent VS-level address translations
>> -for the same virtual machine, i.e., only when {\tt hgatp}.VMID is the same as
>> -when the SFENCE.VMA executed.
>> +for the same virtual machine, i.e., only when {\tt hgatp}.VMID and {\tt hgatp}.PPN are
>> +the same as when the SFENCE.VMA executed.
>>
>>  Hypervisor instructions HFENCE.GVMA and HFENCE.VVMA provide additional
>>  memory-management fences to complement SFENCE.VMA.
>> diff --git a/src/supervisor.tex b/src/supervisor.tex
>> index ba3ced5..2877b7a 100644
>> --- a/src/supervisor.tex
>> +++ b/src/supervisor.tex
>> @@ -47,10 +47,12 @@ register keeps track of the processor's current operating state.
>>  \begin{center}
>>  \setlength{\tabcolsep}{4pt}
>>  \scalebox{0.95}{
>> -\begin{tabular}{cWcccccWccccWcc}
>> +\begin{tabular}{cccWcccccWccccWcc}
>>  \\
>>  \instbit{31} &
>> -\instbitrange{30}{20} &
>> +\instbit{30} &
>> +\instbit{29} &
>> +\instbitrange{28}{20} &
>>  \instbit{19} &
>>  \instbit{18} &
>>  \instbit{17} &
>> @@ -66,6 +68,8 @@ register keeps track of the processor's current operating state.
>>  \instbit{0} \\
>>  \hline
>>  \multicolumn{1}{|c|}{SD} &
>> +\multicolumn{1}{|c|}{TLBI} &
>> +\multicolumn{1}{|c|}{TLBIC} &
>>  \multicolumn{1}{c|}{\wpri} &
>>  \multicolumn{1}{c|}{MXR} &
>>  \multicolumn{1}{c|}{SUM} &
>> @@ -82,7 +86,7 @@ register keeps track of the processor's current operating state.
>>  \multicolumn{1}{c|}{\wpri}
>>  \\
>>  \hline
>> -1 & 11 & 1 & 1 & 1 & 2 & 2 & 4 & 1 & 1 & 1 & 1 & 3 & 1 & 1 \\
>> +1 & 1 & 1 & 10 & 1 & 1 & 1 & 2 & 2 & 4 & 1 & 1 & 1 & 1 & 3 & 1 & 1 \\
>>  \end{tabular}}
>>  \end{center}
>>  }
>> @@ -95,10 +99,12 @@ register keeps track of the processor's current operating state.
>>  {\footnotesize
>>  \begin{center}
>>  \setlength{\tabcolsep}{4pt}
>> -\begin{tabular}{cMFScccc}
>> +\begin{tabular}{cccMFScccc}
>>  \\
>>  \instbit{SXLEN-1} &
>> -\instbitrange{SXLEN-2}{34} &
>> +\instbit{SXLEN-2} &
>> +\instbit{SXLEN-3} &
>> +\instbitrange{SXLEN-4}{34} &
>>  \instbitrange{33}{32} &
>>  \instbitrange{31}{20} &
>>  \instbit{19} &
>> @@ -107,6 +113,8 @@ register keeps track of the processor's current operating state.
>>   \\
>>  \hline
>>  \multicolumn{1}{|c|}{SD} &
>> +\multicolumn{1}{|c|}{TLBI} &
>> +\multicolumn{1}{|c|}{TLBIC} &
>>  \multicolumn{1}{c|}{\wpri} &
>>  \multicolumn{1}{c|}{UXL[1:0]} &
>>  \multicolumn{1}{c|}{\wpri} &
>> @@ -115,7 +123,7 @@ register keeps track of the processor's current operating state.
>>  \multicolumn{1}{c|}{\wpri} &
>>   \\
>>  \hline
>> -1 & SXLEN-35 & 2 & 12 & 1 & 1 & 1 & \\
>> +1 & 1 & 1 & SXLEN-37 & 2 & 12 & 1 & 1 & 1 & \\
>>  \end{tabular}
>>  \begin{tabular}{cWWFccccWcc}
>>  \\
>> @@ -152,6 +160,17 @@ register keeps track of the processor's current operating state.
>>  \label{sstatusreg}
>>  \end{figure*}
>>
>> +The TLBI (read-only) bit indicates whether any asynchronous sfence.vma
>> +operations are still pending on the hart. A value of 0 means that no sfence.vma
>> +operations are pending; a value of 1 means that sfence.vma operations are still
>> +pending on the hart.
>> +
>> +When the sstatus:TLBIC bit is written 1, it triggers the hardware to check if
>> +there are any TLB invalidate operations pending. When all operations are
>> +finished, a TLB Invalidate finish interrupt will be triggered
>> +(see Section~\ref{sipreg}). Writing 0 to the sstatus:TLBIC bit has no
>> +effect. Reading the sstatus:TLBIC bit always returns 0.
>> +
>>  The SPP bit indicates the privilege level at which a hart was executing before
>>  entering supervisor mode.  When a trap is taken, SPP is set to 0 if the trap
>>  originated from user mode, or 1 otherwise.  When an SRET instruction
>> @@ -329,8 +348,10 @@ SXLEN-bit read/write register containing interrupt enable bits.
>>  {\footnotesize
>>  \begin{center}
>>  \setlength{\tabcolsep}{4pt}
>> -\begin{tabular}{KcFcFcc}
>> -\instbitrange{SXLEN-1}{10} &
>> +\begin{tabular}{KcFcFcFcc}
>> +\instbitrange{SXLEN-1}{14} &
>> +\instbit{13} &
>> +\instbitrange{12}{10} &
>>  \instbit{9} &
>>  \instbitrange{8}{6} &
>>  \instbit{5} &
>> @@ -339,6 +360,8 @@ SXLEN-bit read/write register containing interrupt enable bits.
>>  \instbit{0} \\
>>  \hline
>>  \multicolumn{1}{|c|}{\wpri} &
>> +\multicolumn{1}{c|}{STLBIP} &
>> +\multicolumn{1}{|c|}{\wpri} &
>>  \multicolumn{1}{c|}{SEIP} &
>>  \multicolumn{1}{c|}{\wpri} &
>>  \multicolumn{1}{c|}{STIP} &
>> @@ -346,7 +369,7 @@ SXLEN-bit read/write register containing interrupt enable bits.
>>  \multicolumn{1}{c|}{SSIP} &
>>  \multicolumn{1}{c|}{\wpri} \\
>>  \hline
>> -SXLEN-10 & 1 & 3 & 1 & 3 & 1 & 1 \\
>> +SXLEN-14 & 1 & 3 & 1 & 3 & 1 & 3 & 1 & 1 \\
>>  \end{tabular}
>>  \end{center}
>>  }
>> @@ -359,8 +382,10 @@ SXLEN-10 & 1 & 3 & 1 & 3 & 1 & 1 \\
>>  {\footnotesize
>>  \begin{center}
>>  \setlength{\tabcolsep}{4pt}
>> -\begin{tabular}{KcFcFcc}
>> -\instbitrange{SXLEN-1}{10} &
>> +\begin{tabular}{KcFcFcFcc}
>> +\instbitrange{SXLEN-1}{14} &
>> +\instbit{13} &
>> +\instbitrange{12}{10} &
>>  \instbit{9} &
>>  \instbitrange{8}{6} &
>>  \instbit{5} &
>> @@ -369,6 +394,8 @@ SXLEN-10 & 1 & 3 & 1 & 3 & 1 & 1 \\
>>  \instbit{0} \\
>>  \hline
>>  \multicolumn{1}{|c|}{\wpri} &
>> +\multicolumn{1}{c|}{STLBIE} &
>> +\multicolumn{1}{|c|}{\wpri} &
>>  \multicolumn{1}{c|}{SEIE} &
>>  \multicolumn{1}{c|}{\wpri} &
>>  \multicolumn{1}{c|}{STIE} &
>> @@ -376,7 +403,7 @@ SXLEN-10 & 1 & 3 & 1 & 3 & 1 & 1 \\
>>  \multicolumn{1}{c|}{SSIE} &
>>  \multicolumn{1}{c|}{\wpri} \\
>>  \hline
>> -SXLEN-10 & 1 & 3 & 1 & 3 & 1 & 1 \\
>> +SXLEN-14 & 1 & 3 & 1 & 3 & 1 & 3 & 1 & 1 \\
>>  \end{tabular}
>>  \end{center}
>>  }
>> @@ -410,6 +437,12 @@ when the SEIE bit in the {\tt sie} register is clear.  The implementation
>>  should provide facilities to mask, unmask, and query the cause of external
>>  interrupts.
>>
>> +A supervisor-level TLB Invalidate finish interrupt is pending if the STLBIP bit
>> +in the {\tt sip} register is set.  Supervisor-level TLB Invalidate finish
>> +interrupts are disabled when the STLBIE bit in the {\tt sie} register is clear.
>> +When the hart's TLB invalidate operations are finished, hardware changes the sstatus:TLBI
>> +bit from 1 to 0 and triggers the TLB Invalidate finish interrupt.
>> +
>>  \begin{commentary}
>>  The {\tt sip} and {\tt sie} registers are subsets of the {\tt mip} and {\tt
>>  mie} registers.  Reading any field, or writing any writable field, of {\tt
>> @@ -598,7 +631,9 @@ so is only guaranteed to hold supported exception codes.
>>    1         & 5               & Supervisor timer interrupt \\
>>    1         & 6--8            & {\em Reserved} \\
>>    1         & 9               & Supervisor external interrupt \\
>> -  1         & 10--15          & {\em Reserved} \\
>> +  1         & 10--11          & {\em Reserved} \\
>> +  1         & 12              & Supervisor TLBI finish interrupt \\
>> +  1         & 13--15          & {\em Reserved} \\
>>    1         & $\ge$16         & {\em Available for platform use} \\ \hline
>>    0         & 0               & Instruction address misaligned \\
>>    0         & 1               & Instruction access fault \\
>> @@ -884,7 +919,7 @@ provided.
>>  \multicolumn{1}{c|}{opcode} \\
>>  \hline
>>  7 & 5 & 5 & 3 & 5 & 7 \\
>> -SFENCE.VMA & asid & vaddr & PRIV & 0 & SYSTEM \\
>> +SFENCE.VMA & mode:ppn:asid & vaddr & LOCAL & 0 & SYSTEM \\
>>  \end{tabular}
>>  \end{center}
>>
>> @@ -899,21 +934,70 @@ from that hart to the memory-management data structures.
>>  Further details on the behavior of this instruction are
>>  described in Section~\ref{virt-control} and Section~\ref{pmp-vmem}.
>>
>> +SFENCE.VMA is defined as an asynchronous completion instruction, which means
>> +that the TLB operation is not guaranteed to complete when the instruction retires.
>> +Software must check sstatus:TLBI to determine that all TLB operations have completed.
>> +The sstatus:TLBI bit is described in Section~\ref{sstatus}. When hardware changes the
>> +sstatus:TLBI bit from 1 to 0, the TLB Invalidate finish interrupt will be
>> +triggered.
>> +
>>  \begin{commentary}
>> -The SFENCE.VMA is used to flush any local hardware caches related to
>> +The SFENCE.VMA is used to flush any local/remote hardware caches related to
>>  address translation.  It is specified as a fence rather than a TLB
>>  flush to provide cleaner semantics with respect to which instructions
>>  are affected by the flush operation and to support a wider variety of
>>  dynamic caching structures and memory-management schemes.  SFENCE.VMA
>>  is also used by higher privilege levels to synchronize page table
>> -writes and the address translation hardware.
>> +writes and the address translation hardware. A mode bit determines whether
>> +sfence.vma is broadcast on the interconnect.
>>  \end{commentary}
>>
>> -SFENCE.VMA orders only the local hart's implicit references to the
>> -memory-management data structures.
>> +\begin{figure}[h!]
>> +{\footnotesize
>> +\begin{center}
>> +\begin{tabular}{c@{}E@...}
>> +\instbit{31} &
>> +\instbitrange{30}{9} &
>> +\instbitrange{8}{0} \\
>> +\hline
>> +\multicolumn{1}{|c|}{{\tt MODE}} &
>> +\multicolumn{1}{|c|}{{\tt PPN (root page table)}} &
>> +\multicolumn{1}{|c|}{{\tt ASID}} \\
>> +\hline
>> +1 & 22 & 9 \\
>> +\end{tabular}
>> +\end{center}
>> +}
>> +\vspace{-0.1in}
>> +\caption{RV32 sfence.vma rs2 format.}
>> +\label{rv32satp}
>> +\end{figure}
>> +
>> +\begin{figure}[h!]
>> +{\footnotesize
>> +\begin{center}
>> +\begin{tabular}{@{}S@{}T@...}
>> +\instbitrange{63}{60} &
>> +\instbitrange{59}{16} &
>> +\instbitrange{15}{0} \\
>> +\hline
>> +\multicolumn{1}{|c|}{{\tt MODE}} &
>> +\multicolumn{1}{|c|}{{\tt PPN (root page table)}} &
>> +\multicolumn{1}{|c|}{{\tt ASID}} \\
>> +\hline
>> +4 & 44 & 16 \\
>> +\end{tabular}
>> +\end{center}
>> +}
>> +\vspace{-0.1in}
>> +\caption{RV64 sfence.vma rs2 format, for MODE values, only highest bit:63 is
>> +valid and others are reserved.}
>> +\label{rv64satp}
>> +\end{figure}
>>
>>  \begin{commentary}
>> -Consequently, other harts must be notified separately when the
>> +The highest bit of the mode field selects the sfence.vma behavior: 1 for broadcast, 0 for local.
>> +If only local mode is available, other harts must be notified separately when the
>>  memory-management data structures have been modified.
>>  One approach is to use 1)
>>  a local data fence to ensure local writes are visible globally, then
>> @@ -928,8 +1012,17 @@ modified for a single address mapping (i.e., one page or superpage), {\em rs1}
>>  can specify a virtual address within that mapping to effect a translation
>>  fence for that mapping only.  Furthermore, for the common case that the
>>  translation data structures have only been modified for a single address-space
>> -identifier, {\em rs2} can specify the address space.  The behavior of
>> -SFENCE.VMA depends on {\em rs1} and {\em rs2} as follows:
>> +identifier, {\em rs2} can specify the address space in {\tt satp} format,
>> +which includes the ASID and the root page table's PPN.
>> +
>> +\begin{commentary}
>> +We use the ASID and the root page table's PPN to identify an address space; the format
>> +stored in rs2 is similar to {\tt satp}, described in Section~\ref{sec:satp}.
>> +The ASID is used by local harts, and the root page table's PPN is used by
>> +other TLB systems, e.g., an IOMMU.
>> +\end{commentary}
>> +
>> +The behavior of SFENCE.VMA depends on {\em rs1} and {\em rs2} as follows:
>>
>>  \begin{itemize}
>>  \item If {\em rs1}={\tt x0} and {\em rs2}={\tt x0}, the fence orders all
>> @@ -939,23 +1032,18 @@ SFENCE.VMA depends on {\em rs1} and {\em rs2} as follows:
>>        all reads and writes made to any level of the page tables, but only
>>        for the address space identified by integer register {\em rs2}.
>>        Accesses to {\em global} mappings (see Section~\ref{sec:translation})
>> -      are not ordered.
>> +      are not ordered. The mode field in rs2 selects broadcast or local.
>>  \item If {\em rs1}$\neq${\tt x0} and {\em rs2}={\tt x0}, the fence orders
>>        only reads and writes made to the leaf page table entry corresponding
>>        to the virtual address in {\em rs1}, for all address spaces.
>>  \item If {\em rs1}$\neq${\tt x0} and {\em rs2}$\neq${\tt x0}, the fence
>>        orders only reads and writes made to the leaf page table entry
>>        corresponding to the virtual address in {\em rs1}, for the address
>> -      space identified by integer register {\em rs2}.
>> +      space identified by integer register {\em rs2}. The mode field in rs2
>> +      selects broadcast or local.
>>        Accesses to global mappings are not ordered.
>>  \end{itemize}
>>
>> -When {\em rs2}$\neq${\tt x0}, bits SXLEN-1:ASIDMAX of the value held in {\em
>> -rs2} are reserved for future use and should be zeroed by software and ignored
>> -by current implementations.  Furthermore, if ASIDLEN~$<$~ASIDMAX, the
>> -implementation shall ignore bits ASIDMAX-1:ASIDLEN of the value held in {\em
>> -rs2}.
>> -
>>  \begin{commentary}
>>  Simpler implementations can ignore the virtual address in {\em rs1} and
>>  the ASID value in {\em rs2} and always perform a global fence.
>> @@ -994,7 +1082,7 @@ can execute the same SFENCE.VMA instruction while a different ASID is loaded
>>  into {\tt satp}, provided the next time {\tt satp} is loaded with the recycled
>>  ASID, it is simultaneously loaded with the new page table.
>>
>> -\item If the implementation does not provide ASIDs, or software chooses to
>> +\item If the implementation does not provide ASIDs and PPNs, or software chooses to
>>  always use ASID 0, then after every {\tt satp} write, software should execute
>>  SFENCE.VMA with {\em rs1}={\tt x0}.  In the common case that no global
>>  translations have been modified, {\em rs2} should be set to a register other than
>> @@ -1003,13 +1091,14 @@ not flushed.
>>
>>  \item If software modifies a non-leaf PTE, it should execute SFENCE.VMA with
>>  {\em rs1}={\tt x0}.  If any PTE along the traversal path had its G bit set,
>> -{\em rs2} must be {\tt x0}; otherwise, {\em rs2} should be set to the ASID for
>> -which the translation is being modified.
>> +{\em rs2} must be {\tt x0}; otherwise, {\em rs2} should be set to the ASID and
>> +root page table's PPN for which the translation is being modified.
>>
>>  \item If software modifies a leaf PTE, it should execute SFENCE.VMA with {\em
>>  rs1} set to a virtual address within the page.  If any PTE along the traversal
>>  path had its G bit set, {\em rs2} must be {\tt x0}; otherwise, {\em rs2}
>> -should be set to the ASID for which the translation is being modified.
>> +should be set to the ASID and root page table's PPN for which the translation
>> +is being modified.
>>
>>  \item For the special cases of increasing the permissions on a leaf PTE and
>>  changing an invalid PTE to a valid leaf, software may choose to execute
>> --
>> 2.7.4
>>
>>


-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/
