Open Source and information security mailing list archives
 
Message-ID: <c61f63b6-662e-9c85-9135-50710fec79fc@nvidia.com>
Date:   Sat, 18 Mar 2023 10:29:01 -0500
From:   Shanker Donthineni <sdonthineni@...dia.com>
To:     Marc Zyngier <maz@...nel.org>
Cc:     Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>,
        Jonathan Corbet <corbet@....net>,
        Mark Rutland <mark.rutland@....com>,
        Lorenzo Pieralisi <lpieralisi@...nel.org>,
        Sudeep Holla <sudeep.holla@....com>,
        Thomas Gleixner <tglx@...utronix.de>,
        linux-arm-kernel@...ts.infradead.org, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org, Vikram Sethi <vsethi@...dia.com>,
        Thierry Reding <treding@...dia.com>
Subject: Re: [PATCH v4] irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4

Hi Marc,

On 3/18/23 04:44, Marc Zyngier wrote:
> On Sat, 18 Mar 2023 04:58:12 +0000,
> Shanker Donthineni <sdonthineni@...dia.com> wrote:
>>
>> The T241 platform suffers from the T241-FABRIC-4 erratum, which causes
>> unexpected behavior in the GIC when multiple transactions are received
>> simultaneously from different sources. This hardware issue impacts
>> NVIDIA server platforms with more than two interconnected T241 chips.
>> Each chip has support for 320 {E}SPIs.
>>
>> This issue occurs when multiple packets from different GICs are
>> incorrectly interleaved at the target chip. The erratum text below
>> specifies exactly which accesses produce multi-transfer packets that are
>> susceptible to interleaving and GIC state corruption. GIC state
>> corruption can lead to a range of problems, including kernel panics and
>> unexpected behavior.
>>
>> From the erratum text:
>>    "In some cases, inter-socket AXI4 Stream packets with multiple
>>    transfers, may be interleaved by the fabric when presented to ARM
>>    Generic Interrupt Controller. GIC expects all transfers of a packet
>>    to be delivered without any interleaving.
>>
>>    The following GICv3 commands may result in multiple transfer packets
>>    over inter-socket AXI4 Stream interface:
>>     - Register reads from GICD_I* and GICD_N*
>>     - Register writes to 64-bit GICD registers other than GICD_IROUTERn*
>>     - ITS command MOVALL
>>
>>    Multiple commands in GICv4+ utilize multiple transfer packets,
>>    including VMOVP, VMOVI, VMAPP, and 64-bit register accesses.
>>
>>    This issue impacts system configurations with more than 2 sockets,
>>    that require multi-transfer packets to be sent over inter-socket
>>    AXI4 Stream interface between GIC instances on different sockets.
>>    GICv4 cannot be supported. GICv3 SW model can only be supported
>>    with the workaround. Single and Dual socket configurations are not
>>    impacted by this issue and support GICv3 and GICv4."
>>
>> Link: https://developer.nvidia.com/docs/t241-fabric-4/nvidia-t241-fabric-4-errata.pdf
>>
>> Writing to the chip alias region of the GICD_In{E} registers, except
>> GICD_ICENABLERn, has the same effect as writing to the global
>> distributor. The SPI interrupt deactivate path is not affected by
>> the erratum.
>>
>> To fix this problem, implement a workaround that ensures read accesses
>> to the GICD_In{E} registers are directed to the chip that owns the
>> SPI, and disables GICv4.x features for KVM. To simplify code changes,
>> the gic_configure_irq() function uses the same alias region for both
>> read and write operations to GICD_ICFGR.
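The routing rule described above (reads of GICD_In{E} go to the owning chip's alias region; writes through the alias are equivalent to the global distributor except for GICD_ICENABLERn) can be sketched as follows. This is a minimal illustration, not the patch itself: it assumes each chip owns one contiguous block of 320 SPIs starting at INTID 32, and the chip count, the base addresses, and the names `spi_owner_chip()` and `gicd_in_base()` are all hypothetical.

```c
#include <stdint.h>

/* From the commit text: each T241 chip supports 320 {E}SPIs. */
#define SPIS_PER_CHIP 320u
/* First SPI INTID in the GICv3 architecture. */
#define FIRST_SPI     32u
/* Hypothetical: three chips, each serving one contiguous SPI block. */
#define NUM_CHIPS     3u

/* Hypothetical addresses, for illustration only. */
static const uintptr_t chip_alias_base[NUM_CHIPS] = {
	0x10000000u, 0x20000000u, 0x30000000u,
};
static const uintptr_t global_dist_base = 0x00100000u;

enum gicd_op { GICD_READ, GICD_WRITE };

/* Owning chip of an SPI under the assumed contiguous layout, or -1. */
static int spi_owner_chip(uint32_t intid)
{
	if (intid < FIRST_SPI || intid >= FIRST_SPI + NUM_CHIPS * SPIS_PER_CHIP)
		return -1;
	return (int)((intid - FIRST_SPI) / SPIS_PER_CHIP);
}

/*
 * Choose the base for a GICD_In{E} access. Reads go to the owning chip's
 * alias region; other writes may use the same alias (equivalent to the
 * global distributor), but GICD_ICENABLERn writes go to the global one.
 * Returns 0 when the INTID is outside the assumed SPI layout.
 */
static uintptr_t gicd_in_base(uint32_t intid, enum gicd_op op, int is_icenabler)
{
	int chip;

	if (op == GICD_WRITE && is_icenabler)
		return global_dist_base;

	chip = spi_owner_chip(intid);
	if (chip < 0)
		return 0;
	return chip_alias_base[chip];
}
```

Using the alias for both reads and writes keeps a single code path, which mirrors the commit text's choice to let gic_configure_irq() use the same alias region for GICD_ICFGR reads and writes.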
>>
>> Co-developed-by: Vikram Sethi <vsethi@...dia.com>
>> Signed-off-by: Vikram Sethi <vsethi@...dia.com>
>> Signed-off-by: Shanker Donthineni <sdonthineni@...dia.com>
>> ---
>> Changes since v3:
>>   - Fix the build issue for the 32-bit arch
>> Changes since v2:
>>   - Add accessors for the SOC-ID version & revision
>>   - Include "linux/bitfield.h" and "linux/bits.h" in irq-gic-v3.c
>> Changes since v1:
>>   - Use SMCCC SOC-ID API for detecting the T241 chip
>>   - Implement Marc's suggestions
>>   - Edit commit text
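The changelog mentions detecting the T241 through the SMCCC SOC-ID interface and adding accessors for the SoC version. As a hedged sketch, the decoder below splits the SMCCC_ARCH_SOC_ID (SoC_ID_type 0) "SoC version" value into the fields defined by the Arm SMC Calling Convention. The helper names are hypothetical, and the value 0x036b0241 used in the test is only an illustration (bank index 0x03, code 0x6b, SoC ID 0x0241); whether it matches the T241's actual SoC version is an assumption.

```c
#include <stdint.h>

/*
 * SoC version layout (SMCCC_ARCH_SOC_ID, SoC_ID_type 0):
 *   bit  [31]    zero on success (error returns are negative)
 *   bits [30:24] JEP-106 bank index of the SiP
 *   bits [23:16] JEP-106 identification code, bit 23 being the parity bit
 *   bits [15:0]  implementation-defined SoC ID
 */
struct soc_id_version {
	uint8_t  jep106_bank;   /* bits [30:24] */
	uint8_t  jep106_code;   /* bits [22:16], parity bit dropped */
	uint16_t soc_id;        /* bits [15:0] */
};

/* Hypothetical accessor: split a raw SoC version into its fields. */
static struct soc_id_version decode_soc_id_version(uint32_t v)
{
	struct soc_id_version d;

	d.jep106_bank = (uint8_t)((v >> 24) & 0x7f);
	d.jep106_code = (uint8_t)((v >> 16) & 0x7f);
	d.soc_id      = (uint16_t)(v & 0xffff);
	return d;
}

/* Hypothetical check: does a raw value match an expected SiP and SoC? */
static int soc_matches(uint32_t v, uint8_t bank, uint8_t code, uint16_t soc)
{
	struct soc_id_version d = decode_soc_id_version(v);

	return !(v & 0x80000000u) && d.jep106_bank == bank &&
	       d.jep106_code == code && d.soc_id == soc;
}
```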
> 
> You seem to have ignored most of my comments on v2[1] apart from the
> SOC_ID stuff. I guess I'll wait for v5...
> 
>          M.
> 
> [1] https://lore.kernel.org/all/871qlqif9v.wl-maz@kernel.org/
> 

Sorry, I did not intentionally ignore your input; unfortunately, I lost
this specific email in my Outlook. Your feedback is valuable, and we will
ensure that all of your review comments are addressed in v5.

-Shanker
