linux-kernel - Re: [PATCH V2] ACPI / APEI: restore interrupt before panic in sdei flow


Message-ID: <5951ad5b-d755-0150-0f2a-c567eb454dac@arm.com>
Date:   Wed, 13 Oct 2021 18:44:49 +0100
From:   James Morse <james.morse@....com>
To:     Liguang Zhang <zhangliguang@...ux.alibaba.com>
Cc:     linux-acpi@...r.kernel.org, linux-kernel@...r.kernel.org,
        Tony Luck <tony.luck@...el.com>,
        linux-arm-kernel@...ts.infradead.org,
        Borislav Petkov <bp@...en8.de>, Len Brown <lenb@...nel.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>
Subject: Re: [PATCH V2] ACPI / APEI: restore interrupt before panic in sdei
 flow

Hello!

On 12/10/2021 15:29, Liguang Zhang wrote:
> When the HEST ACPI table configures the Hardware Error Notification type as
> Software Delegated Exception (0x0B) for RAS events, OS RAS interacts with
> ATF through the SDEI mechanism. On a firmware-first system, the OS is
> notified by an ATF SDEI call.
> 
> The calling flow when a fatal RAS error happens is as below:
> 
> ATF notify OS flow:
>   sdei_dispatch_event()
>     ehf_activate_priority()
>       call sdei callback  // callback registered by OS
>     ehf_deactivate_priority()
> 
> OS sdei callback:
>   sdei_asm_handler()
>     __sdei_handler()
>       _sdei_handler()
>         sdei_event_handler()
>           ghes_sdei_critical_callback()
>             ghes_in_nmi_queue_one_entry()
>               /* if RAS error is fatal */
>               __ghes_panic()
>                 panic()
> 
> If a fatal RAS error occurred, panic was called in sdei_asm_handler()
> without ehf_deactivate_priority() being executed, which leaves the
> interrupt masked.

So far the story is:
Firmware generated an SDEI event (a kind of software NMI) because of a firmware
interrupt, but it hasn't completely handled the interrupt.

> If the interrupt is masked, the system halts in the kdump flow like this:
> 
> arm-smmu-v3 arm-smmu-v3.3.auto: allocated 65536 entries for cmdq
> arm-smmu-v3 arm-smmu-v3.3.auto: allocated 32768 entries for evtq
> arm-smmu-v3 arm-smmu-v3.3.auto: allocated 65536 entries for priq
> arm-smmu-v3 arm-smmu-v3.3.auto: SMMU currently enabled! Resetting...

How and why do firmware interrupts affect the IOMMU?

It sounds like you are sharing something with firmware that you shouldn't.

> After debugging, we found the exact position of the hang is:
> arm_smmu_device_probe()
>   arm_smmu_device_reset()
>     arm_smmu_device_disable()
>       arm_smmu_write_reg_sync()
>         readl_relaxed_poll_timeout()
>           readx_poll_timeout()
>             read_poll_timeout()
>               usleep_range() // hrtimer never fires
> 
> So the interrupt should be restored before panic, otherwise kdump will
> hang.

Why can't firmware finish with the interrupt before injecting the SDEI event?
If you need it to not happen a second time while the handler runs, you can always disable it.

The text in the spec about the interaction of complete and physical interrupts is for
bound interrupts. Linux doesn't support these. It isn't possible for Linux to know whether
firmware tied any other kind of event to a physical interrupt or not.

> In the SDEI flow, SDEI_EVENT_COMPLETE_AND_RESUME should be called
> before panic so that ehf_deactivate_priority() runs to completion.

SDEI_EVENT_COMPLETE_AND_RESUME is a complete: it tells firmware to restore the execution
state from before the event. You can almost get away with x17-x30 being corrupted as
panic() won't return - but the stack trace produced will be corrupt. If the original
exception was from user-space, SP_EL0 will have been restored to be the user value. The
kernel uses this for 'current'.

The way this is supposed to work is the dying kernel calls SDEI_PE_MASK while it does
the kdump reboot. Once the kdump kernel has started, the SDEI_PRIVATE_RESET and
SDEI_SHARED_RESET calls should fix anything left over in firmware.
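
As a rough illustration of that sequence, here is a toy user-space model. It is
not kernel or firmware code: the struct, its fields and every model_* function
are made up for illustration, and "left over" state is assumed to mean stale
event registrations. Only the three SDEI call names come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of what firmware tracks for one PE. */
struct fw_sdei_state {
	bool pe_masked;      /* event delivery masked on this PE */
	int  private_events; /* private events still registered */
	int  shared_events;  /* shared events still registered */
};

/* Stand-in for SDEI_PE_MASK: stop event delivery to this PE. */
static void model_sdei_pe_mask(struct fw_sdei_state *fw)
{
	assert(fw != NULL);
	fw->pe_masked = true;
}

/* Stand-in for SDEI_PRIVATE_RESET: drop private registrations. */
static void model_sdei_private_reset(struct fw_sdei_state *fw)
{
	assert(fw != NULL);
	fw->private_events = 0;
}

/* Stand-in for SDEI_SHARED_RESET: drop shared registrations. */
static void model_sdei_shared_reset(struct fw_sdei_state *fw)
{
	assert(fw != NULL);
	fw->shared_events = 0;
}

/* Dying kernel: only mask, no attempt to complete the event. */
static void model_dying_kernel(struct fw_sdei_state *fw)
{
	model_sdei_pe_mask(fw);
}

/* kdump kernel: reset whatever the old kernel left registered. */
static void model_kdump_kernel_boot(struct fw_sdei_state *fw)
{
	model_sdei_private_reset(fw);
	model_sdei_shared_reset(fw);
}

/* True while the old kernel's registrations are still present. */
static bool model_has_leftovers(const struct fw_sdei_state *fw)
{
	return fw->private_events > 0 || fw->shared_events > 0;
}
```

The point of the model is only the ordering: the panic path masks and leaves
the cleanup to the resets issued by the next kernel, rather than completing
the event from the handler.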

Could you debug why firmware interrupts being active prevent the SMMU from being reset? As
far as I can tell, those should be totally independent.

Thanks,

James