Date:   Fri, 4 Nov 2022 15:53:43 -0400
From:   Brian Masney <bmasney@...hat.com>
To:     Eric Chanudet <echanude@...hat.com>,
        Parikshit Pareek <quic_ppareek@...cinc.com>
Cc:     Andy Gross <agross@...nel.org>,
        Bjorn Andersson <andersson@...nel.org>,
        Konrad Dybcio <konrad.dybcio@...ainline.org>,
        Rob Herring <robh+dt@...nel.org>,
        Krzysztof Kozlowski <krzysztof.kozlowski+dt@...aro.org>,
        linux-arm-msm@...r.kernel.org, devicetree@...r.kernel.org,
        linux-kernel@...r.kernel.org, Andrew Halaney <ahalaney@...hat.com>,
        Shazad Hussain <quic_shazhuss@...cinc.com>,
        Johan Hovold <johan@...nel.org>
Subject: Re: [PATCH v5 0/3] arm64: dts: qcom: add dts for sa8540p-ride board

On Mon, Oct 17, 2022 at 03:23:25PM -0400, Brian Masney wrote:
> Parikshit: I found a way to reproduce the crash and isolated the issue
> to the qcom_q6v5_pas driver. Here's how you can reproduce the crash
> that we're seeing:
> 
> 1) Use my instructions at [1] to build an upstream kernel with the arm64
>    defconfig. Today I used linux-next-20221017.
> 
> 2) Copy the modules to the root filesystem. Before you reboot, mv
>    /lib/modules/6.0.0-next-20221017-xxx to
>    /lib/modules/6.0.0-next-20221017-xxx-old so that the modules are not
>    automatically loaded on startup.
> 
> 3) Reboot, and run lsmod and verify that no modules are loaded.
> 
> 4) cd /lib/modules/6.0.0-next-20221017-xxx-old
> 
> 5) Now load the modules that the upstream arm64 defconfig normally
>    loads; these all work as expected:
> 
>         insmod ./kernel/net/rfkill/rfkill.ko
>         insmod ./kernel/arch/arm64/crypto/crct10dif-ce.ko
>         insmod ./kernel/net/qrtr/qrtr.ko
>         insmod ./kernel/drivers/phy/qualcomm/phy-qcom-snps-femto-v2.ko
>         insmod ./kernel/drivers/soc/qcom/llcc-qcom.ko
>         insmod ./kernel/drivers/soc/qcom/qmi_helpers.ko
>         insmod ./kernel/drivers/remoteproc/qcom_sysmon.ko
>         insmod ./kernel/drivers/remoteproc/qcom_q6v5.ko
>         insmod ./kernel/drivers/rpmsg/qcom_glink_smem.ko
>         insmod ./kernel/drivers/soc/qcom/socinfo.ko
>         insmod ./kernel/drivers/remoteproc/qcom_pil_info.ko
>         insmod ./kernel/drivers/remoteproc/qcom_common.ko
>         insmod ./kernel/drivers/watchdog/qcom-wdt.ko
>         insmod ./kernel/fs/fuse/fuse.ko
>         insmod ./kernel/drivers/soc/qcom/mdt_loader.ko
> 
> 6) Wait a few minutes to be sure that everything is working as expected
>    on the board.
> 
> 7) Make the board go BOOM:
> 
>         insmod ./kernel/drivers/remoteproc/qcom_q6v5_pas.ko
> 
> At Red Hat we don't have the knowledge or the tools to analyze the
> ramdumps from the Qualcomm firmware, so we're flying blind right now.
> 
> [1] https://lore.kernel.org/lkml/YzsciFeYpvv%2F92CG@x1/
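Steps 4 to 7 can be scripted so the crash is reproduced the same way every time. The sketch below is hypothetical, not part of the original instructions: it assumes the module tree was renamed as in step 2 and is passed as the only argument (the `-xxx` version suffix is left to the reader). `INSMOD` and `WAIT_SECS` are assumed knobs added for dry runs; the 300-second default stands in for "a few minutes". Because insmod does not resolve module dependencies, the load order from step 5 is kept exactly.

```sh
#!/bin/sh
# Hypothetical reproduction script for steps 4-7 of the report above.
# Usage: sh repro.sh /lib/modules/<version>-old
set -eu  # stop at the first insmod failure

# Assumed knobs (not in the original steps): INSMOD=echo gives a dry run,
# WAIT_SECS replaces the "wait a few minutes" of step 6.
INSMOD=${INSMOD:-insmod}
WAIT_SECS=${WAIT_SECS:-300}

# Step 5 modules, in the original order (insmod does not load dependencies).
BASELINE="
kernel/net/rfkill/rfkill.ko
kernel/arch/arm64/crypto/crct10dif-ce.ko
kernel/net/qrtr/qrtr.ko
kernel/drivers/phy/qualcomm/phy-qcom-snps-femto-v2.ko
kernel/drivers/soc/qcom/llcc-qcom.ko
kernel/drivers/soc/qcom/qmi_helpers.ko
kernel/drivers/remoteproc/qcom_sysmon.ko
kernel/drivers/remoteproc/qcom_q6v5.ko
kernel/drivers/rpmsg/qcom_glink_smem.ko
kernel/drivers/soc/qcom/socinfo.ko
kernel/drivers/remoteproc/qcom_pil_info.ko
kernel/drivers/remoteproc/qcom_common.ko
kernel/drivers/watchdog/qcom-wdt.ko
kernel/fs/fuse/fuse.ko
kernel/drivers/soc/qcom/mdt_loader.ko
"

reproduce() {
    moddir=$1
    for m in $BASELINE; do
        "$INSMOD" "$moddir/$m"
    done
    # Step 6: let the board settle before the trigger.
    sleep "$WAIT_SECS"
    # Step 7: the module that makes the board go BOOM.
    "$INSMOD" "$moddir/kernel/drivers/remoteproc/qcom_q6v5_pas.ko"
}

if [ "$#" -eq 1 ]; then reproduce "$1"; fi
```

Running it with `INSMOD=echo WAIT_SECS=0` prints the 16 module paths in load order without touching the board.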

I isolated the hang issue above to a single Kconfig symbol. First, some
quick background: we don't see the hang when using the upstream kernel
with Red Hat's automotive kernel config, but we do see it with the
upstream arm64 defconfig. There are thousands of symbol differences
between the two defconfigs, and none of the changes stood out to me. I
wrote some code that gradually morphed the Red Hat defconfig into the
upstream arm64 defconfig, committing the symbol changes in stages along
the way. This allowed me to do an automated 'git bisect'.
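The morphing code itself isn't shown in this mail. What follows is only a hypothetical sketch of one way to do the staging: sort the symbols whose settings differ between a FROM and a TO config, and emit the config in which the first K of them take TO's value. It compares literal lines only (a symbol missing from a defconfig is treated as "unset", not as its Kconfig default), and `stage_config` is an invented name. Passing K=-1 prints the number of differing symbols.

```sh
#!/bin/sh
# Hypothetical sketch: stage_config FROM TO K prints FROM with the first K
# differing symbols (sorted by name) switched to TO's setting.
# K = -1 prints how many symbols differ.
set -eu

stage_config() {
    awk -v k="$3" '
        # Parse "CONFIG_X=val" or "# CONFIG_X is not set" into sym/val.
        function parse(line) {
            if (line ~ /^CONFIG_[A-Za-z0-9_]+=/) {
                eq = index(line, "=")
                sym = substr(line, 1, eq - 1); val = substr(line, eq + 1)
                return 1
            }
            if (line ~ /^# CONFIG_[A-Za-z0-9_]+ is not set$/) {
                split(line, f, " "); sym = f[2]; val = "n"
                return 1
            }
            return 0
        }
        FILENAME == ARGV[1] { if (parse($0)) { a[sym] = val; all[sym] = 1 }; next }
        { if (parse($0)) { b[sym] = val; all[sym] = 1 } }
        END {
            n = 0
            for (s in all) keys[++n] = s
            # Insertion sort, since POSIX awk has no sort function.
            for (i = 2; i <= n; i++) {
                v = keys[i]
                for (j = i - 1; j >= 1 && keys[j] > v; j--) keys[j + 1] = keys[j]
                keys[j + 1] = v
            }
            total = 0
            for (i = 1; i <= n; i++) {
                s = keys[i]
                diff[s] = ((s in a) != (s in b)) || ((s in a) && a[s] != b[s])
                total += diff[s]
            }
            if (k < 0) { print total; exit }
            changed = 0
            for (i = 1; i <= n; i++) {
                s = keys[i]
                useB = 0
                if (diff[s] && changed < k) { useB = 1; changed++ }
                if (useB) { if (!(s in b)) continue; v = b[s] }
                else { if (!(s in a)) continue; v = a[s] }
                if (v == "n") print "# " s " is not set"
                else print s "=" v
            }
        }' "$1" "$2"
}

if [ "$#" -eq 3 ]; then stage_config "$1" "$2" "$3"; fi
```

Writing stage K = 1..N over arch/arm64/configs/defconfig and committing each one gives a linear history. `git bisect run` can then search it with a test script that builds, boots, and checks the board; that script is hardware-specific and not sketched here. The kernel's own `scripts/diffconfig` is a ready-made way to list the differences.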

The setting CONFIG_NO_HZ_IDLE=y is what triggers the hang. When I remove
that line from arch/arm64/configs/defconfig, the board continues to
function normally after the qcom_q6v5_pas.ko module is loaded.
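CONFIG_NO_HZ_IDLE is one member of the "Timer tick handling" choice in kernel/time/Kconfig, alongside CONFIG_HZ_PERIODIC and CONFIG_NO_HZ_FULL. With it, the periodic tick is stopped on idle CPUs. Deleting the line from the defconfig doesn't remove the choice: Kconfig falls back to the choice's default. So it may be worth confirming which tick mode the "working" build actually ended up with. The helper below is a hypothetical sketch (`tick_mode` is an invented name) that reads a generated .config.

```sh
#!/bin/sh
# Hypothetical helper: report which "Timer tick handling" choice member
# is enabled in a generated .config.
set -eu

tick_mode() {
    for sym in HZ_PERIODIC NO_HZ_IDLE NO_HZ_FULL; do
        # -x: match the whole line, so "# CONFIG_... is not set" is ignored.
        if grep -qx "CONFIG_$sym=y" "$1"; then
            echo "CONFIG_$sym"
            return 0
        fi
    done
    echo "unknown"
    return 1
}

if [ "$#" -eq 1 ]; then tick_mode "$1"; fi
```

To pick the periodic tick explicitly instead of relying on the default, one option (assuming the in-tree `scripts/config` tool) is `./scripts/config --file .config --disable NO_HZ_IDLE --enable HZ_PERIODIC` followed by `make olddefconfig`.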

Any ideas what could be causing this? Could it be that the safety island
is monitoring for a kernel tick, and if it doesn't detect one, it kills
the kernel and goes into ramdump mode?
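One way to check the tick side of this hypothesis from Linux is to see whether idle CPUs really stop taking timer interrupts under each config. Take two snapshots of /proc/interrupts a few seconds apart and compare the per-CPU counts on the timer line. With CONFIG_NO_HZ_IDLE, an idle CPU's count should grow far more slowly than with a periodic tick. The sketch is hypothetical (`irq_delta` is an invented name), and it assumes the arm64 generic timer appears as `arch_timer` in the last column, which should be checked on the board.

```sh
#!/bin/sh
# Hypothetical helper: per-CPU change in one interrupt's count between two
# saved copies of /proc/interrupts.
# Usage: irq_delta BEFORE AFTER NAME   (NAME is matched against the last column)
set -eu

irq_delta() {
    awk -v name="$3" '
        # Header line names the online CPUs, e.g. "CPU0 CPU1 ...".
        FNR == 1 { ncpu = NF; for (i = 1; i <= NF; i++) cpu[i] = $i; next }
        # IRQ lines: "<irq>: <count per CPU> ... <name>".
        $NF == name {
            for (i = 1; i <= ncpu; i++) {
                if (FILENAME == ARGV[1]) before[i] = $(i + 1)
                else printf "%s %d\n", cpu[i], $(i + 1) - before[i]
            }
        }' "$1" "$2"
}

if [ "$#" -eq 3 ]; then irq_delta "$1" "$2" "$3"; fi
```

For example: `cat /proc/interrupts > before; sleep 10; cat /proc/interrupts > after; sh irq_delta.sh before after arch_timer`. This only shows whether ticks pause on idle CPUs. It says nothing about what the safety island monitors.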

Brian
