linux-kernel - Re: Unable to handle kernel paging request at virtual address dead

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPDyKFrxrKRuJec0pDLooovV3BJBVvmDizoL6N4eb+hv1D0NgA@mail.gmail.com>
Date:   Wed, 16 Sep 2020 09:37:26 +0200
From:   Ulf Hansson <ulf.hansson@...aro.org>
To:     Viresh Kumar <viresh.kumar@...aro.org>
Cc:     Naresh Kamboju <naresh.kamboju@...aro.org>,
        open list <linux-kernel@...r.kernel.org>,
        linux-mmc <linux-mmc@...r.kernel.org>,
        lkft-triage@...ts.linaro.org,
        linux-clk <linux-clk@...r.kernel.org>,
        Anders Roxell <anders.roxell@...aro.org>,
        Arnd Bergmann <arnd@...db.de>,
        Rajendra Nayak <rnayak@...eaurora.org>,
        John Stultz <john.stultz@...aro.org>
Subject: Re: Unable to handle kernel paging request at virtual address dead - __clk_put

On Wed, 16 Sep 2020 at 07:22, Viresh Kumar <viresh.kumar@...aro.org> wrote:
>
> On 15-09-20, 23:39, Ulf Hansson wrote:
> > On Tue, 15 Sep 2020 at 16:33, Naresh Kamboju <naresh.kamboju@...aro.org> wrote:
> > >
> > > arm64 dragonboard-410c boot failed while running linux next 2020915 due to
> > > the kernel crash.
> > >
> > > metadata:
> > >   git branch: master
> > >   git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
> > >   git describe: next-20200915
> > >   make_kernelversion: 5.9.0-rc5
> > >   kernel-config:
> > > https://builds.tuxbuild.com/J5oDTkph2aj855oeGOd45Q/kernel.config
> > >
> > >
> > > crash log:
> > > -------------
> > > [    3.517615] Synopsys Designware Multimedia Card Interface Driver
> > > [    3.524258] sdhci-pltfm: SDHCI platform and OF driver helper
> > > [    3.531302] Unable to handle kernel paging request at virtual
> > > address dead000000000108
> > > [    3.531396] Mem abort info:
> > > [    3.531460] sdhci_msm 7864900.sdhci: Got CD GPIO
> > > [    3.539182]   ESR = 0x96000044
> > > [    3.541953] ledtrig-cpu: registered to indicate activity on CPUs
> > > [    3.546692]   EC = 0x25: DABT (current EL), IL = 32 bits
> > > [    3.546701]   SET = 0, FnV = 0
> > > [    3.555694] usbcore: registered new interface driver usbhid
> > > [    3.555703] usbhid: USB HID core driver
> > > [    3.561638] genirq: irq_chip msmgpio did not update eff. affinity
> > > mask of irq 75
> > > [    3.563908]   EA = 0, S1PTW = 0
> > > [    3.580792] Data abort info:
> > > [    3.583673]   ISV = 0, ISS = 0x00000044
> > > [    3.583900] NET: Registered protocol family 10
> > > [    3.586785]   CM = 0, WnR = 1
> > > [    3.586794] [dead000000000108] address between user and kernel address ranges
> > > [    3.586806] Internal error: Oops: 96000044 [#1] PREEMPT SMP
> > > [    3.591869] Segment Routing with IPv6
> > > [    3.594829] Modules linked in:
> > > [    3.594841] CPU: 2 PID: 7 Comm: kworker/u8:0 Not tainted
> > > 5.9.0-rc5-next-20200915 #1
> > > [    3.594844] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> > > [    3.594862] Workqueue: events_unbound async_run_entry_fn
> > > [    3.597959] NET: Registered protocol family 17
> > > [    3.604991] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
> > > [    3.605000] pc : __clk_put+0x40/0x140
> > > [    3.605009] lr : __clk_put+0x2c/0x140
> > > [    3.610613] 9pnet: Installing 9P2000 support
> > > [    3.614183] sp : ffff80001005bbe0
> > > [    3.614189] x29: ffff80001005bbe0
> > > [    3.617233] Key type dns_resolver registered
> > > [    3.624696] x28: 000000000000002e
> > > [    3.624701] x27: ffff00003b620a68 x26: ffff800011496568
> > > [    3.624708] x25: ffff00003fcfe8f8 x24: ffff00003d30c410
> > > [    3.632518] registered taskstats version 1
> > > [    3.636931] x23: ffff800011495cf8 x22: ffff00003b620a40
> > > [    3.636938] x21: ffff00003d30c400 x20: ffff00003b620580
> > > [    3.636945] x19: ffff00003b64f380 x18: 0000000007824000
> > > [    3.636951] x17: ffff00003b620a00 x16: ffff00003b6205d0
> > > [    3.636958] x15: ffff8000119929f8 x14: ffffffffffffffff
> > > [    3.636965] x13: ffff800012947000 x12: ffff800012947000
> > > [    3.636975] x11: 0000000000000020
> > > [    3.641233] Loading compiled-in X.509 certificates
> > > [    3.646650] x10: 0101010101010101
> > > [    3.646654] x9 : ffff8000107b4c84 x8 : 7f7f7f7f7f7f7f7f
> > > [    3.646661] x7 : ffff000009fe5880 x6 : 0000000000000000
> > > [    3.646668] x5 : 0000000000000000 x4 : ffff000009fe5880
> > > [    3.646674] x3 : ffff80001250d7a0 x2 : ffff000009fe5880
> > > [    3.746653] x1 : ffff00003b64f680 x0 : dead000000000100
> > > [    3.751949] Call trace:
> > > [    3.757243]  __clk_put+0x40/0x140
> > > [    3.759413]  clk_put+0x18/0x28
> > > [    3.762885]  dev_pm_opp_put_clkname+0x30/0x58
> > > [    3.765837]  sdhci_msm_probe+0x288/0x9a8
> > > [    3.770265]  platform_drv_probe+0x5c/0xb0
> > > [    3.774258]  really_probe+0xf0/0x4d8
> > > [    3.778163]  driver_probe_device+0xfc/0x168
> > > [    3.781810]  __driver_attach_async_helper+0xbc/0xc8
> > > [    3.785717]  async_run_entry_fn+0x4c/0x1b0
> > > [    3.790575]  process_one_work+0x1c8/0x498
> > > [    3.794741]  worker_thread+0x54/0x428
> > > [    3.798822]  kthread+0x120/0x158
> > > [    3.802467]  ret_from_fork+0x10/0x30
> > > [    3.805771] Code: 35000720 a9438660 f9000020 b4000040 (f9000401)
> > > [    3.809334] ---[ end trace 1a607a5ea6891b9f ]---
> > >
> > > full test log link,
> > > https://lkft.validation.linaro.org/scheduler/job/1765840#L2014
> > > https://lkft.validation.linaro.org/scheduler/job/1765842#L1960
> > >
> > > --
> > > Linaro LKFT
> > > https://lkft.linaro.org
> >
> > Naresh, thanks for reporting!
> >
> > There have been regressions related to the opp library this cycle, so
> > I am wondering if Viresh may have any ideas, before going into more
> > details.
>
> I am really pissed at this. This is the exact bug we got earlier,
> which Naresh also confirmed as being fixed after the patches I
> proposed.

No worries, let's just fix it, again. :-)

>
> > One thing that also changed from the sdhci-msm point of view, is that
> > we enabled async probe [1]. This could be the thing that triggers an
> > untested error path of the probe?
>
> Maybe, but I am not sure if this will cause such an issue. At max it
> should cause issues for other stuff that depends on sdhci.
>
> > Otherwise we can always try to revert "mmc: sdhci-msm: Unconditionally
> > call dev_pm_opp_of_remove_table()", which I recently applied again
> > after the earlier errors.
>
> Yeah, that's the easiest of all.
>
> I am trying to find someone with local 410c who can help me fix it in
> realtime. Lets see.

I have the board as well. If you need some help with testing, just let me know.

In any case, I will try the revert and see how that changes things.

Kind regards
Uffe