Message-ID: <20221114140903.GF30263@willie-the-truck>
Date: Mon, 14 Nov 2022 14:09:04 +0000
From: Will Deacon <will@...nel.org>
To: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
Cc: Catalin Marinas <catalin.marinas@....com>,
Amit Pundir <amit.pundir@...aro.org>,
Robin Murphy <robin.murphy@....com>,
Bjorn Andersson <andersson@...nel.org>,
Sibi Sankar <quic_sibis@...cinc.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
Dmitry Baryshkov <dmitry.baryshkov@...aro.org>
Subject: Re: [GIT PULL] arm64 updates for 6.1-rc1

On Sat, Nov 12, 2022 at 12:48:20AM +0530, Manivannan Sadhasivam wrote:
> On Fri, Nov 11, 2022 at 11:10:01PM +0530, Manivannan Sadhasivam wrote:
> > On Fri, Nov 11, 2022 at 11:15:11AM +0000, Catalin Marinas wrote:
> > > On Tue, Nov 08, 2022 at 10:58:16PM +0530, Amit Pundir wrote:
> > > > On Tue, 25 Oct 2022 at 18:08, Amit Pundir <amit.pundir@...aro.org> wrote:
> > > > > On Wed, 12 Oct 2022 at 17:24, Catalin Marinas <catalin.marinas@....com> wrote:
> > > > > > On Sat, Oct 08, 2022 at 08:28:26PM +0530, Amit Pundir wrote:
> > > > > > > On Wed, 5 Oct 2022 at 20:11, Catalin Marinas <catalin.marinas@....com> wrote:
> > > > > > > > Will Deacon (2):
> > > > > > > > arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()
> > > > > > >
> > > > > > > This patch broke AOSP on Dragonboard 845c (SDM845). I don't see any
> > > > > > > relevant crash in the attached log and device silently reboots into
> > > > > > > USB crash dump mode. The crash is fairly reproducible on db845c. I
> > > > > > > > could trigger it twice in 5 reboots and it always crashes at the same
> > > > > > > point during the boot process. Reverting this patch fixes the crash.
> > > > > > >
> > > > > > > > I'm happy to test run any debug patch(es) that would help narrow
> > > > > > > down this breakage.
> > > [...]
> > > > > Further narrowed down the breakage to the userspace daemon rmtfs
> > > > > https://github.com/andersson/rmtfs. Is there anything specific in the
> > > > > userspace code that I should be paying attention to?
> > >
> > > Since you don't see anything in the logs like a crash and the system
> > > restarts, I suspect it's some deadlock and that's triggering the
> > > watchdog. We have an erratum (826319) but that's for Cortex-A53. IIUC
> > > SDM845 has Kryo 3xx series which based on some random google searches is
> > > derived from A75/A55. Unfortunately the MIDR_EL1 register doesn't match
> > > the Arm Ltd numbering, so I have no idea what CPUs these are by looking
> > > at the boot log.
> > >
> > > I wouldn't be surprised if you hit a similar bug, though I couldn't find
> > > anything close in the A55 errata notice.
> > >
> > > While we could revert commit c44094eee32f ("arm64: dma: Drop cache
> > > invalidation from arch_dma_prep_coherent()"), if you hit a real hardware
> > > issue it may trigger in other scenarios where we only do cache cleaning
> > > (without invalidate), like arch_sync_dma_for_device(). So I'd rather get
> > > to the bottom of this and potentially enable the workaround for this
> > > chipset.
> > >
> > > You could give it a quick try by adding the MIDR ranges for SDM845 to
> > > struct midr_range workaround_clean_cache[].
> > >
> >
> > I gave it a shot and indeed it fixes the crash on DB845.
> >
> > > After that I suggest you raise it with Qualcomm to investigate. Normally
> > > we ask for an erratum number to enable a workaround and it's only
> > > Qualcomm that can provide one here.
> > >
> >
> > I will check with Qualcomm folks and update.
> >
>
> I dug a little further and found that the crash was due to the secure
> processor (XPU) violation. It happens because the CPU tried accessing the memory
> after sharing it with the modem for firmware metadata validation.

Can you share more details about this violation, please? For example, is it
a read or a write, what size is it, how is it detected?

> Sibi tried fixing this problem earlier by using a hack in the remoteproc driver
> [1], but I guess that got negated due to c44094eee32f?

Performing a clean rather than a clean+invalidate when the buffer is
allocated (which is what is achieved by c44094eee32f) shouldn't affect
this afaict.
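
To spell out the distinction, here is a toy userspace model of a single
cache line in front of one word of memory. It is only a sketch: this is not
kernel code, every name in it (toy_line, toy_mem, toy_cpu_write, toy_clean,
toy_clean_inval) is made up for illustration, and it ignores speculation,
multiple cache levels and anything SoC-specific.

```c
#include <stdbool.h>

/* Hypothetical model: one cache line caching one word of memory. */
struct toy_line {
	bool valid;	/* line holds a copy of the memory word */
	bool dirty;	/* copy is newer than memory */
	int data;
};

/* The single word of "memory" behind the toy cache line. */
static int toy_mem;

/* CPU store: allocates the line and marks it dirty; memory is untouched. */
static void toy_cpu_write(struct toy_line *l, int v)
{
	l->valid = true;
	l->dirty = true;
	l->data = v;
}

/* Clean: write back dirty data, but keep the line valid in the cache. */
static void toy_clean(struct toy_line *l)
{
	if (l->valid && l->dirty) {
		toy_mem = l->data;
		l->dirty = false;
	}
}

/* Clean+invalidate: write back dirty data, then drop the line. */
static void toy_clean_inval(struct toy_line *l)
{
	toy_clean(l);
	l->valid = false;
}
```

In this model both operations leave memory holding the latest data, and
neither leaves a dirty line behind that could be written back later. The
only difference is whether a clean copy stays in the cache.
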
> This is a common issue for other Qcom remoteproc drivers as well where CPU
> shares a chunk of memory with the modem. There is one more hack in place where
> a chunk of memory is reserved and the driver will do memremap/copy the
> data/memunmap using it and share it with modem.
>
> But is there a better solution overall that you could advise?

I think we need a better understanding of what Qualcomm's SCM firmware is
expecting about the state of the buffer pages being shared with the modem
before we can suggest other solutions.

Will