Message-ID: <20190625094602.GC13263@fuggles.cambridge.arm.com>
Date:   Tue, 25 Jun 2019 10:46:02 +0100
From:   Will Deacon <will.deacon@....com>
To:     Will Deacon <will@...nel.org>
Cc:     Vicente Bergas <vicencb@...il.com>,
        Al Viro <viro@...iv.linux.org.uk>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        Catalin Marinas <catalin.marinas@....com>, marc.zyngier@....com
Subject: Re: d_lookup: Unable to handle kernel paging request

[+Marc]

Hi again, Vicente,

On Mon, Jun 24, 2019 at 12:47:41PM +0100, Will Deacon wrote:
> On Sat, Jun 22, 2019 at 08:02:19PM +0200, Vicente Bergas wrote:
> > Hi Al,
> > I think I have a hint of what is going on.
> > With the last kernel built with your sentinels at hlist_bl_*lock
> > it is very easy to reproduce the issue.
> > In fact it is so unstable that i had to connect a serial port
> > in order to save the kernel trace.
> > Unfortunately all the traces are at different addresses and
> > your sentinel did not trigger.
> > 
> > Now i am writing this email from that same buggy kernel, which is
> > v5.2-rc5-224-gbed3c0d84e7e.
> > 
> > The difference is that I changed the bootloader.
> > Before was booting 5.1.12 and kexec into this one.
> > Now booting from u-boot into this one.
> > I will continue booting with u-boot for some time to be sure it is
> > stable and confirm this is the cause.
> > 
> > In case it is, who is the most probable offender?
> > the kernel before kexec or the kernel after?
> 
> Has kexec ever worked reliably on this board? If you used to kexec
> successfully, then we can try to hunt down the regression using memtest.
> If you kexec into a problematic kernel with CONFIG_MEMTEST=y and pass
> "memtest=17" on the command-line, it will hopefully reveal any active
> memory corruption.
> 
> My first thought is that there is ongoing DMA which corrupts the dentry
> hash. The rk3399 SoC also has an IOMMU, which could contribute to the fun
> if it's not shut down correctly (i.e. if it enters bypass mode).
> 
> > The original report was sent to you because you appeared as the maintainer
> > of fs/dcache.c, which appeared on the trace. Should this be redirected
> > somewhere else now?
> 
> linux-arm-kernel@...ts.infradead.org
> 
> Probably worth adding Heiko Stuebner <heiko@...ech.de> to cc.

Before you rush over to LAKML, please could you provide your full dmesg
output from the kernel that was crashing (i.e. the dmesg you see in the
kexec'd kernel)? We've got a theory that the issue may be related to the
interrupt controller, and the dmesg output should help to establish whether
that is plausible or not.

Thanks,

Will
