Message-ID: <20200229095550.GX25745@shell.armlinux.org.uk>
Date:   Sat, 29 Feb 2020 09:55:50 +0000
From:   Russell King - ARM Linux admin <linux@...linux.org.uk>
To:     Olof Johansson <olof@...om.net>, Jon Nettleton <jon@...id-run.com>
Cc:     "mark.rutland@....com" <mark.rutland@....com>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        "arnd@...db.de" <arnd@...db.de>,
        "m.karthikeyan@...iveil.co.in" <m.karthikeyan@...iveil.co.in>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "Z.q. Hou" <zhiqiang.hou@....com>,
        "l.subrahmanya@...iveil.co.in" <l.subrahmanya@...iveil.co.in>,
        "will.deacon@....com" <will.deacon@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Leo Li <leoyang.li@....com>,
        "M.h. Lian" <minghuan.lian@....com>,
        Xiaowei Bao <xiaowei.bao@....com>,
        "catalin.marinas@....com" <catalin.marinas@....com>,
        "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
        "andrew.murray@....com" <andrew.murray@....com>,
        "shawnguo@...nel.org" <shawnguo@...nel.org>,
        Mingkai Hu <mingkai.hu@....com>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCHv9 00/12] PCI: Recode Mobiveil driver and add PCIe Gen4
 driver for NXP Layerscape SoCs

On Mon, Feb 10, 2020 at 03:22:57PM +0000, Russell King - ARM Linux admin wrote:
> On Mon, Feb 10, 2020 at 04:12:30PM +0100, Olof Johansson wrote:
> > On Thu, Feb 6, 2020 at 11:57 AM Z.q. Hou <zhiqiang.hou@....com> wrote:
> > >
> > > Hi Olof,
> > >
> > > Thanks a lot for your comments!
> > > And sorry for my delay respond!
> > 
> > Actually, they apply with only minor conflicts on top of current -next.
> > 
> > Bjorn, any chance we can get you to pick these up pretty soon? They
> > enable full use of a promising ARM developer system, the SolidRun
> > HoneyComb, and would be quite valuable for me and others to be able to
> > use with mainline or -next without any additional patches applied --
> > which this patchset achieves.
> > 
> > I know there are pending revisions based on feedback. I'll leave it up
> > to you and others to determine if that can be done with incremental
> > patches on top, or if it should be fixed before the initial patchset
> > is applied. But all in all, it's holding up adaption by me and surely
> > others of a very interesting platform -- I'm looking to replace my
> > aging MacchiatoBin with one of these and would need PCIe/NVMe to work
> > before I do.
> 
> If you're going to be using NVMe, make sure you use a power-fail safe
> version; I've already had one instance where ext4 failed to mount
> because of a corrupted journal using an XPG SX8200 after the Honeycomb
> Serror'd, and then I powered it down after a few hours before later
> booting it back up.
> 
> EXT4-fs (nvme0n1p2): INFO: recovery required on readonly filesystem
> EXT4-fs (nvme0n1p2): write access will be enabled during recovery
> JBD2: journal transaction 80849 on nvme0n1p2-8 is corrupt.
> EXT4-fs (nvme0n1p2): error loading journal

... and last night, I just got more ext4fs errors on the NVMe, without
any unclean power cycles:

[73729.556544] EXT4-fs error (device nvme0n1p2): ext4_lookup:1700: inode #917524: comm rm: iget: checksum invalid
[73729.565354] Aborting journal on device nvme0n1p2-8.
[73729.568995] EXT4-fs (nvme0n1p2): Remounting filesystem read-only
[73729.569077] EXT4-fs error (device nvme0n1p2): ext4_journal_check_start:61: Detected aborted journal
[73729.573741] EXT4-fs error (device nvme0n1p2): ext4_lookup:1700: inode #917524: comm rm: iget: checksum invalid
[73729.593330] EXT4-fs error (device nvme0n1p2): ext4_lookup:1700: inode #917524: comm mv: iget: checksum invalid

The affected file is /var/backups/dpkg.status.6.gz
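
A minimal sketch of how log lines like the ones above could be grouped
by device and inode, so that repeated reports against the same inode
stand out. The regular expression is an assumption based only on the
layout of the lines quoted here, and summarize_ext4_errors is a
hypothetical helper name, not part of any existing tool. Mapping an
inode number back to a path is a separate step (for example with the
ncheck request in debugfs).

```python
import re
from collections import Counter

# Assumed layout, taken from the inode-related lines quoted above:
#   EXT4-fs error (device DEV): FUNC:LINE: inode #N: comm CMD: MESSAGE
EXT4_INODE_ERR = re.compile(
    r"EXT4-fs error \(device (?P<dev>[^)]+)\): "
    r"(?P<func>\w+):(?P<line>\d+): "
    r"inode #(?P<inode>\d+): comm (?P<comm>\S+): (?P<msg>.*)"
)


def summarize_ext4_errors(lines):
    """Count inode-related EXT4-fs errors per (device, inode, message).

    Lines that do not name an inode (such as the aborted-journal
    report) are ignored.
    """
    counts = Counter()
    for line in lines:
        match = EXT4_INODE_ERR.search(line)
        if match:
            key = (match["dev"], int(match["inode"]), match["msg"].strip())
            counts[key] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Example: dmesg | python3 this_script.py
    for (dev, inode, msg), n in summarize_ext4_errors(sys.stdin).items():
        print(f"{dev} inode #{inode}: {msg} (seen {n} times)")
```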

The machine was cleanly shut down and powered off on the 22nd February,
then booted yesterday morning, followed by another reboot a few minutes
later.

What worries me is the fact that corruption has happened - and if that
happens to file data rather than an inode, it will likely go unnoticed
for a considerably longer time.
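
One way to notice that kind of corruption earlier is to keep a manifest
of content hashes for files that are not expected to change and compare
it against a later scan. The following is a minimal sketch under that
assumption; build_manifest and changed_files are hypothetical helper
names, and a differing hash only means the content read back is
different, which can also be a legitimate modification.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file under root (relative path) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            manifest[os.path.relpath(path, root)] = sha256_of(path)
    return manifest


def changed_files(old, new):
    """Return paths present in both manifests whose digests differ."""
    return sorted(p for p in old if p in new and old[p] != new[p])
```

A manifest could be built once after a known-good state, stored
somewhere other than the disk under suspicion, and compared with a
fresh build_manifest() result after each boot. Reads served from the
page cache may hide on-disk damage until the cache is dropped or the
machine is rebooted.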

I think I'm getting to the point of deciding that either the NVMe or
the LX2160A is just too unreliable for serious use.  I hadn't noticed
any issues when using the rootfs on the eMMC, which suggests that
either the NVMe is unreliable, or there's a problem with PCIe on this
platform (which we kind of know about from Jon's GPU rendering issues).

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
