Date:   Sun, 20 Feb 2022 08:03:54 +1100
From:   Dave Chinner <david@...morbit.com>
To:     Kyle Sanderson <kyle.leet@...il.com>
Cc:     qat-linux@...el.com, giovanni.cabiddu@...el.com,
        Linux-Kernal <linux-kernel@...r.kernel.org>,
        linux-xfs@...r.kernel.org, linux-crypto@...r.kernel.org,
        dm-devel@...hat.com, Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with
 dm-crypt + xfs

On Fri, Feb 18, 2022 at 09:02:28PM -0800, Kyle Sanderson wrote:
> A2SDi-8C-HLN4F has IQAT enabled by default, when this device is
> attempted to be used by xfs (through dm-crypt) the entire kernel
> thread stalls forever. Multiple users have hit this over the years
> (through sporadic reporting) - I ended up trying ZFS and encryption
> wasn't an issue there at all because I guess they don't use this
> device. Returning to sanity (xfs), I was able to provision a dm-crypt
> volume no problem on the disk, however when running mkfs.xfs on the
> volume is what triggers the cascading failure (each request kills a
> kthread).

Can you provide the full stack traces for these errors so we can see
exactly what this cascading failure looks like, please? The stall
messages that appear some time afterwards are not interesting - it's
the first errors, the ones that cause the stall, that need to be
investigated.

It would also help to provide a full description of the storage
stack and the hardware in use, as per:

https://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

> Disabling IQAT on the south bridge results in a working
> system, however this is not the default configuration for the
> distribution of choice (Ubuntu 20.04.3 LTS), nor the motherboard. I'm
> convinced this never worked properly based on the lack of popularity
> for kernel encryption (crypto), and the embedded nature that
> SuperMicro has integrated this device in collaboration with intel as
> it looks like the primary usage is through external accelerator cards.

This really sounds like broken hardware, not a kernel problem.

> Kernels tried were from RHEL8 over a year ago, and this impacts the
> entirety of the 5.4 series on Ubuntu.
> Please CC me on replies as I'm not subscribed to all lists. CPU is C3758.

[snip stalled kcryptd worker threads]

This implies a dmcrypt level problem - XFS can't make progress if
dmcrypt is not completing IOs.

Where are the reports of the XFS corruption that the subject implies
is occurring?

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
