linux-kernel - Re: [PATCH v3] mtd: Verify written data in paranoid mode

Message-ID: <4ebe2146-ee1c-4325-8259-be3803475f1f@prolan.hu>
Date: Mon, 12 May 2025 15:13:20 +0200
From: Csókás Bence <csokas.bence@...lan.hu>
To: Richard Weinberger <richard@....at>
CC: Miquel Raynal <miquel.raynal@...tlin.com>, linux-mtd
	<linux-mtd@...ts.infradead.org>, linux-kernel <linux-kernel@...r.kernel.org>,
	Vignesh Raghavendra <vigneshr@...com>
Subject: Re: [PATCH v3] mtd: Verify written data in paranoid mode

Hi,

On 2025. 05. 12. 14:47, Richard Weinberger wrote:
> ----- Original Message -----
>> From: "Csókás Bence" <csokas.bence@...lan.hu>
>> Well, yes, in our case. But the point is, we have a strict requirement
>> for data integrity, which is not unique to us I believe. I would think
>> there are other industrial control applications like ours, which dictate
>> a high data integrity.
> 
> In your last patch set you said your hardware has an issue that every
> now and then data is not properly written.
> Now you talk about data integrity requirements. I'm confused.

The two problems are not that different. In the first case you have a
random and _very_ low chance of data corruption, e.g. because of noise,
aging hardware or power supply ripple. You still need to make sure that
the written data is correct, and if it is not, the system must
immediately enter some fail-safe mode. This is the problem we want to
solve, for everybody using Linux in high-reliability environments.

The problem we _have_ happens to be a bit different: we are blursed
with a system that corrupts data with a noticeable probability. The
model is the same, though: a stochastic process introducing bit errors
on write. I sincerely hope no one else has this problem, and fixing it
is *not* the primary aim of this patch; it just happens to solve our
issue as well. I intend the patch to be useful for the larger Linux
community, so the primary goal is to solve the first issue.
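
To make the model concrete, here is a minimal user-space sketch of
write-then-read-back verification. It is only an illustration of the
idea, not the patch and not the MTD API: `struct fake_dev`,
`dev_write()`, `dev_read()`, `write_verified()` and the
`flip_next_write` fault-injection flag are all hypothetical names, and
the 64-byte "device" is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a flash device; NOT the kernel MTD API. */
struct fake_dev {
	uint8_t cells[64];
	int flip_next_write;	/* fault injection: corrupt one bit of the next write */
};

/* Store len bytes at off; returns 0 on success, -1 if out of range. */
int dev_write(struct fake_dev *d, size_t off, const uint8_t *buf, size_t len)
{
	if (off > sizeof d->cells || len > sizeof d->cells - off)
		return -1;
	memcpy(d->cells + off, buf, len);
	if (d->flip_next_write && len > 0) {
		d->cells[off] ^= 0x01;	/* simulate a single bit error on write */
		d->flip_next_write = 0;
	}
	return 0;
}

/* Fetch len bytes from off; returns 0 on success, -1 if out of range. */
int dev_read(const struct fake_dev *d, size_t off, uint8_t *buf, size_t len)
{
	if (off > sizeof d->cells || len > sizeof d->cells - off)
		return -1;
	memcpy(buf, d->cells + off, len);
	return 0;
}

/*
 * Write, read back, compare.
 * Returns 0 if the stored data matches, -1 on an I/O or range error,
 * and -2 if the read-back differs from what was written.
 */
int write_verified(struct fake_dev *d, size_t off, const uint8_t *buf, size_t len)
{
	uint8_t check[sizeof d->cells];

	assert(d != NULL && buf != NULL);
	if (dev_write(d, off, buf, len) != 0)
		return -1;
	if (dev_read(d, off, check, len) != 0)
		return -1;
	return memcmp(buf, check, len) == 0 ? 0 : -2;
}
```

The separate return value for a mismatch is a choice made for this
sketch: it lets the caller tell "the device refused the request" apart
from "the device accepted it but stored something else", and only the
latter would trigger the fail-safe path described above.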

> My point is that at some level we need to trust hardware,
> if your flash memory is so broken that you can't rely on the write
> path you're in deep trouble.

Sure, but at the moment we give the hardware no return path: we just
shovel megabytes at it and never ask what actually got stored. In
critical systems, this will not fly.

> What is the next step, reading it back every five seconds to make
> sure it is still there? (just kidding).

(( Well, you're kidding now, but this is what we will have to do in 
another project, a rail interlocking system. Though obviously not in the 
kernel. But I digress... ))

Bence