Message-ID: <4ebe2146-ee1c-4325-8259-be3803475f1f@prolan.hu>
Date: Mon, 12 May 2025 15:13:20 +0200
From: Csókás Bence <csokas.bence@...lan.hu>
To: Richard Weinberger <richard@....at>
CC: Miquel Raynal <miquel.raynal@...tlin.com>, linux-mtd
<linux-mtd@...ts.infradead.org>, linux-kernel <linux-kernel@...r.kernel.org>,
Vignesh Raghavendra <vigneshr@...com>
Subject: Re: [PATCH v3] mtd: Verify written data in paranoid mode
Hi,
On 2025. 05. 12. 14:47, Richard Weinberger wrote:
> ----- Original Message -----
>> From: "Csókás Bence" <csokas.bence@...lan.hu>
>> Well, yes, in our case. But the point is, we have a strict requirement
>> for data integrity, which is not unique to us I believe. I would think
>> there are other industrial control applications like ours, which dictate
>> a high data integrity.
>
> In your last patch set you said your hardware has an issue that every
> now and then data is not properly written.
> Now you talk about data integrity requirements. I'm confused.
The two problems are not too dissimilar: in one case you have a random,
and _very_ low chance of data corruption, e.g. because of noise, aging
hardware, power supply ripple etc. But you still need to make sure that
the written data is absolutely correct; or if it is not, the system will
immediately enter some fail-safe mode. This is the problem we want to
solve, for everybody using Linux in high reliability environments.
The problem we _have_, though, happens to be a bit different: here we are
blursed with a system that corrupts data with a noticeable probability.
The model is the same, however: a stochastic process introducing bit
errors on write. I sincerely hope no one else has this problem, and it is
*not* the primary aim of this patch; the patch just happens to solve our
issue as well. I intend it to be useful for the larger Linux community,
so the primary goal is to solve the first issue.
> My point is that at some level we need to trust hardware,
> if your flash memory is so broken that you can't rely on the write
> path you're in deep trouble.
Sure, but at the moment, we're not giving any return path for hardware.
We just shovel megabytes at it, and don't even ask back. In critical
systems, this will not fly.
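To make "asking back" concrete, here is a minimal user-space sketch of a write-then-read-back check. It is not the code from the patch, which does its verification inside the kernel. The helper name `write_verified` and its `retries` parameter are hypothetical, and the sketch assumes a descriptor that supports positional I/O. On a regular file the read-back is normally served from the page cache, so it only demonstrates the control flow; a real check has to read from the medium itself.

```python
import errno
import os


def write_verified(fd, data, offset, retries=0):
    """Write data at offset, read it back, and compare.

    Hypothetical helper for illustration only. Returns the number of
    retries that were needed; raises OSError(EIO) if the read-back
    still differs after the last attempt.
    """
    for attempt in range(retries + 1):
        # pwrite() may write fewer bytes than requested, so loop.
        written = 0
        while written < len(data):
            written += os.pwrite(fd, data[written:], offset + written)

        # Read the same range back; pread() may also return short.
        readback = b""
        while len(readback) < len(data):
            chunk = os.pread(fd, len(data) - len(readback),
                             offset + len(readback))
            if not chunk:  # unexpected end of file or device
                break
            readback += chunk

        if readback == data:
            return attempt

    raise OSError(errno.EIO, "read-back does not match written data")
```

Note that blindly retrying is an assumption of this sketch: on raw flash a failed write cannot generally be repeated without erasing the block first, so a caller there would more likely leave `retries` at 0 and treat the error as the trigger for its fail-safe path.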
> What is the next step, reading it back every five seconds to make
> sure it is still there? (just kidding).
(( Well, you're kidding now, but this is what we will have to do in
another project, a rail interlocking system. Though obviously not in the
kernel. But I digress... ))
Bence