Date:	Wed, 05 Nov 2014 16:27:52 +0200
From:	Artem Bityutskiy <dedekind1@...il.com>
To:	Tanya Brokhman <tlinder@...eaurora.org>
Cc:	richard@....at, linux-mtd@...ts.infradead.org,
	linux-arm-msm@...r.kernel.org,
	Randy Dunlap <rdunlap@...radead.org>,
	David Woodhouse <dwmw2@...radead.org>,
	Brian Norris <computersforpeace@...il.com>,
	"open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
	open list <linux-kernel@...r.kernel.org>
Subject: Re: [RFC/PATCH 1/5 v2] mtd: ubi: Read disturb infrastructure

Hi,

Let me summarize.

1. To handle the read disturb problem you selected the read-counters
solution. The PEB (physical erase block) read counter is the trigger
for scrubbing.

When the PEB read counter reaches a threshold value, we scrub.
The threshold value is 100000 by default. Users may change it via sysfs.

Read counters are stored on the flash media, in the fastmap structure.
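
For illustration, I imagine the trigger boils down to something like
this (rd_count, rd_threshold and schedule_scrub() are names I made up,
not taken from the patch):

/*
 * Sketch of the read-counter trigger: bump the per-PEB counter on
 * every read and scrub once it crosses the threshold. All names
 * here are invented for illustration.
 */
static void account_read(struct ubi_device *ubi, int pnum)
{
        if (++ubi->rd_count[pnum] >= ubi->rd_threshold) /* 100000 */
                schedule_scrub(ubi, pnum); /* scrub, reset the counter */
}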


2. To handle the data retention problem you selected the time-stamps
approach. Each PEB gets a time-stamp, and the time-stamp gets updated
when the PEB is erased.

When a PEB becomes old enough (judged by comparing its time-stamp to
the current system time), it gets scrubbed.

The threshold for "old enough" is 120 days by default. Users may change
it via sysfs.

Time-stamps are stored on the flash media, in the erase counter header
of the PEB.


For both the read-disturb and the data retention problems to be taken
care of, user-space has to periodically trigger a scan, which goes
through all PEBs, checks the read counter and the time-stamp, and
scrubs where needed.
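
I.e. the trigger could be as simple as this, say from a cron job (the
sysfs node name "trigger_scan" is made up, the patch may expose
something different):

/*
 * Userspace sketch of the periodic trigger. The sysfs attribute
 * name is invented for illustration.
 */
#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/sys/class/ubi/ubi0/trigger_scan", "w");

        if (!f)
                return 1;
        fputs("1\n", f);        /* ask UBI to check all PEBs once */
        return fclose(f) ? 1 : 0;
}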


Hopefully I got it right. I have concerns.


This is a rather complex design, and it is not clear why just doing a
full flash read from time to time is not good enough, and why the
complexity is worth it.

In that case the trigger for scrubbing is a bit-flip. It indicates that
there is an actual problem, so it is a reliable trigger.

Is the 100000 reads threshold a reliable trigger? What if the right
value for my flash is 10 times larger, or smaller?

Is the 120 days threshold a reliable trigger? What if the right value
for my flash is much larger?

Do you think it is even possible to get the thresholds right?

Let me re-emphasize: a bit-flip is an objective, reliable reason to
scrub. The thresholds are more of a pessimistic prediction. No?

Let's think of someone having R/O storage with video files. Compare:

1. Do a full media read every 120 days in the background with low
priority (possibly implemented as a UBI service); see the sketch after
this comparison. And because the storage is R/O, cut the power all you
want.

2. Have UBI write to the media all the time to maintain the read
counters. Lose your fastmap and get slow mounts on power cuts. Do extra
work at boot time. Spend more RAM.

Why is 2 better than 1?
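
To make option 1 concrete: the reader can be completely dumb, because
UBI already schedules scrubbing on its own whenever a read reports a
correctable bit-flip. A minimal sketch (the volume node name is just an
example):

/*
 * Option 1 sketch: a dumb, low-priority full-volume reader. Reading
 * every byte is enough: UBI notices correctable bit-flips during the
 * read and scrubs the affected PEBs itself, with no on-flash counters
 * or time-stamps. The volume node /dev/ubi0_0 is just an example.
 */
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
        char buf[4096];
        int fd = open("/dev/ubi0_0", O_RDONLY);

        if (fd < 0)
                return 1;
        nice(19);                               /* low priority */
        while (read(fd, buf, sizeof(buf)) > 0)
                ;                               /* data is discarded */
        close(fd);
        return 0;
}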

Thanks!
