Message-ID: <20240209192542.449367-1-simone.weiss@elektrobit.com>
Date: Fri, 9 Feb 2024 20:25:41 +0100
From: Simone Weiß <simone.weiss@...ktrobit.com>
To:
CC: <lukas.bulwahn@...il.com>, <simone.p.weiss@...teo.net>,
Simone Weiß <simone.weiss@...ktrobit.com>, Kai Tomerius
<kai.tomerius@...ktrobit.com>, Alasdair Kergon <agk@...hat.com>, Mike Snitzer
<snitzer@...nel.org>, Mikulas Patocka <mpatocka@...hat.com>,
<dm-devel@...ts.linux.dev>, Song Liu <song@...nel.org>, Yu Kuai
<yukuai3@...wei.com>, <linux-raid@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: [PATCH] [RFQ] dm-integrity: Add a lazy commit mode for journal
Extend the dm-integrity driver to omit writing unused journal data sectors.
Instead of filling up the whole journal section, mark the last used
sector with a special commit ID. The commit ID still uses the same base value,
but the section number and sector number are bitwise inverted. When commit IDs
are analyzed at replay, this special commit ID is detected as the end of valid
data for the section. The main goal is to prolong the lifetime of devices such
as eMMCs by not writing the unused journal data sectors.
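
The marker scheme can be illustrated with a small user-space sketch. It
assumes that a commit ID is the per-sequence base value XORed with the section
number in the upper 32 bits and the sector number in the lower 32 bits
(endianness conversion omitted). The names commit_id, lazy_commit_id and
classify_sector are hypothetical and not part of the patch.

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the driver's definitions. */
enum sector_kind { SECTOR_REGULAR, SECTOR_LAZY_END, SECTOR_MISMATCH };

/* Assumed layout: base XOR (section in upper 32 bits, sector in lower 32). */
static uint64_t commit_id(uint64_t base, uint32_t section, uint32_t sector)
{
	return base ^ (((uint64_t)section << 32) ^ sector);
}

/* Lazy marker: same base value, section and sector bitwise inverted. */
static uint64_t lazy_commit_id(uint64_t base, uint32_t section, uint32_t sector)
{
	return commit_id(base, ~section, ~sector);
}

/* Decide what the ID found at (section, sector) means during replay. */
static enum sector_kind classify_sector(uint64_t found, uint64_t base,
					uint32_t section, uint32_t sector)
{
	if (found == commit_id(base, section, sector))
		return SECTOR_REGULAR;
	if (found == lazy_commit_id(base, section, sector))
		return SECTOR_LAZY_END;	/* end of valid data in this section */
	return SECTOR_MISMATCH;
}
```

Under this assumed layout the regular ID and the lazy marker for the same
position differ in every bit, so one cannot be mistaken for the other.
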
The change is currently experimental and is applied only if
CONFIG_DMINT_LAZY_COMMIT is set to y. Please note that this is NOT
planned for the final version of the changes; there I would make it
configurable via flags passed e.g. via dmsetup and stored in the superblock.
Architectural Limitations:
- A dm-integrity partition that was previously used with lazy commit
can't be replayed by a dm-integrity driver that does not use lazy commit.
- A dm-integrity driver that uses lazy commit is expected
to be able to cope with a partition that was created and used without
lazy commit.
- With dm-integrity lazy commit, a partially written journal (e.g. due to a
power cut) can cause a tag mismatch during replay if the journal entry marking
the end of the journal section is missing. Due to lazy commit, older journal
entries are not erased and might be processed if they have the same commit ID
as adjacent newer journal entries. If dm-integrity detects bad sections while
replaying the journal, it keeps track of those sections and tries to replay at
least the older, good sections.
This is based on the assumption that the newest section(s), which might have
been only partially written due to a sudden reset, are the ones most likely to
be damaged. Previously, the whole journal would be cleared in such a case.
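
The fallback arithmetic can be sketched in user space as follows. This is a
minimal illustration that mirrors the three cases handled in the
replay_journal() hunk of this patch; the function name count_dead_sections is
hypothetical, and the sketch assumes at least one bad section was recorded.

```c
/*
 * Sketch only; the function name is hypothetical.
 * first_bad and last_bad are 1-based section numbers (section index + 1),
 * write_start is the 0-based section where replay begins.
 * Call only when first_bad != 0, i.e. a bad section was seen.
 * Returns how many sections to drop from the end of the replay window;
 * 0 means nothing can be salvaged and the journal should be cleared.
 */
static unsigned int count_dead_sections(unsigned int journal_sections,
					unsigned int write_start,
					unsigned int first_bad,
					unsigned int last_bad)
{
	if (last_bad <= write_start)
		/* all bad sections lie before write_start (the newest ones) */
		return write_start - first_bad + 1;
	if (first_bad > write_start)
		/* all bad sections lie at or after write_start; count wraps */
		return journal_sections + write_start - first_bad + 1;
	/* the bad range straddles write_start */
	return 0;
}
```

With 8 sections and write_start = 4, the three examples from the patch's code
comments give 2, 6 and 0 dead sections respectively.
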
Signed-off-by: Simone Weiß <simone.weiss@...ktrobit.com>
Signed-off-by: Kai Tomerius <kai.tomerius@...ktrobit.com>
---
This is a very early version; please bear that in mind. I would like to
get feedback on the general idea and am aware that further work is needed.
Tests done so far:
- Tests were executed in QEMU.
- Test scripts can be found under:
git@...hub.com:simone-weiss/dm-integrity-lazy-commit.git
- Suggestions on how to test this further and which test cases to run this
against are appreciated.
Further work:
- The superblock should carry information about lazy-commit. Should the
version be increased for this?
- Add handling/logging for a partition that was created with lazy commits
but gets replayed with a "normal" journal mode.
- Allow configuring whether lazy commits or normal commits are used in the
journal if lazy commits are enabled.
- Userspace setup tooling like dmsetup should be adapted accordingly.
drivers/md/Kconfig | 10 ++
drivers/md/dm-integrity.c | 250 ++++++++++++++++++++++++++++++++------
2 files changed, 222 insertions(+), 38 deletions(-)
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 68ce56fc61d0..d28a65dd54ad 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -604,6 +604,16 @@ config DM_INTEGRITY
To compile this code as a module, choose M here: the module will
be called dm-integrity.
+config DMINT_LAZY_COMMIT
+ tristate "Lazy commit for dm-integrity target"
+ depends on DM_INTEGRITY
+ default n
+ help
+ Extend the dm-integrity driver to omit writing unused journal data.
+ Instead use a special lazy commit id that marks the end of the data
+ in the journal.
+ To be seen as experimental.
+
config DM_ZONED
tristate "Drive-managed zoned block device target support"
depends on BLK_DEV_DM
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
index ed45411eb68d..d521b5d4d2d5 100644
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -1083,18 +1083,19 @@ static void rw_journal_sectors(struct dm_integrity_c *ic, blk_opf_t opf,
}
static void rw_journal(struct dm_integrity_c *ic, blk_opf_t opf,
- unsigned int section, unsigned int n_sections,
- struct journal_completion *comp)
+ unsigned int section, unsigned int n_sections,
+ unsigned int omit_sectors, struct journal_completion *comp)
{
unsigned int sector, n_sectors;
sector = section * ic->journal_section_sectors;
- n_sectors = n_sections * ic->journal_section_sectors;
+ n_sectors = n_sections * ic->journal_section_sectors - omit_sectors;
rw_journal_sectors(ic, opf, sector, n_sectors, comp);
}
-static void write_journal(struct dm_integrity_c *ic, unsigned int commit_start, unsigned int commit_sections)
+static void write_journal(struct dm_integrity_c *ic, unsigned int commit_start,
+ unsigned int commit_sections, unsigned int omit_sectors)
{
struct journal_completion io_comp;
struct journal_completion crypt_comp_1;
@@ -1117,7 +1118,7 @@ static void write_journal(struct dm_integrity_c *ic, unsigned int commit_start,
rw_section_mac(ic, commit_start + i, true);
}
rw_journal(ic, REQ_OP_WRITE | REQ_FUA | REQ_SYNC, commit_start,
- commit_sections, &io_comp);
+ commit_sections, omit_sectors, &io_comp);
} else {
unsigned int to_end;
@@ -1130,7 +1131,7 @@ static void write_journal(struct dm_integrity_c *ic, unsigned int commit_start,
encrypt_journal(ic, true, commit_start, to_end, &crypt_comp_1);
if (try_wait_for_completion(&crypt_comp_1.comp)) {
rw_journal(ic, REQ_OP_WRITE | REQ_FUA,
- commit_start, to_end, &io_comp);
+ commit_start, to_end, 0, &io_comp);
reinit_completion(&crypt_comp_1.comp);
crypt_comp_1.in_flight = (atomic_t)ATOMIC_INIT(0);
encrypt_journal(ic, true, 0, commit_sections - to_end, &crypt_comp_1);
@@ -1141,17 +1142,19 @@ static void write_journal(struct dm_integrity_c *ic, unsigned int commit_start,
crypt_comp_2.in_flight = (atomic_t)ATOMIC_INIT(0);
encrypt_journal(ic, true, 0, commit_sections - to_end, &crypt_comp_2);
wait_for_completion_io(&crypt_comp_1.comp);
- rw_journal(ic, REQ_OP_WRITE | REQ_FUA, commit_start, to_end, &io_comp);
+ rw_journal(ic, REQ_OP_WRITE | REQ_FUA, commit_start, to_end, 0,
+ &io_comp);
wait_for_completion_io(&crypt_comp_2.comp);
}
} else {
for (i = 0; i < to_end; i++)
rw_section_mac(ic, commit_start + i, true);
- rw_journal(ic, REQ_OP_WRITE | REQ_FUA, commit_start, to_end, &io_comp);
+ rw_journal(ic, REQ_OP_WRITE | REQ_FUA, commit_start, to_end, 0, &io_comp);
for (i = 0; i < commit_sections - to_end; i++)
rw_section_mac(ic, i, true);
}
- rw_journal(ic, REQ_OP_WRITE | REQ_FUA, 0, commit_sections - to_end, &io_comp);
+ rw_journal(ic, REQ_OP_WRITE | REQ_FUA, 0, commit_sections - to_end,
+ omit_sectors, &io_comp);
}
wait_for_completion_io(&io_comp.comp);
@@ -1777,7 +1780,6 @@ static void integrity_metadata(struct work_struct *w)
if (unlikely(r)) {
if (r > 0) {
sector_t s;
-
s = sector - ((r + ic->tag_size - 1) / ic->tag_size);
DMERR_LIMIT("%pg: Checksum failed at sector 0x%llx",
bio->bi_bdev, s);
@@ -2355,6 +2357,9 @@ static void integrity_commit(struct work_struct *w)
unsigned int commit_start, commit_sections;
unsigned int i, j, n;
struct bio *flushes;
+#ifdef CONFIG_DMINT_LAZY_COMMIT
+ unsigned int used_sectors;
+#endif
del_timer(&ic->autocommit_timer);
@@ -2366,6 +2371,15 @@ static void integrity_commit(struct work_struct *w)
goto release_flush_bios;
}
+#ifdef CONFIG_DMINT_LAZY_COMMIT
+ if (ic->free_section_entry)
+ used_sectors = (ic->free_section_entry <<
+ ic->sb->log2_sectors_per_block) +
+ JOURNAL_BLOCK_SECTORS;
+ else
+ used_sectors = ic->journal_section_sectors;
+#endif
+
pad_uncommitted(ic);
commit_start = ic->uncommitted_section;
commit_sections = ic->n_uncommitted_sections;
@@ -2388,6 +2402,16 @@ static void integrity_commit(struct work_struct *w)
struct journal_sector *js;
js = access_journal(ic, i, j);
+#ifdef CONFIG_DMINT_LAZY_COMMIT
+ if (n == commit_sections-1 && j == used_sectors-1) {
+ js->commit_id = dm_integrity_commit_id(ic, ~i,
+ ~j, ic->commit_seq);
+ DEBUG_print("Lazy commit id=0x%llx: Sections %u.%u. Last section with %u sectors\n",
+ js->commit_id, commit_start, i,
+ used_sectors);
+ break;
+ }
+#endif
js->commit_id = dm_integrity_commit_id(ic, i, j, ic->commit_seq);
}
i++;
@@ -2397,7 +2421,12 @@ static void integrity_commit(struct work_struct *w)
}
smp_rmb();
- write_journal(ic, commit_start, commit_sections);
+#ifdef CONFIG_DMINT_LAZY_COMMIT
+ write_journal(ic, commit_start, commit_sections,
+ ic->journal_section_sectors-used_sectors);
+#else
+ write_journal(ic, commit_start, commit_sections, 0);
+#endif
spin_lock_irq(&ic->endio_wait.lock);
ic->uncommitted_section += commit_sections;
@@ -2443,12 +2472,13 @@ static void restore_last_bytes(struct dm_integrity_c *ic, struct journal_sector
} while (++s < ic->sectors_per_block);
}
-static void do_journal_write(struct dm_integrity_c *ic, unsigned int write_start,
- unsigned int write_sections, bool from_replay)
+static int do_journal_write(struct dm_integrity_c *ic, unsigned int write_start,
+ unsigned int write_sections, bool from_replay)
{
unsigned int i, j, n;
struct journal_completion comp;
struct blk_plug plug;
+ int rc = 0;
blk_start_plug(&plug);
@@ -2465,7 +2495,7 @@ static void do_journal_write(struct dm_integrity_c *ic, unsigned int write_start
for (j = 0; j < ic->journal_section_entries; j++) {
struct journal_entry *je = access_journal_entry(ic, i, j);
sector_t sec, area, offset;
- unsigned int k, l, next_loop;
+ unsigned int k, l, next_loop, end;
sector_t metadata_block;
unsigned int metadata_offset;
struct journal_io *io;
@@ -2543,6 +2573,7 @@ static void do_journal_write(struct dm_integrity_c *ic, unsigned int write_start
spin_unlock_irq(&ic->endio_wait.lock);
metadata_block = get_metadata_sector_and_offset(ic, area, offset, &metadata_offset);
+ end = k;
for (l = j; l < k; l++) {
int r;
struct journal_entry *je2 = access_journal_entry(ic, i, l);
@@ -2557,8 +2588,24 @@ static void do_journal_write(struct dm_integrity_c *ic, unsigned int write_start
integrity_sector_checksum(ic, sec + ((l - j) << ic->sb->log2_sectors_per_block),
(char *)access_journal_data(ic, i, l), test_tag);
if (unlikely(memcmp(test_tag, journal_entry_tag(ic, je2), ic->tag_size))) {
+#ifdef CONFIG_DMINT_LAZY_COMMIT
+ if (!from_replay)
+ dm_integrity_io_error(ic, "tag mismatch when writing journal",
+ -EILSEQ);
+
+ /*
+ * during replay, continue processing and discard
+ * data with a tag mismatch
+ */
+ rc = -1;
+ if (end > l)
+ end = l;
+
+ DEBUG_print("tag mismatch at section %u entry %u\n", n, l);
+#else
dm_integrity_io_error(ic, "tag mismatch when replaying journal", -EILSEQ);
dm_audit_log_target(DM_MSG_PREFIX, "integrity-replay-journal", ic->ti, 0);
+#endif
}
}
@@ -2569,11 +2616,15 @@ static void do_journal_write(struct dm_integrity_c *ic, unsigned int write_start
dm_integrity_io_error(ic, "reading tags", r);
}
- atomic_inc(&comp.in_flight);
- copy_from_journal(ic, i, j << ic->sb->log2_sectors_per_block,
- (k - j) << ic->sb->log2_sectors_per_block,
- get_data_sector(ic, area, offset),
- complete_copy_from_journal, io);
+ // copy data that has not been discarded
+ if (end > j) {
+ atomic_inc(&comp.in_flight);
+ copy_from_journal(ic, i, j << ic->sb->log2_sectors_per_block,
+ (end - j) << ic->sb->log2_sectors_per_block,
+ get_data_sector(ic, area, offset),
+ complete_copy_from_journal, io);
+ }
+
skip_io:
j = next_loop;
}
@@ -2587,6 +2638,8 @@ static void do_journal_write(struct dm_integrity_c *ic, unsigned int write_start
wait_for_completion_io(&comp.comp);
dm_integrity_flush_buffers(ic, true);
+
+ return rc;
}
static void integrity_writer(struct work_struct *w)
@@ -2603,7 +2656,8 @@ static void integrity_writer(struct work_struct *w)
if (!write_sections)
return;
- do_journal_write(ic, write_start, write_sections, false);
+ if (do_journal_write(ic, write_start, write_sections, false) < 0)
+ write_sections = ~0;
spin_lock_irq(&ic->endio_wait.lock);
@@ -2914,7 +2968,7 @@ static void init_journal(struct dm_integrity_c *ic, unsigned int start_section,
}
}
- write_journal(ic, start_section, n_sections);
+ write_journal(ic, start_section, n_sections, 0);
}
static int find_commit_seq(struct dm_integrity_c *ic, unsigned int i, unsigned int j, commit_id_t id)
@@ -2929,6 +2983,50 @@ static int find_commit_seq(struct dm_integrity_c *ic, unsigned int i, unsigned i
return -EIO;
}
+#ifdef CONFIG_DMINT_LAZY_COMMIT
+static int find_commit_seq_lazy(struct dm_integrity_c *ic, unsigned int i,
+ unsigned int j, commit_id_t id, bool *lazy)
+{
+ unsigned char k;
+ *lazy = false;
+ for (k = 0; k < N_COMMIT_IDS; k++) {
+ if (dm_integrity_commit_id(ic, i, j, k) == id)
+ return k;
+ }
+ for (k = 0; k < N_COMMIT_IDS; k++) {
+ if (dm_integrity_commit_id(ic, ~i, ~j, k) == id) {
+ DEBUG_print("Found a lazy commit id at %d:%d\n", i, j);
+ *lazy = true;
+ return k;
+ }
+ }
+ dm_integrity_io_error(ic, "journal commit id", -EIO);
+ return -EIO;
+}
+
+static bool journal_check_lazy_commit(struct dm_integrity_c *ic,
+ unsigned int i, unsigned int sector)
+{
+ unsigned int j;
+
+ if (sector%ic->sectors_per_block) {
+ DEBUG_print("The lazy commit id is not aligned to the block size. Not replaying section\n");
+ return false;
+ }
+
+ for (j = sector>>ic->sb->log2_sectors_per_block;
+ j < ic->journal_section_entries; j++) {
+ struct journal_entry *je = access_journal_entry(ic, i, j);
+
+ if (!journal_entry_is_unused(je)) {
+ DEBUG_print("Found used journal entry after lazy commit. Not replaying section\n");
+ return false;
+ }
+ }
+ return true;
+}
+#endif
+
static void replay_journal(struct dm_integrity_c *ic)
{
unsigned int i, j;
@@ -2938,6 +3036,7 @@ static void replay_journal(struct dm_integrity_c *ic)
unsigned int continue_section;
bool journal_empty;
unsigned char unused, last_used, want_commit_seq;
+ unsigned int first_bad, last_bad, dead;
if (ic->mode == 'R')
return;
@@ -2947,10 +3046,13 @@ static void replay_journal(struct dm_integrity_c *ic)
last_used = 0;
write_start = 0;
+ first_bad = 0;
+ last_bad = 0;
+ dead = 0;
if (!ic->just_formatted) {
DEBUG_print("reading journal\n");
- rw_journal(ic, REQ_OP_READ, 0, ic->journal_sections, NULL);
+ rw_journal(ic, REQ_OP_READ, 0, ic->journal_sections, 0, NULL);
if (ic->journal_io)
DEBUG_bytes(lowmem_page_address(ic->journal_io[0].page), 64, "read journal");
if (ic->journal_io) {
@@ -2972,17 +3074,32 @@ static void replay_journal(struct dm_integrity_c *ic)
memset(used_commit_ids, 0, sizeof(used_commit_ids));
memset(max_commit_id_sections, 0, sizeof(max_commit_id_sections));
for (i = 0; i < ic->journal_sections; i++) {
+ bool bad = false;
for (j = 0; j < ic->journal_section_sectors; j++) {
int k;
struct journal_sector *js = access_journal(ic, i, j);
-
+#ifndef CONFIG_DMINT_LAZY_COMMIT
k = find_commit_seq(ic, i, j, js->commit_id);
- if (k < 0)
- goto clear_journal;
+#else
+ bool lazy;
+
+ k = find_commit_seq_lazy(ic, i, j, js->commit_id,
+ &lazy);
+ if (lazy)
+ j = ic->journal_section_sectors;
+#endif
+ if (k < 0) {
+ /* remember the first and last bad section */
+ bad = true;
+ if (!first_bad)
+ first_bad = i + 1;
+ last_bad = i + 1;
+ break;
+ }
used_commit_ids[k] = true;
max_commit_id_sections[k] = i;
}
- if (journal_empty) {
+ if (!bad && journal_empty) {
for (j = 0; j < ic->journal_section_entries; j++) {
struct journal_entry *je = access_journal_entry(ic, i, j);
@@ -3022,21 +3139,75 @@ static void replay_journal(struct dm_integrity_c *ic)
want_commit_seq = next_commit_seq(want_commit_seq);
wraparound_section(ic, &write_start);
+ if (unlikely(first_bad)) {
+ DEBUG_print("dm-integrity: write_start=%u first_bad=%u last_bad=%u\n",
+ write_start, first_bad, last_bad);
+
+ if (last_bad <= write_start)
+ /*
+ * section 0 1 2 3 | 4 5 6 7
+ * id 2 2 2 2 | 2 1 1 1
+ * first_bad=3 ^
+ * last_bad=4 ^
+ * start=4 ^
+ * dead=2 X X
+ */
+ dead = write_start - first_bad + 1;
+ else if (first_bad > write_start)
+ /*
+ * section 0 1 2 3 | 4 5 6 7
+ * id 2 2 2 2 | 2 1 1 1
+ * first_bad=7 ^
+ * last_bad=8 ^
+ * start=4 ^
+ * dead=6 X X X X X X
+ */
+ dead = ic->journal_sections + write_start - first_bad + 1;
+ else
+ /*
+ * section 0 1 2 3 | 4 5 6 7
+ * id 2 2 2 2 | 2 1 1 1
+ * first_bad=4 ^
+ * last_bad=7 ^
+ * start=4 ^
+ * dead=0 X X X X X X X X
+ */
+ dead = 0;
+
+ DEBUG_print("dm-integrity: sections=%u, empty=%s, dead=%u\n",
+ ic->journal_sections, journal_empty ? "true" : "false", dead);
+
+ if (journal_empty || dead == 0)
+ goto clear_journal;
+ }
+
i = write_start;
- for (write_sections = 0; write_sections < ic->journal_sections; write_sections++) {
+ for (write_sections = 0; write_sections < ic->journal_sections - dead;
+ write_sections++) {
for (j = 0; j < ic->journal_section_sectors; j++) {
struct journal_sector *js = access_journal(ic, i, j);
-
- if (js->commit_id != dm_integrity_commit_id(ic, i, j, want_commit_seq)) {
- /*
- * This could be caused by crash during writing.
- * We won't replay the inconsistent part of the
- * journal.
- */
- DEBUG_print("commit id mismatch at position (%u, %u): %d != %d\n",
- i, j, find_commit_seq(ic, i, j, js->commit_id), want_commit_seq);
- goto brk;
+ if (js->commit_id == dm_integrity_commit_id(ic, i, j,
+ want_commit_seq))
+ continue; /* regular commit */
+#ifdef CONFIG_DMINT_LAZY_COMMIT
+ if (js->commit_id == dm_integrity_commit_id(ic, ~i, ~j,
+ want_commit_seq)) {
+ /* Lazy commit */
+ DEBUG_print("Found lazy commit in replay: %u, %u\n",
+ i, j);
+ if (journal_check_lazy_commit(ic, i, j + 1))
+ break;
}
+#endif
+ /*
+ * This could be caused by crash during writing.
+ * We won't replay the inconsistent part of the
+ * journal.
+ */
+ DEBUG_print("commit id mismatch at position (%u, %u): %d != %d\n",
+ i, j, find_commit_seq(ic, i, j,
+ js->commit_id), want_commit_seq);
+ goto brk;
}
i++;
if (unlikely(i >= ic->journal_sections))
@@ -3785,7 +3956,10 @@ static int create_journal(struct dm_integrity_c *ic, char **error)
if (ic->journal_crypt_alg.alg_string) {
unsigned int ivsize, blocksize;
struct journal_completion comp;
-
+#ifdef CONFIG_DMINT_LAZY_COMMIT
+ *error = "Lazy commit with journal encryption is currently not supported";
+ goto bad;
+#endif
comp.ic = ic;
ic->journal_crypt = crypto_alloc_skcipher(ic->journal_crypt_alg.alg_string, 0, CRYPTO_ALG_ALLOCATES_MEMORY);
if (IS_ERR(ic->journal_crypt)) {
--
2.34.1