lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20220715154203.48093-1-hsiangkao@linux.alibaba.com>
Date:   Fri, 15 Jul 2022 23:41:47 +0800
From:   Gao Xiang <hsiangkao@...ux.alibaba.com>
To:     linux-erofs@...ts.ozlabs.org, Chao Yu <chao@...nel.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Gao Xiang <hsiangkao@...ux.alibaba.com>
Subject: [PATCH v2 00/16] erofs: prepare for folios, deduplication and kill PG_error 

Hi folks,

I've been doing this for almost 2 months, the main point of this is
to support large folios and rolling hash deduplication for compressed
data.

This patchset is as a start of this work targeting for the next 5.20,
it introduces a flexable range representation for (de)compressed buffers
instead of too relying on page(s) directly themselves, so large folios
can laterly base on this work.  Also, this patchset gets rid of all
PG_error flags in the decompression code. It's a cleanup as a result
as well.

In addition, this patchset kicks off rolling hash deduplication for
compressed data by introducing fully-referenced multi-reference
pclusters first instead of reporting fs corruption if one pcluster
is introduced by several differnt extents.  The full implementation
is expected to be finished in the merge window after the next.  One
of my colleagues is actively working on the userspace part of this
feature.

However, it's still easy to verify fully-referenced multi-reference
pcluster by constructing some image by hand (see attachment):

Dataset: 300M
seq-read (data-deduplicated, read_ahead_kb 8192): 1095MiB/s
seq-read (data-deduplicated, read_ahead_kb 4096): 771MiB/s
seq-read (data-deduplicated, read_ahead_kb 512):  577MiB/s
seq-read (vanilla, read_ahead_kb 8192):           364MiB/s

Finally, this patchset survives ro-fsstress on my side.

Thanks,
Gao Xiang

Changes since v1:
 - rename left pagevec words to bvpage (Yue Hu);

Gao Xiang (16):
  erofs: get rid of unneeded `inode', `map' and `sb'
  erofs: clean up z_erofs_collector_begin()
  erofs: introduce `z_erofs_parse_out_bvecs()'
  erofs: introduce bufvec to store decompressed buffers
  erofs: drop the old pagevec approach
  erofs: introduce `z_erofs_parse_in_bvecs'
  erofs: switch compressed_pages[] to bufvec
  erofs: rework online page handling
  erofs: get rid of `enum z_erofs_page_type'
  erofs: clean up `enum z_erofs_collectmode'
  erofs: get rid of `z_pagemap_global'
  erofs: introduce struct z_erofs_decompress_backend
  erofs: try to leave (de)compressed_pages on stack if possible
  erofs: introduce z_erofs_do_decompressed_bvec()
  erofs: record the longest decompressed size in this round
  erofs: introduce multi-reference pclusters (fully-referenced)

 fs/erofs/compress.h     |   2 +-
 fs/erofs/decompressor.c |   2 +-
 fs/erofs/zdata.c        | 785 +++++++++++++++++++++++-----------------
 fs/erofs/zdata.h        | 119 +++---
 fs/erofs/zpvec.h        | 159 --------
 5 files changed, 496 insertions(+), 571 deletions(-)
 delete mode 100644 fs/erofs/zpvec.h

-- 
2.24.4

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ