Message-ID: <381c349d-2eb7-419f-a2f8-a41ca6a9e9f0@linux.alibaba.com>
Date: Fri, 11 Oct 2024 11:28:42 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: Dave Chinner <david@...morbit.com>
Cc: "Darrick J. Wong" <djwong@...nel.org>,
Matthew Wilcox <willy@...radead.org>, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, Goldwyn Rodrigues <rgoldwyn@...e.de>
Subject: Re: [PATCH 06/12] iomap: Introduce read_inline() function hook
Hi Dave,
On 2024/10/11 08:43, Dave Chinner wrote:
> On Thu, Oct 10, 2024 at 02:10:25PM -0400, Goldwyn Rodrigues wrote:
...
>
> .... there is specific ordering needed.
>
> For writes, the ordering is:
>
> 1. pre-write data compression - requires data copy
> 2. pre-write data encryption - requires data copy
> 3. pre-write data checksums - data read only
> 4. write the data
> 5. post-write metadata updates
>
> We cannot usefully perform compression after encryption -
> random data doesn't compress - and the checksum must match what is
> written to disk, so it has to come after all other transformations
> have been done.
>
> For reads, the order is:
>
> 1. read the data
> 2. verify the data checksum
> 3. decrypt the data - requires data copy
> 4. decompress the data - requires data copy
> 5. place the plain text data in the page cache
Just some random notes for reference: currently fsverity builds the
Merkle tree over the plain text, but from the perspective of on-disk
data security/authentication and integrity, the order you mentioned
sounds saner to me, or:
1. read the data
2. pre-verify the encoded data checksum (optional,
   in a dm-verity-like way)
3. decrypt the data - requires data copy
4. decompress the data - requires data copy
5. post-verify the decoded (plain) checksum (optional)
6. place the plain text data in the page cache
Steps 2 and 5 may apply to different use cases, though.
>
...
>
> Compression is where using xattrs gets interesting - the xattrs can
> have a fixed "offset" they belong to, but can store variable sized
> data records for that offset.
>
> If we say we have a 64kB compression block size, we can store the
> compressed data for a 64k block entirely in a remote xattr even if
> compression fails (i.e. we can store the raw data, not the expanded
> "compressed" data). The remote xattr can store any amount of smaller
> data, and we map the compressed data directly into the page cache at
> a high offset. Then decompression can run on the high offset pages
> with the destination being some other page cache offset....
but the compressed data itself can also be referenced multiple times
(reflink-like), so EROFS currently uses a separate pseudo inode when
it decides to index compressed data by physical addresses.
>
> On the write side, compression can be done directly into the high
> offset page cache range for that 64kb offset range, then we can
> map that to a remote xattr block and write the xattr. The xattr
> naturally handles variable size blocks.
Also, unlike plain text, each compressing filesystem may keep its own
encoded data form (e.g. filesystems could add headers or trailers to
the on-disk compressed data, or add more information to the extent
metadata) for its own needs. So unlike the current uniform plain-text
path, a single common encoder/decoder may not be flexible enough.
Thanks,
Gao Xiang