[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <BD422CF9-E99A-4D31-815D-70C6DABBC6CB@dilger.ca>
Date: Thu, 18 Aug 2011 15:53:11 -0600
From: Andreas Dilger <adilger@...ger.ca>
To: djwong@...ibm.com
Cc: Theodore Ts'o <tytso@....edu>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
linux-ext4 List <linux-ext4@...r.kernel.org>,
Sunil Mushran <sunil.mushran@...cle.com>,
Joel Becker <jlbec@...lplan.org>,
Mingming Cao <cmm@...ibm.com>,
Amir Goldstein <amir73il@...il.com>,
Coly Li <colyli@...il.com>, Andi Kleen <andi@...stfloor.org>
Subject: Re: [RFC] ext4 metadata checksumming design
On 2011-08-18, at 12:14 PM, Darrick J. Wong wrote:
> On Thu, Aug 18, 2011 at 12:16:00AM -0600, Andreas Dilger wrote:
>> On 2011-08-16, at 9:25 PM, "Darrick J. Wong" <djwong@...ibm.com> wrote:
>>> - Extended attribute blocks that are stored in the inode table -- the h_magic
>>> field is written by the kernel, but neither the kernel nor e2fsprogs ever
>>> actually read this field. The field could be reused to checksum the extra
>>> space since (as far as I can tell) EAs are the only user of that empty space.
>>
>> I haven't had a chance to read the document you wrote, but wanted to comment
>> on xattrs. There is a hash field for each xattr (including internal xattrs),
>> and one for the external xattr blocks that can be used to validate the xattr
>> value.
>>
>> In addition to the hash for the in-inode xattrs, the inode hash itself would
>> serve to validate the xattr values.
>>
>> I have a patch for e2fsprogs that checks the xattr hash for in-inode xattrs
>> (currently it is always 0).
>
> I surveyed the h_hash/e_hash calculation code; it only covers the name and
> value fields. Do we care about checksum protection for the extra fields in
> struct ext4_xattr_header and struct ext4_xattr_entry?
For entries in the inode, the ext4_xattr_entry should itself already be
covered by the inode hash.
For entries in an external EA block, the ext4_xattr_header h_hash only
contains the hash of individual e_hash values, themselves covering the
e_name and e_value parts of the structure.
This is by design since the h_hash is used to determine whether an xattr
block can be shared among inodes (mbcache, which I'm not a fan of, but
that is besides the point) so it should depend only on the content and
not the actual layout (e.g. cannot contain e_value_offs, e_value_block,
h_refcount, h_blocks).
I'm totally against using the h_magic field for a checksum, since it would
break our ability to change the structure of xattrs in the future. Instead,
it would be possible to use a new h_magic (e.g. EA03) that redefines the hash
function to be what we want it to be (e.g. e_hash is CRC32c of everything
except e_value_offs and e_hash, and h_hash is CRC32c of all e_hash fields),
without making the code significantly more complex.
That would firstly give us a better hash function for the entries, and
secondly add in the e_name_len, e_name_index, and e_value_size values into
e_hash (and subsequently into h_hash).
We could also/instead use h_reserved[0] for a hash of ext4_xattr_header
and the missing e_* fields to have a stronger validation of the entire block,
without breaking the ability to use h_hash as a hash of the content, and
without doubling the CPU cost of doing the checksum.
Alternately, if we want to spend the extra CPU cycles, it would be possible
to just use h_reserved[0] for a CRC32c (or other) checksum of the whole
block.
Using only h_reserved[0] instead of changing h_magic would not require
changing the h_magic at all (which would break read-only compatibility).
> I think it would be
> useful to be able to check the sanity of h_refcount and h_blocks. Possibly
> that extends to e_value_* as well, though the hash probably covers it. Also,
> there's no hardware acceleration available for the xattr hash, though I doubt
> xattrs are especially performance sensitive.
>
> --D
>>
>>> Please have a look at the design document and please feel free to suggest any
>>> changes.
>>
>> Hopefully soon.
>>
>> Cheers, Andreas--
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@...r.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
Cheers, Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists