linux-ext4 - Re: Backup/restore of fscrypt files and directories

Date:   Fri, 10 Feb 2023 14:44:22 +0100
From:   Sebastien Buisson <sbuisson.work@...il.com>
To:     Eric Biggers <ebiggers@...nel.org>
Cc:     linux-fscrypt@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-ext4@...r.kernel.org
Subject: Re: Backup/restore of fscrypt files and directories

Hi Eric,

Thanks for your feedback.

On 08/02/2023 at 20:28, Eric Biggers wrote:
> Hi Sebastien,
>
> On Wed, Feb 08, 2023 at 01:09:50PM +0100, Sebastien Buisson wrote:
>> I am planning to implement backup and restore for fscrypt files and
>> directories and propose the following design, and would welcome feedback on
>> this approach.
> Thanks for looking into this.  Before getting too far into the details of your
> proposal, are you aware of the previous threads about this?  Specifically:
>
> "backup/restore of fscrypt files"
> (https://lore.kernel.org/linux-fscrypt/D1AD7D55-94D6-4C19-96B4-BAD0FD33CF49@dilger.ca/T/#u)
>
> And the discussion that happened as part of
> "[PATCH RERESEND v9 0/9] fs: interface for directly reading/writing compressed data"
> (https://lore.kernel.org/linux-fsdevel/CAHk-=wh74eFxL0f_HSLUEsD1OQfFNH9ccYVgCXNoV1098VCV6Q@mail.gmail.com
> and its responses).

I knew about the first one, but had not stumbled across the discussion
that happened in the compression-related thread, thanks.

> Both times before, it was brought up that the hardest part is backing up and
> restoring the filenames, including symlinks.  I don't think your proposal really
> addresses that.  Your proposal has a single filename in the security.encdata
> xattr.  But actually, a file can have many names.  Also, a file can have an
> encrypted name without being encrypted itself; that's the case for device node,
> socket, and FIFO files.  Also, symlinks have their target encrypted.

That is correct. The value of the enc_name field is the ciphertext name
of the current dentry. As with regular files, my impression was that
tar (or the backup utility) would handle the hard links properly.
In your view, what would differ between regular files and encrypted
files when it comes to restoring hard links?
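
To illustrate what "handling hard links" means on the backup side, here is a
minimal sketch, assuming the tool identifies a multiply-linked file by its
(st_dev, st_ino) pair, which is the usual approach in archivers. The function
name group_hard_links is hypothetical, and nothing here is specific to fscrypt.

```python
import os


def group_hard_links(root):
    """Map each multiply-linked inode under root to all of its path names."""
    seen = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Only inodes with more than one link can have several names.
            if st.st_nlink > 1:
                seen.setdefault((st.st_dev, st.st_ino), []).append(path)
    return {key: sorted(paths) for key, paths in seen.items()}
```

With encrypted directories and no key, each of these path names would be a
ciphertext name, so a single enc_name value per inode would not cover every
link.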

As for symlinks, you are right, I need to dig further. I think the
security.encdata xattr would need at least one additional field to hold
the ciphertext symlink target.

> I think that your proposal, in general, needs more detail about how *restores*
> will work, since that's going to be much harder than backups.  It's not hard to
> get the filesystem to give you more information; it's much harder to make
> changes to a filesystem while keeping everything self-consistent!
>
> A description of the use cases of this feature would also be helpful.
> Historically, people have said they needed this feature when they really didn't.

There is a real need for backup/restore at the file system level. For
instance, in case of storage failure, we would want to restore files to
a newly formatted device, either at a finer granularity than a
device-level backup/restore can achieve, or because we want to change
the formatting options. Per-directory backups also make particular
sense as block devices keep getting larger.

The ability to back up and restore encrypted files is also interesting in
another use case: moving files between file systems and systems without
the need to decrypt and then re-encrypt.

>> The third challenge is to get access to the encryption context of files and
>> directories. By design, fscrypt does not expose this information, internally
>> stored as an extended attribute but with no associated handler.
> Actually, FS_IOC_GET_ENCRYPTION_POLICY_EX and FS_IOC_GET_ENCRYPTION_NONCE
> together give you all the information stored in the encryption context.
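
For reference, a rough sketch of calling these two ioctls from Python. The
request numbers are derived from the definitions in <linux/fscrypt.h>, assuming
the generic _IOC bit layout used on x86 and arm64. The helper names get_nonce
and get_policy are hypothetical, and fd must be an open descriptor on an
encrypted file or directory.

```python
import fcntl
import struct

# Generic Linux _IOC layout (an assumption: a few architectures differ).
_IOC_WRITE, _IOC_READ = 1, 2


def _ioc(direction, type_char, nr, size):
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


# <linux/fscrypt.h>: _IOWR('f', 22, __u8[9]) and _IOR('f', 27, __u8[16]).
FS_IOC_GET_ENCRYPTION_POLICY_EX = _ioc(_IOC_READ | _IOC_WRITE, "f", 22, 9)
FS_IOC_GET_ENCRYPTION_NONCE = _ioc(_IOC_READ, "f", 27, 16)


def get_nonce(fd):
    """Return the 16-byte per-file nonce."""
    buf = bytearray(16)
    fcntl.ioctl(fd, FS_IOC_GET_ENCRYPTION_NONCE, buf, True)
    return bytes(buf)


def get_policy(fd):
    """Return the raw encryption policy (24 bytes for a v2 policy)."""
    # struct fscrypt_get_policy_ex_arg: __u64 policy_size, then the policy.
    # policy_size is the room available on input, the actual size on output.
    arg = bytearray(struct.pack("=Q24x", 24))
    fcntl.ioctl(fd, FS_IOC_GET_ENCRYPTION_POLICY_EX, arg, True)
    (size,) = struct.unpack_from("=Q", arg)
    return bytes(arg[8:8 + size])
```

As far as I can tell, for a v2 policy the 24-byte policy plus the 16-byte nonce
account for the 40 bytes of the v2 encryption context.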
>
>> In order to address this need for backup/restore of encrypted files, we
>> propose to make use of a special extended attribute named security.encdata,
>> containing:
>> - encoding method used for binary data. Assume name can be up to 255 chars.
>> - clear text file data length in bytes (set to 0 for dirs).
> st_size already gives the plaintext file length, even while the encryption key
> is not present.

Exactly, and that would prevent normal utilities from reading raw 
encrypted content up to the end of the encryption block (if access 
without the key was granted).
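
To make the size relationship concrete, here is a small worked sketch, assuming
file contents are encrypted in whole 4096-byte blocks (the block size is an
assumption and depends on the filesystem). ciphertext_length is a hypothetical
helper giving how much raw data a backup tool would have to read for a given
plaintext size.

```python
def ciphertext_length(plaintext_size, block_size=4096):
    """Round the plaintext size up to a whole number of encryption blocks."""
    if plaintext_size < 0 or block_size <= 0:
        raise ValueError("sizes must be non-negative, block size positive")
    # Ceiling division without floating point.
    return -(-plaintext_size // block_size) * block_size
```

For the size of 3012 bytes used in the example xattr, the raw content to back up
would be one full 4096-byte block.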

>> - encryption context. 40 bytes for v2 encryption context.
>> - encrypted name. 256 bytes max.
>>
>> To improve portability if we need to change the on-disk format in the
>> future, and to make the archived data useful over a longer timeframe, the
>> content of the security.encdata xattr is expressed as ASCII text with a
>> "key: value" YAML format. As encryption context and encrypted file name are
>> binary, they need to be encoded.
>> So the content of the security.encdata xattr would be something like:
>>
>>    { encoding: base64url, size: 3012, enc_ctx: YWJjZGVmZ2hpamtsbW
>>    5vcHFyc3R1dnd4eXphYmNkZWZnaGlqa2xtbg, enc_name: ZmlsZXdpdGh2ZX
>>    J5bG9uZ25hbWVmaWxld2l0aHZlcnlsb25nbmFtZWZpbGV3aXRodmVyeWxvbmdu
>>    YW1lZmlsZXdpdGg }
>>
>> Because base64 encoding has a 33% overhead, this gives us a maximum xattr
>> size of approximately 800 characters.
>> This extended attribute would not be shown when listing xattrs, only exposed
>> when fetched explicitly, and unmodified tools would not be able to access
>> the encrypted files in any case. It would not be stored on disk, only
>> computed when fetched.
> An xattr containing multiple key-value pairs is quite strange, since xattrs
> themselves are key-value pairs.  This could just be multiple xattrs.
>
> Did you choose this design because you intend for this to be treated as an
> opaque blob that userspace must not interpret at all?

This format is chosen to be readable and potentially modified if 
implementation of backup/restore of encrypted files evolves in the 
future. As you mention, some of the information returned in the 
security.encdata xattr can be retrieved by other means. But the idea to 
have a single xattr that holds all the information is to ease 
implementation in the backup/restore tools. For them, the backup
operation would simply consist of fetching the security.encdata xattr
when dealing with an encrypted file. So from that standpoint, the content
of the xattr is not supposed to be interpreted by the backup/restore tools.
However, having a readable multi key-value pair format increases 
portability and makes it possible for other tools to convert to a newer 
format if the need arises in the future.
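
As an illustration of how another tool could read or convert the blob, here is
a minimal sketch, assuming the flow-style "key: value" layout and unpadded
base64url encoding shown in the example above. The function names are
hypothetical, and the parser only handles this exact layout, not general YAML.

```python
import base64


def b64url_encode(raw):
    """base64url without '=' padding, as in the example xattr value."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(text):
    # Restore the padding that was stripped on encoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def format_encdata(size, enc_ctx, enc_name):
    return "{ encoding: base64url, size: %d, enc_ctx: %s, enc_name: %s }" % (
        size, b64url_encode(enc_ctx), b64url_encode(enc_name))


def parse_encdata(text):
    body = text.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("not a flow-style mapping")
    fields = {}
    for item in body[1:-1].split(","):
        key, _, value = item.partition(":")
        # Drop any whitespace introduced by line wrapping inside a value.
        fields[key.strip()] = "".join(value.split())
    if fields.get("encoding") != "base64url":
        raise ValueError("unsupported encoding")
    return {
        "size": int(fields["size"]),
        "enc_ctx": b64url_decode(fields["enc_ctx"]),
        "enc_name": b64url_decode(fields["enc_name"]),
    }
```

A backup tool would still store the value untouched; only a converter to a
newer format would need something like parse_encdata.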

>> File and file system backups often use the tar utility either directly or
>> under the covers. We propose to modify the tar utility to make it
>> "encryption aware", but the same relatively small changes could be done with
>> other common backup utilities like cpio as needed. When detecting ext4
>> encrypted files, tar would need to explicitly fetch the security.encdata
>> extended attribute, and store it along with the backup file. Fetching this
>> extended attribute would internally trigger in ext4 a mechanism responsible
>> for gathering the required information. Because we must not make any clear
>> text copy of encrypted files, the encryption key must not be present.
> Why can't the encryption key be present during backup?  Surely some people are
> going to want to back up encrypted files consistently in ciphertext form,
> regardless of whether the key happens to be present or not at the particular
> time the backup is being done?  Consider e.g. a bunch of user home directories
> which are regularly being locked and unlocked, and the system administrator is
> taking backups of everything.

That is a very good question. Of course we do not want to make clear
text copies of encrypted files, but you are right that we should also
support making a ciphertext backup while the key is present. I guess
this is achievable thanks to a specific flag to open() or preadv2() as
mentioned below.

>> Tar
>> would also need to use a special flag that would allow reading raw data
>> without the encryption key. Such a flag could be named O_FILE_ENC, and would
>> need to be coupled with O_DIRECT so that the page cache does not see this
>> raw data. O_FILE_ENC could take the value of (O_NOCTTY | O_NDELAY) as they
>> are unlikely to be used in practice and are not harmful if used incorrectly.
> Maybe call this O_CIPHERTEXT?  Also note that a new RWF_* flag to preadv2,
> instead of a new O_* flag to open(), has been suggested before.
>
>> The name of the backed-up file would be the encoded+digested form returned
>> by fscrypt.
> Does this have a meaning, since the actual name would be stored separately?

But the backed-up file needs to have a name, right? Given that the
encoded+digested form returned by fscrypt is unique within the directory,
I thought it would be fine to use. Can you think of another name to give
to backed-up files?

>> The tar utility would be used to extract a previously created tarball
>> containing encrypted files. When restoring the security.encdata extended
>> attribute, instead of storing the xattr as-is on disk, this would internally
>> trigger in ext4 a mechanism responsible for extracting the required
>> information, and storing them accordingly. Tar would also need to specify
>> the O_FILE_ENC | O_DIRECT flags to write raw data without the encryption
>> key.
>>
>> To create a valid encrypted file with proper encryption context and
>> encrypted name, we can implement a mechanism where the file is first created
>> with O_TMPFILE in the encrypted directory to avoid triggering the encryption
>> context check before setting the security.encdata xattr, and then atomically
>> linking it to the namespace with the correct encrypted name.
> How exactly does the link to the correct name happen?  What if there's more than
> one name?  What about restoring non-regular files?

So the restore tool first creates the file with O_TMPFILE in the 
encrypted directory, and writes its ciphertext content (with a special 
flag mentioned above). Then the tool sets the security.encdata xattr. 
Internally fscrypt uses the value of the enc_ctx field to set the .c 
xattr on the file, and the size field to set the plaintext file length. 
The value of the enc_name field is stored temporarily by fscrypt in a 
dedicated xattr such as "ciphertextname". Then the tool calls linkat() 
on the file. Internally, seeing the special flag and the presence of the 
"ciphertextname" xattr, fscrypt uses this value as the new name.

The purpose of this is to impose the provided encryption context and 
encrypted name, instead of having new ones generated at file creation.
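
The sequence above could look roughly like this from the restore tool's side.
This is only a sketch of the proposed behaviour: O_FILE_ENC and the special
handling of security.encdata do not exist in any kernel today, and
restore_encrypted_file and placeholder_name are hypothetical names. The
ciphertext is assumed to be a whole number of blocks, as O_DIRECT requires.

```python
import mmap
import os

# Proposed in this thread, NOT an existing kernel flag.
O_FILE_ENC = os.O_NOCTTY | os.O_NDELAY


def restore_open_flags():
    return os.O_TMPFILE | os.O_RDWR | os.O_DIRECT | O_FILE_ENC


def restore_encrypted_file(parent_dir, ciphertext, encdata, placeholder_name):
    dir_fd = os.open(parent_dir, os.O_RDONLY | os.O_DIRECTORY)
    # 1. Unnamed file in the encrypted directory.
    fd = os.open(parent_dir, restore_open_flags(), 0o600)
    try:
        # 2. Raw ciphertext, written from a page-aligned buffer for O_DIRECT.
        if ciphertext:
            buf = mmap.mmap(-1, len(ciphertext))
            try:
                buf.write(ciphertext)
                if os.write(fd, buf) != len(ciphertext):
                    raise OSError("short write of ciphertext")
            finally:
                buf.close()
        # 3. Encryption context, size and name (bytes), via the proposed xattr.
        os.setxattr(fd, "security.encdata", encdata)
        # 4. Link into the namespace. Passing dst_dir_fd makes Python use
        #    linkat() with AT_SYMLINK_FOLLOW on the /proc/self/fd symlink.
        #    Under the proposal the kernel would substitute the stored
        #    ciphertext name for placeholder_name.
        os.link("/proc/self/fd/%d" % fd, placeholder_name,
                dst_dir_fd=dir_fd, follow_symlinks=True)
    finally:
        os.close(fd)
        os.close(dir_fd)
```

On a current kernel these calls would simply create an ordinary file under
placeholder_name, which is why the special flag and xattr handling are needed.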

In the case of hard links, I do not know how tar for instance handles 
this for normal files. Do you have any ideas?


Cheers,

Sebastien.


>> The security.encdata extended attribute contains the encryption context of
>> the file or directory. This has a 16-byte nonce (per-file random value) that
>> is used along with the master key to derive the per-file key thanks to a KDF
>> function. But the master key is not stored in ext4, so it is not backed up
>> as part of the scenario described above, which makes the backup of the raw
>> encrypted files safe.
> Side note: the backup/restore support will need to be disabled on files that use
> FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64 or FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32, since
> those files are tied to the filesystem they are on.
>
> - Eric