linux-ext4 - Re: Backup/restore of fscrypt files and directories

Message-ID: <Y+asUDeRFGpig+wG@mit.edu>
Date:   Fri, 10 Feb 2023 15:42:56 -0500
From:   "Theodore Ts'o" <tytso@....edu>
To:     Sebastien Buisson <sbuisson.work@...il.com>
Cc:     Eric Biggers <ebiggers@...nel.org>, linux-fscrypt@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org
Subject: Re: Backup/restore of fscrypt files and directories

On Fri, Feb 10, 2023 at 02:44:22PM +0100, Sebastien Buisson wrote:
> As for symlinks, you are right I need to dig further. I think at least the
> security.encdata xattr would need an additional field to hold the ciphertext
> symlink target.

So I'd caution you against the concept of using the security.encdata
xattr.  In your proposal, it's being used in two different ways.  The
first way is as a system call / ioctl-like interface, and that's
something which is very much frowned upon, at least by many in the
kernel community.
The red flag here is when you say that the xattr isn't actually stored
on disk, but rather is created on the fly when the xattr is fetched.
If you need to fetch information from the kernel that's not stored as
part of the on-disk format, then use an ioctl or a system call.  Don't
try to turn the xattr interface into an extension of the system call /
ioctl interface.

The other way you're using the encdata is that you're presuming that
this is how you'd store the information in the tar format.  And how we
fetch information from the kernel, and how it is stored as an exchange
format, should be decoupled as much as possible.

In the case of a tar archive, the symlink target is normally stored in
the data block of the tar archive.  In the case where the symlink is
encrypted, why should that change?  We aren't storing the encrypted
data in a different location, such as the encdata xattr; why should
that be different in the case of the symlink target?

Now, how you *fetch* the encrypted symlink target might be different,
just as how we fetch the contents of an unencrypted data file (via the
read system call) and how we fetch an unencrypted symlink target (via
the readlink system call) are different.
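
To make the read-versus-readlink distinction concrete, here is a
minimal Python sketch of the two fetch paths for *unencrypted*
objects.  The helper name fetch_payload is made up for illustration;
it says nothing about how an encrypted symlink target would actually
be fetched, which is the open design question.

```python
import os
import stat


def fetch_payload(path):
    """Return (kind, payload_bytes) for a regular file or a symlink.

    Hypothetical helper: a backup tool has to pick the fetch call
    based on the inode type, even though both payloads end up as
    plain bytes in the archive.
    """
    st = os.lstat(path)  # lstat so that we do not follow the symlink
    if stat.S_ISLNK(st.st_mode):
        # Symlink target: fetched with readlink(2).
        return "symlink", os.fsencode(os.readlink(path))
    if stat.S_ISREG(st.st_mode):
        # File contents: fetched with read(2).
        with open(path, "rb") as f:
            return "file", f.read()
    raise ValueError(f"unsupported file type: {path}")
```

Either way the archive format only sees bytes; the difference is
confined to the fetch side.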

> > A description of the use cases of this feature would also be helpful.
> > Historically, people have said they needed this feature when they really didn't.
> 
> There is really a need for backup/restore at the file system level. For
> instance, in case of storage failure, we would want to restore files to a
> newly formatted device, in a finer granularity that cannot be achieved with
> a backup/restore at the device level, or because that would allow changing
> formatting options. Also, it particularly makes sense to have per-directory
> backups, as the block devices are getting larger and larger.
> 
> The ability to backup and restore encrypted files is interesting in another
> use case: moving files between file systems and systems without the need to
> decrypt then re-encrypt.

The use case of being able to restore files without needing to decrypt
and re-encrypt is quite different from the use case where you want to
be able to back up the files without needing encryption keys present,
but where the encryption keys *are* needed at restore time --- and the
latter is quite a bit easier.

For example, some of the encryption modes which use the inode number
as part of the IV could be handled if keys are needed at restore time;
but it would be quite a bit harder, if not impossible, if you want to
be able to restore the encrypted files without doing a
decrypt/re-encrypt pass.
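
As a rough illustration of why the inode number matters, here is a toy
Python model of an IV built from the inode number and the logical
block number.  The layout (inode number in the upper 32 bits, block
number in the lower 32 bits) is an assumption modelled on fscrypt's
IV_INO_LBLK_64 policy flag; the function name is made up, and this is
a sketch, not the kernel's implementation.

```python
def iv_from_inode_and_block(inode_number, logical_block):
    """Toy 64-bit IV: inode number in bits 32-63, block number in bits 0-31.

    Assumed layout for illustration only.
    """
    if not 0 <= inode_number < 2**32:
        raise ValueError("inode number must fit in 32 bits")
    if not 0 <= logical_block < 2**32:
        raise ValueError("logical block number must fit in 32 bits")
    return (inode_number << 32) | logical_block


# The same block of the same file, before and after a restore that
# allocated a new inode (both inode numbers are invented):
iv_at_backup = iv_from_inode_and_block(1234, 0)
iv_at_restore = iv_from_inode_and_block(5678, 0)
ivs_match = iv_at_backup == iv_at_restore  # False
```

Since a restored file will generally land on a different inode number,
ciphertext copied over verbatim would be decrypted under a different
IV than the one it was encrypted with.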

Can you give more details about why you are interested in implementing
this?  Does your company have a more specific business justification
for wanting to invest in this work?  If so, can you say more about it?

The reason why I ask is because very often fscrypt gets used in
integrated solutions, where the encryption/decryption engine is done
in-line between the general purpose CPU and the storage device.  In
some cases, the users' encryption keys might be stored in something
like ARM TrustZone or in some other specialized trusted key manager
where even the kernel running in the general purpose hardware won't
have access to *any* of the keys.  It's for that reason that we have
some of these alternate modes where the inode number is used as part
of the IV, as opposed to the more traditional scheme where the user's
key is used to derive a file-specific subkey.

One of the original use cases for fscrypt was for Android and ChromeOS
devices.  And for those devices the state tends to be synchronized
across multiple devices, including web browsers.  So the state ends up
getting saved, unencrypted, in an application specific format, so you
can recover very quickly with no data loss, even if the device gets
lost or destroyed[1]. 

[1] https://www.youtube.com/watch?v=lm-Vnx58UYo

It was for this reason that ultimately, we decided that there really
wasn't a need to back up the data in an encrypted form, since for the
use case that our company was interested in addressing, well over 90%
of the state was of necessity already being backed up in an unencrypted
format.  So it was easier to just back up the remaining bits of state,
and if need be, decrypt them and then re-encrypt them with a key which
is derived from the user's login password before they are sent up to
the cloud server.

You may be trying to solve the problem in the most general way
possible, but sometimes that's not the best solution, especially once
time to market and cost/complexity of implementation are taken into
account.  As Linus Torvalds stated earlier today, when talking about
splice(2) vs sendfile(2):

   "... this is also very much an example of how "generic" may be
   something that is revered in computer science, but is often a
   *horrible* thing in reality....

   Special cases are often much simpler and easier, and sometimes the
   special cases are all you actually want." [2]

[2] https://lore.kernel.org/all/CAHk-=wip9xx367bfCV8xaF9Oaw4DZ6edF9Ojv10XoxJ-iUBwhA@mail.gmail.com/

> In the case of hard links, I do not know how tar for instance handles this
> for normal files. Do you have any ideas?

   "Tar stores hardlinks in the tarball by storing the first file (of
   a group of hardlinked files); the subsequent hard links to it are
   indicated by a special record. When untarring, encountering this
   record causes tar to create a hard link in the destination
   filesystem." [3]

[3] https://forums.whirlpool.net.au/archive/2787890
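
The behavior described in that quote can be observed with Python's
standard tarfile module.  This is a minimal sketch: the helper name
archive_members is invented for illustration, and it assumes a
filesystem that supports hard links.

```python
import os
import tarfile


def archive_members(directory, names, archive_path):
    """Archive the named files, then list how each one was recorded.

    Returns a list of (name, kind, linkname) tuples, where kind is
    "hardlink" for a link record and "file" for a regular member.
    """
    with tarfile.open(archive_path, "w") as tar:
        for name in names:
            tar.add(os.path.join(directory, name), arcname=name)
    with tarfile.open(archive_path) as tar:
        return [
            (m.name, "hardlink" if m.islnk() else "file", m.linkname)
            for m in tar.getmembers()
        ]
```

If a.txt and b.txt are hard links to the same inode and are added in
that order, the first is stored with its data and the second comes
back as a link record whose linkname points at a.txt.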

Why are you assuming that tar is the best format to use for storing
encrypted files?  It's going to require special extensions to the tar
format, which means it won't necessarily be interoperable across
different tar implementations.  (For example, the hard link support is
specific to GNU tar.)

Do your requirements (and this is why a more detailed explanation of
your use case would be helpful) include supporting hard links?  If they
don't and you don't mind storing N copies of the file in the tar
archive file, and not restoring the hard links when the tar file is
unpacked, then life is much simpler.  Which is why it's important to
be very clear about use cases and requirements before trying to design
a solution.

Cheers,

					- Ted