linux-kernel - Re: [PATCH] overlayfs: port all superblock creation logging to fsopen logs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <j7ngxuxqdwrq5o6zi2hmt3zfmh6s5mzrlvwjw6snqbv5oc5ggo@nqpr6wjec7go>
Date: Wed, 13 Nov 2024 11:48:31 +0100
From: Karel Zak <kzak@...hat.com>
To: Aleksa Sarai <cyphar@...har.com>
Cc: Amir Goldstein <amir73il@...il.com>, 
	Christian Brauner <brauner@...nel.org>, Miklos Szeredi <miklos@...redi.hu>, Jan Kara <jack@...e.cz>, 
	linux-unionfs@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH] overlayfs: port all superblock creation logging to
 fsopen logs

On Thu, Nov 07, 2024 at 02:09:19AM GMT, Aleksa Sarai wrote:
> On 2024-11-06, Amir Goldstein <amir73il@...il.com> wrote:
> > On Wed, Nov 6, 2024 at 12:00 PM Amir Goldstein <amir73il@...il.com> wrote:
> > >
> > > On Wed, Nov 6, 2024 at 10:59 AM Christian Brauner <brauner@...nel.org> wrote:
> > > >
> > > > On Wed, Nov 06, 2024 at 02:09:58PM +1100, Aleksa Sarai wrote:
> > > > > overlayfs helpfully provides a lot of of information when setting up a
> > > > > mount, but unfortunately when using the fsopen(2) API, a lot of this
> > > > > information is mixed in with the general kernel log.
> > > > >
> > > > > In addition, some of the logs can become a source of spam if programs
> > > > > are creating many internal overlayfs mounts (in runc we use an internal
> > > > > overlayfs mount to protect the runc binary against container breakout
> > > > > attacks like CVE-2019-5736, and xino_auto=on caused a lot of spam in
> > > > > dmesg because we didn't explicitly disable xino[1]).
> > > > >
> > > > > By logging to the fs_context, userspace can get more accurate
> > > > > information when using fsopen(2) and there is less dmesg spam for
> > > > > systems where a lot of programs are using fsopen("overlay"). Legacy
> > > > > mount(2) users will still see the same errors in dmesg as they did
> > > > > before (though the prefix of the log messages will now be "overlay"
> > > > > rather than "overlayfs").
> > >
> > > I am not sure about the level of risk in this format change.
> > > Miklos, WDYT?
> > >
> > > > >
> > > > > [1]: https://bbs.archlinux.org/viewtopic.php?pid=2206551
> > > > >
> > > > > Signed-off-by: Aleksa Sarai <cyphar@...har.com>
> > > > > ---
> > > >
> > > > To me this sounds inherently useful! So I'm all for it.
> > > >
> > >
> > > [CC: Karel]
> > >
> > > I am quite concerned about this.
> > > I have a memory that Christian suggested to make this change back in
> > > the original conversion to new mount API, but back then mount tool
> > > did not print out the errors to users properly and even if it does
> > > print out errors, some script could very well be ignoring them.
> 
> I think Christian mentioned this at LSF/MM (or maybe LPC), but it seems
> that util-linux does provide the log information now in the case of
> fsconfig(2) errors:
> 
> 	% strace -e fsopen,fsconfig mount -t overlay -o userxattr=str x /tmp/a
> 	fsopen("overlay", FSOPEN_CLOEXEC)       = 3
> 	fsconfig(3, FSCONFIG_SET_STRING, "source", "foo", 0) = 0
> 	fsconfig(3, FSCONFIG_SET_STRING, "userxattr", "str", 0) = -1 EINVAL (Invalid argument)
> 	mount: /tmp/a: fsconfig system call failed: overlay: Unexpected value for 'userxattr'.
> 		   dmesg(1) may have more information after failed mount system call.
> 
> (Using the current HEAD of util-linux -- openSUSE's util-linux isn't
> compiled with support for fsopen apparently.)

After failed mount-related syscalls, libmount reads messages prefixed
with "e " from the file descriptor created by fdopen(). These messages
are later printed by mount(8).

mount(8) or libmount does not read anything from kmesg.

> However, it doesn't output any of the info-level ancillary
> information if there were no errors.

This is the expected default behavior. mount(8) does not print any
additional information.

We can enhance libmount to read and print other messages on stdout if
requested by the user. For example, the mount(8) command has a
--verbose option that is currently only used by some /sbin/mount.<type>
helpers, but not by mount(8) itself. We can improve this and use it in
libmount to read and print info-level messages.

I can prepare a libmount/mount(8) patch for this.

> So there will definitely be some loss of
> information for pr_* logs that don't cause an actual error (which is a
> little unfortunate, since that is the exact dmesg spam that caused me to
> write this patch).
> 
> I could take a look at sending a patch to get libmount to output that
> information, but that won't help with the immediate issue, and this
> doesn't help with the possible concern with some script that scrapes
> dmesg. (Though I think it goes without saying that such scripts are kind
> of broken by design -- since unprivileged users can create overlayfs
> mounts and thus spam the kernel log with any message, there is no
> practical way for a script to correctly get the right log information
> without using the new mount API's logging facilities.)

> I can adjust this patch to only include the log+return-an-error cases,
> but that doesn't really address your primary concern, I guess.
> 
> > > My strong feeling is that suppressing legacy errors to kmsg should be opt-in
> > > via the new mount API and that it should not be the default for libmount.
> > > IMO, it is certainly NOT enough that new mount API is used by userspace
> > > as an indication for the kernel to suppress errors to kmsg.
 
For me, it seems like we are mixing two things together.

kmesg is a *log*, and tools like systemd read and save it. It is used
for later issue debugging or by log analyzers. This means that all
relevant information should be included.

The stderr/stdout output from tools such as mount(8) is simply
feedback for users or scripts, and informational messages are just
hints. They should not be considered a replacement for system logging
facilities. The same applies to messages read from the new mount API;
they should not be a replacement for system logs.

In my opinion, it is acceptable to suppress optional and unimportant
messages and not save them into kmesg. However, all other relevant
messages should be included regardless of the tool or API being used.

Additionally, it should be noted that mount(8)/libmount is only a
userspace tool and is not necessary for mounting filesystems. The
kernel should not rely on libmount behavior; there are other tools
available such as busybox.

> I can see an argument for some kind of MS_SILENT analogue for
> fsconfig(), though it will make the spam problem worse until programs
> migrate to setting this new flag.
 
Yes, the ideal solution would be to have mount options that can
control this behavior. This would allow users to have control over it
and save their settings to fstab, as well as keep it specific to the
mount node.

> Also, as this is already an issue ever since libmount added support for
> the new API (so since 2.39 I believe?), I think it would make just as
> much sense for this flag to be opt-in -- so libmount could set the
> "verbose" or "kmsglog" flag by default but most normal programs would
> not get the spammy behaviour by default.

I prefer if the default behavior is defined by the kernel, rather than
by userspace tools like libmount. If we were to automatically add any
mount options through libmount, it would make it difficult to coexist
with settings in fstab, etc. It's always better to have transparency
and avoid any hidden factors in the process.

    Karel

-- 
 Karel Zak  <kzak@...hat.com>
 http://karelzak.blogspot.com