linux-kernel - Re: [PATCH] overlayfs: port all superblock creation logging to fsopen logs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAOQ4uxj_jY36nJ9eTVv5VomSp+ne_yif-6JPZcQB1nXDdRC02w@mail.gmail.com>
Date: Tue, 26 Nov 2024 10:47:32 +0100
From: Amir Goldstein <amir73il@...il.com>
To: Aleksa Sarai <cyphar@...har.com>, Miklos Szeredi <miklos@...redi.hu>
Cc: Karel Zak <kzak@...hat.com>, Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>, 
	linux-unionfs@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH] overlayfs: port all superblock creation logging to fsopen logs

On Tue, Nov 26, 2024 at 8:25 AM Aleksa Sarai <cyphar@...har.com> wrote:
>
> On 2024-11-13, Karel Zak <kzak@...hat.com> wrote:
> > On Thu, Nov 07, 2024 at 02:09:19AM GMT, Aleksa Sarai wrote:
> > > On 2024-11-06, Amir Goldstein <amir73il@...il.com> wrote:
> > > > On Wed, Nov 6, 2024 at 12:00 PM Amir Goldstein <amir73il@...il.com> wrote:
> > > > >
> > > > > On Wed, Nov 6, 2024 at 10:59 AM Christian Brauner <brauner@...nel.org> wrote:
> > > > > >
> > > > > > On Wed, Nov 06, 2024 at 02:09:58PM +1100, Aleksa Sarai wrote:
> > > > > > > overlayfs helpfully provides a lot of of information when setting up a
> > > > > > > mount, but unfortunately when using the fsopen(2) API, a lot of this
> > > > > > > information is mixed in with the general kernel log.
> > > > > > >
> > > > > > > In addition, some of the logs can become a source of spam if programs
> > > > > > > are creating many internal overlayfs mounts (in runc we use an internal
> > > > > > > overlayfs mount to protect the runc binary against container breakout
> > > > > > > attacks like CVE-2019-5736, and xino_auto=on caused a lot of spam in
> > > > > > > dmesg because we didn't explicitly disable xino[1]).
> > > > > > >
> > > > > > > By logging to the fs_context, userspace can get more accurate
> > > > > > > information when using fsopen(2) and there is less dmesg spam for
> > > > > > > systems where a lot of programs are using fsopen("overlay"). Legacy
> > > > > > > mount(2) users will still see the same errors in dmesg as they did
> > > > > > > before (though the prefix of the log messages will now be "overlay"
> > > > > > > rather than "overlayfs").
> > > > >
> > > > > I am not sure about the level of risk in this format change.
> > > > > Miklos, WDYT?
> > > > >
> > > > > > >
> > > > > > > [1]: https://bbs.archlinux.org/viewtopic.php?pid=2206551
> > > > > > >
> > > > > > > Signed-off-by: Aleksa Sarai <cyphar@...har.com>
> > > > > > > ---
> > > > > >
> > > > > > To me this sounds inherently useful! So I'm all for it.
> > > > > >
> > > > >
> > > > > [CC: Karel]
> > > > >
> > > > > I am quite concerned about this.
> > > > > I have a memory that Christian suggested to make this change back in
> > > > > the original conversion to new mount API, but back then mount tool
> > > > > did not print out the errors to users properly and even if it does
> > > > > print out errors, some script could very well be ignoring them.
> > >
> > > I think Christian mentioned this at LSF/MM (or maybe LPC), but it seems
> > > that util-linux does provide the log information now in the case of
> > > fsconfig(2) errors:
> > >
> > >     % strace -e fsopen,fsconfig mount -t overlay -o userxattr=str x /tmp/a
> > >     fsopen("overlay", FSOPEN_CLOEXEC)       = 3
> > >     fsconfig(3, FSCONFIG_SET_STRING, "source", "foo", 0) = 0
> > >     fsconfig(3, FSCONFIG_SET_STRING, "userxattr", "str", 0) = -1 EINVAL (Invalid argument)
> > >     mount: /tmp/a: fsconfig system call failed: overlay: Unexpected value for 'userxattr'.
> > >                dmesg(1) may have more information after failed mount system call.
> > >
> > > (Using the current HEAD of util-linux -- openSUSE's util-linux isn't
> > > compiled with support for fsopen apparently.)
> >
> > After failed mount-related syscalls, libmount reads messages prefixed
> > with "e " from the file descriptor created by fdopen(). These messages
> > are later printed by mount(8).
> >
> > mount(8) or libmount does not read anything from kmesg.
> >
> > > However, it doesn't output any of the info-level ancillary
> > > information if there were no errors.
> >
> > This is the expected default behavior. mount(8) does not print any
> > additional information.
> >
> > We can enhance libmount to read and print other messages on stdout if
> > requested by the user. For example, the mount(8) command has a
> > --verbose option that is currently only used by some /sbin/mount.<type>
> > helpers, but not by mount(8) itself. We can improve this and use it in
> > libmount to read and print info-level messages.
> >
> > I can prepare a libmount/mount(8) patch for this.
>
> This sounds like a good idea to me.
>
> > > So there will definitely be some loss of
> > > information for pr_* logs that don't cause an actual error (which is a
> > > little unfortunate, since that is the exact dmesg spam that caused me to
> > > write this patch).
> > >
> > > I could take a look at sending a patch to get libmount to output that
> > > information, but that won't help with the immediate issue, and this
> > > doesn't help with the possible concern with some script that scrapes
> > > dmesg. (Though I think it goes without saying that such scripts are kind
> > > of broken by design -- since unprivileged users can create overlayfs
> > > mounts and thus spam the kernel log with any message, there is no
> > > practical way for a script to correctly get the right log information
> > > without using the new mount API's logging facilities.)
> >
> > > I can adjust this patch to only include the log+return-an-error cases,
> > > but that doesn't really address your primary concern, I guess.
> > >
> > > > > My strong feeling is that suppressing legacy errors to kmsg should be opt-in
> > > > > via the new mount API and that it should not be the default for libmount.
> > > > > IMO, it is certainly NOT enough that new mount API is used by userspace
> > > > > as an indication for the kernel to suppress errors to kmsg.
> >
> > For me, it seems like we are mixing two things together.
> >
> > kmesg is a *log*, and tools like systemd read and save it. It is used
> > for later issue debugging or by log analyzers. This means that all
> > relevant information should be included.
> >
> > The stderr/stdout output from tools such as mount(8) is simply
> > feedback for users or scripts, and informational messages are just
> > hints. They should not be considered a replacement for system logging
> > facilities. The same applies to messages read from the new mount API;
> > they should not be a replacement for system logs.
> >
> > In my opinion, it is acceptable to suppress optional and unimportant
> > messages and not save them into kmesg. However, all other relevant
> > messages should be included regardless of the tool or API being used.
>
> For warning or error messages, this makes sense -- though I think the
> "least spammy" option would be that the logs are output to kmesg if
> userspace closes the fscontext fd without reading the logs. That should
> catch programs that miss log information, without affecting programs
> that do read the logs (and do whatever they feel is appropriate with
> them). That would be some reasonable default behaviour, and users could
> explicitly opt into a verbose mode.
>
> For informational or debug messages, I feel that the default should be
> that we want to avoid outputting to kmesg when using the new mount API
> since the information is non-critical and the only way of associating
> the information is using the fscontext log. But if we had this "only log
> on close if not read" behaviour, I think having the same behaviour for
> all log messages would still work and would be more consistent.
>
> > Additionally, it should be noted that mount(8)/libmount is only a
> > userspace tool and is not necessary for mounting filesystems. The
> > kernel should not rely on libmount behavior; there are other tools
> > available such as busybox.
>
> Sure, but by switching to the new mount API you are buying into
> different behaviour for error logs (if only for the generic VFS ones),
> regardless of what kind of program you are.
>
> > > I can see an argument for some kind of MS_SILENT analogue for
> > > fsconfig(), though it will make the spam problem worse until programs
> > > migrate to setting this new flag.
> >
> > Yes, the ideal solution would be to have mount options that can
> > control this behavior. This would allow users to have control over it
> > and save their settings to fstab, as well as keep it specific to the
> > mount node.
> >
> > > Also, as this is already an issue ever since libmount added support for
> > > the new API (so since 2.39 I believe?), I think it would make just as
> > > much sense for this flag to be opt-in -- so libmount could set the
> > > "verbose" or "kmsglog" flag by default but most normal programs would
> > > not get the spammy behaviour by default.

The "spammy" behavior is the legacy behavior, so we do not want to
regress it without opt-in unless there is a good reason to do it.
I may be slow, but I did not catch what that good reason is.

> >
> > I prefer if the default behavior is defined by the kernel, rather than
> > by userspace tools like libmount. If we were to automatically add any
> > mount options through libmount, it would make it difficult to coexist
> > with settings in fstab, etc. It's always better to have transparency
> > and avoid any hidden factors in the process.
>
> Right, my suggestion was that verbose should be opt-in precisely because
> wanting to output to kmesg when using the new mount API is something
> that only really makes sense for libmount and similar tools and so
> should be opt-in rather than opt-out.

My point was that kernel does not know which libmount version is used
and there are clearly libmount versions out in the wild, that will remain
out in the wild which by default, do not output all the legacy messages
that are currently printed to kmsg.

So I'm sorry, but I don't buy the argument for making the kernel default
behavior silence those messages.

Maybe I have a misconception about how useful those messages are.
I would love Miklos to chime in if he has an opinion.

Thanks,
Amir.