netdev - Re: [RFC PATCH iproute2-next 0/5] Persisting of mount namespaces along with network namespaces


Message-ID: <87jzrvzc5v.fsf@toke.dk>
Date: Tue, 10 Oct 2023 00:03:24 +0200
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: David Ahern <dsahern@...il.com>, Stephen Hemminger
 <stephen@...workplumber.org>, netdev@...r.kernel.org, Nicolas Dichtel
 <nicolas.dichtel@...nd.com>, Christian Brauner <brauner@...nel.org>, David
 Laight <David.Laight@...LAB.COM>
Subject: Re: [RFC PATCH iproute2-next 0/5] Persisting of mount namespaces
 along with network namespaces

"Eric W. Biederman" <ebiederm@...ssion.com> writes:

> Toke Høiland-Jørgensen <toke@...hat.com> writes:
>
>> The 'ip netns' command is used for setting up network namespaces with persistent
>> named references, and is integrated into various other commands of iproute2 via
>> the -n switch.
>>
>> This is useful both for testing setups and for simple script-based namespacing
>> but has one drawback: the lack of persistent mounts inside the spawned
>> namespace. This is particularly apparent when working with BPF programs that use
>> pinning to bpffs: by default no bpffs is available inside a namespace, and
>> even if one is mounted, that fs disappears as soon as the calling
>> command exits.
>
> It would be entirely reasonable to copy mounts like /sys/fs/bpf from the
> original mount namespace into the temporary mount namespace used by
> "ip netns".
>
> I would call it a bug that "ip netns" doesn't do that already.
>
> I suspect that "ip netns" not copying the mounts from the old sysfs onto
> the new sysfs is your entire problem.

How would it do that? Walk mtab and remount everything identically after
remounting /sys? Or is there a smarter way to go about this?
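
For illustration, the "walk mtab" variant could start from something like
the sketch below. It is only a sketch under assumptions: it parses text in
the /proc/self/mountinfo format and lists the mounts that sit below /sys,
which are the ones a fresh sysfs mount would hide. The function name and
the sample data are made up for this example, and the actual re-mounting
step is left out.

```python
def submounts(mountinfo_text, prefix="/sys"):
    """Return (mount_point, fstype) for each mount strictly below prefix.

    Expects text in the /proc/<pid>/mountinfo format. Mount points that
    contain spaces are escaped there (as \\040), so a plain split() works.
    """
    base = prefix.rstrip("/") + "/"
    found = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        try:
            # Optional fields start at index 6 and end at a lone "-".
            sep = fields.index("-", 6)
        except ValueError:
            continue  # blank or malformed line
        mount_point = fields[4]
        fstype = fields[sep + 1]
        if mount_point.startswith(base):
            found.append((mount_point, fstype))
    return found


# Hypothetical sample data, not taken from a real system.
SAMPLE = """\
22 28 0:21 / /sys rw,nosuid,nodev,noexec,relatime shared:7 - sysfs sysfs rw
33 22 0:29 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime shared:12 - bpf bpf rw,mode=700
34 22 0:7 / /sys/kernel/debug rw,relatime shared:13 - debugfs debugfs rw
28 1 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw
"""

for mount_point, fstype in submounts(SAMPLE):
    print(mount_point, fstype)
```

On a live system the input would be the contents of /proc/self/mountinfo,
read before /sys is replaced.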

> Or is there a reason that bpffs should be per network namespace?

Well, I first ran into this issue because of a bug report to
xdp-tools/libxdp about things not working correctly in network
namespaces:

https://github.com/xdp-project/xdp-tools/issues/364

And libxdp does assume that there's a separate bpffs per network
namespace: it persists things into the bpffs that is tied to the network
devices in the current namespace. So if the bpffs is shared, an
application running inside the network namespace could access XDP
programs loaded in the root namespace. I don't know, but suspect, that
such assumptions would be relatively common in networking BPF programs
that use pinning (the pinning support in libbpf and iproute2 itself at
least have the same leaking problem if the bpffs is shared).

>> The underlying cause for this is that iproute2 will create a new mount namespace
>> every time it switches into a network namespace. This is needed to be able to
>> mount a /sys filesystem that shows the correct network device information, but
>> has the unfortunate side effect of making mounts entirely transient for any 'ip
>> netns' invocation.
>
> Mount propagation can be made to work if necessary, that would solve the
> transient problem.

Is mount propagation the same as the remount thing you mentioned
above, or is this something different?

(Sorry for being hopelessly naive about this; as you probably guessed
from my previous email asking about this, I'm only now learning about
all the intricacies of fs mounts).
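
For reference, the propagation state of a mount is visible in the optional
fields of /proc/self/mountinfo ("shared:N" for a member of a peer group,
"master:N" for a slave of one). A minimal sketch of classifying a single
mountinfo line follows; the function name and the sample lines are invented
for this example, and the returned labels are just descriptive strings.

```python
def propagation(mountinfo_line):
    """Classify one /proc/<pid>/mountinfo line by its propagation tags."""
    fields = mountinfo_line.split()
    sep = fields.index("-", 6)  # a lone "-" ends the optional fields
    tags = {field.split(":")[0] for field in fields[6:sep]}
    if "shared" in tags and "master" in tags:
        return "shared+slave"
    if "shared" in tags:
        return "shared"
    if "master" in tags:
        return "slave"
    if "unbindable" in tags:
        return "unbindable"
    return "private"


# Hypothetical sample lines, not taken from a real system.
SHARED = "33 22 0:29 / /sys/fs/bpf rw,relatime shared:12 - bpf bpf rw"
SLAVE = "33 22 0:29 / /sys/fs/bpf rw,relatime master:12 - bpf bpf rw"
PRIVATE = "33 22 0:29 / /sys/fs/bpf rw,relatime - bpf bpf rw"

for line in (SHARED, SLAVE, PRIVATE):
    print(propagation(line))
```

Mount and unmount events travel between "shared" peers in both directions,
and from a master to its "slave" mounts only, so a mount made inside a
slave copy does not show up in the original tree.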

>> This series is an attempt to fix this situation, by persisting a mount namespace
>> alongside the persistent network namespace (in a separate directory,
>> /run/netns-mnt). Doing this allows us to still have a consistent /sys inside
>> the namespace, but with persistence so any mounts survive.
>
> I really don't like that direction.
>
> "ip netns" was designed and really should continue to be a command that
> makes the world look like it has a single network namespace, for
> compatibility with old code.  Part of that old code "ip netns" supports
> is "ip" itself.

Well, my idea with this change was to keep the functionality as close as
possible to what 'ip' currently does, and just have mounts persist across
invocations.

> I think you are making bpffs unnecessarily per network namespace.

See above.

>> This mode does come with some caveats. I'm sending this as RFC to get feedback
>> on whether this is the right thing to do, especially considering backwards
>> compatibility. On balance, I think that the approach taken here of
>> unconditionally persisting the mount namespace, and using that persistent
>> reference whenever it exists, is better than the current behaviour, and that
>> while it does represent a change in behaviour it is backwards compatible in a
>> way that won't cause issues. But please do comment on this; see the patch
>> description of patch 4 for details.
>
> As I understand it this will cause a problem for any application that
> is network namespace aware and does not use "ip netns" to wrap itself.
>
> I am fairly certain that pinning the mount namespace will result in
> never seeing an update of /etc/resolv.conf, at least if you
> are on a system that has /etc/netns/NAME/resolv.conf.

I was actually wondering about that /etc bind mounting support while I
was looking at this code. Could you please elaborate a bit on what that
is used for, exactly? :)

Also, if staleness of the /etc bind mounts is an issue, those could be
redone on every entry, couldn't they?
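
To make "redone on every entry" concrete, here is a rough sketch of the
planning half of that step. It assumes the convention that each entry in
/etc/netns/NAME is bind mounted over the entry of the same name in /etc.
The function name and its parameters are invented for this example, and
the mount calls themselves (which need privileges) are left out.

```python
import os


def etc_bind_plan(name, netns_etc_root="/etc/netns", etc_dir="/etc"):
    """Return (source, target) pairs for the per-namespace /etc overrides.

    Each entry found in <netns_etc_root>/<name> is paired with the path of
    the same name under etc_dir. Nothing is mounted here; the caller would
    bind mount every source over its target on each namespace entry.
    """
    src_dir = os.path.join(netns_etc_root, name)
    if not os.path.isdir(src_dir):
        return []  # no overrides configured for this namespace
    return [
        (os.path.join(src_dir, entry), os.path.join(etc_dir, entry))
        for entry in sorted(os.listdir(src_dir))
    ]


# "blue" is a hypothetical namespace name.
for source, target in etc_bind_plan("blue"):
    print(source, "->", target)
```

Because the plan is recomputed from the directory contents on each call,
files added or removed under /etc/netns/NAME after the namespace was
created would be picked up the next time it is entered.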

-Toke