Message-ID: <20200402230233.mumqo22khf7q7o7c@topsnens>
Date: Fri, 3 Apr 2020 01:02:33 +0200
From: Maximilian Bosch <maximilian@...sch.me>
To: netdev@...r.kernel.org
Subject: Re: VRF Issue Since kernel 5
Hi!
> I do not see how this worked on 4.19. My comment above is a fundamental
> property of VRF and has been needed since day 1. That's why 'ip vrf
> exec' exists.
I'm afraid I have to disagree here: first of all, I created a
regression test in NixOS for this purpose a while ago[1]. The third test case
(lines 197-208) does basically what I demonstrated in my previous emails
(opening SSH connections through a local VRF). This worked fine until we
bumped our default kernel to 5.4.x, which is why this test case
is temporarily commented out.
While this is helpful to demonstrate the issue, I acknowledge that it is
pretty useless for a non-NixOS user, which is why I did some further research
today:
After skimming through the VRF-related changes in 4.20 and 5.0 (which
might've had some relevant changes as you suggested previously), I
rebuilt the kernels 5.4.29 and 5.5.13 with
3c82a21f4320c8d54cf6456b27c8d49e5ffb722e[2] reverted on top, and the
commented-out test case works fine again. In other words, my use case
seems to have worked before and the mentioned commit appears to cause
the "regression".
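For anyone who wants to repeat the experiment outside NixOS, a rebuild with the commit reverted could look roughly like the sketch below. The checkout path, branch name and the `RUN_BUILD` switch are assumptions for illustration and are not taken from my setup; the revert may also stop on conflicts that need resolving by hand.

```shell
#!/bin/sh
# Sketch: rebuild a stable kernel with one commit reverted on top.
# Assumptions (not from the thread): a linux-stable clone in $KSRC with
# the v5.5.13 tag fetched, and an existing .config in that tree.
REVERT_COMMIT=3c82a21f4320c8d54cf6456b27c8d49e5ffb722e
BASE_TAG=v5.5.13
KSRC=${KSRC:-$HOME/src/linux-stable}   # hypothetical path

# Runs in a subshell so the caller's working directory is left alone.
build_with_revert() (
    cd "$KSRC" &&
    git checkout -b "vrf-revert-$BASE_TAG" "$BASE_TAG" &&
    # May stop on conflicts; resolve them manually in that case.
    git revert --no-edit "$REVERT_COMMIT" &&
    make olddefconfig &&
    make -j"$(nproc)"
)

# Only build when explicitly requested, so sourcing this file is harmless.
if [ "${RUN_BUILD:-0}" = 1 ]; then
    build_with_revert
fi
```

Running it with `RUN_BUILD=1` starts the build; the same steps apply to 5.4.29 by changing `BASE_TAG`.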
To provide you with further context, I decided to run
`sudo perf record -e fib:* -a -g -- ssh root@...60.36.231 -o ConnectTimeout=10s true`
again on my patched kernel at 5.5.13.
The result is available under
https://gist.githubusercontent.com/Ma27/a6f83e05f6ffede21c2e27d5c7d27098/raw/40c78603d5f76caa8717e293aba5c609bf7f013d/perf-report.txt
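To make the two cases discussed in this thread concrete, here is a small sketch contrasting the plain invocation with the `ip vrf exec` variant. The VRF name and the target address are placeholders (the address is from the documentation range), and the `run` helper with its `DRY_RUN` switch is only there so the commands can be inspected without root.

```shell
#!/bin/sh
# Sketch: the same ssh call with and without a VRF context.
VRF=${VRF:-vrf-example}        # hypothetical VRF device name
TARGET=${TARGET:-192.0.2.10}   # placeholder address, not from the thread

# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Plain invocation: ssh is started without any VRF context.
ssh_plain() {
    run ssh -o ConnectTimeout=10 "root@$TARGET" true
}

# Same command started in the VRF's context via iproute2
# (needs root when DRY_RUN=0).
ssh_in_vrf() {
    run ip vrf exec "$VRF" ssh -o ConnectTimeout=10 "root@$TARGET" true
}

ssh_plain
ssh_in_vrf
```

The first form is what the commented-out test case exercises; the second is the variant suggested in the quoted reply below.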
Thanks!
Maximilian
[1] https://github.com/NixOS/nixpkgs/blob/58c7a952a13a65398bed3f539061e69f523ee377/nixos/tests/systemd-networkd-vrf.nix
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c82a21f4320c8d54cf6456b27c8d49e5ffb722e
On Wed, Apr 01, 2020 at 02:41:56PM -0600, David Ahern wrote:
> On 4/1/20 2:35 PM, Maximilian Bosch wrote:
> > Hi!
> >
> >> This should work:
> >> make -C tools/testing/selftests/net nettest
> >> PATH=$PWD/tools/testing/selftests/net:$PATH
> >> tools/testing/selftests/net/fcnal-test.sh
> >
> > Thanks, will try this out later.
> >
> >> If you want that ssh connection to work over a VRF you either need to
> >> set the shell context:
> >> ip vrf exec <NAME> su - $USER
> >>
> >
> > Yes, using `ip vrf exec` is basically my current workaround.
>
> that's not a workaround, it's a requirement. With VRF configured all
> addresses are relative to the L3 domain. When trying to connect to a
> remote host, the VRF needs to be given.
>
> >
> >> or add 'ip vrf exec' before the ssh. If it is an incoming connection to
> >> a server the ssh server either needs to be bound to the VRF or you need
> >> 'net.ipv4.tcp_l3mdev_accept = 1'
> >
> > Does this mean that the `*l3mdev_accept`-parameters only "fix" this
> > issue if the VRF is on the server I connect to?
>
> server side setting only.
>
> >
> > In my case the VRF is on my local machine and I try to connect through
> > the VRF to the server.
> >
> >> The tcp reset suggests you are doing an outbound connection but the
> >> lookup for what must be the SYN-ACK is not finding the local socket -
> >> and that is because of the missing 'ip vrf exec' above.
> >
> > I only experience this behavior on a 5.x kernel, not on e.g. 4.19
> > though. I may be wrong, but isn't this a breaking change for userspace
> > applications in the end?
>
> I do not see how this worked on 4.19. My comment above is a fundamental
> property of VRF and has been needed since day 1. That's why 'ip vrf
> exec' exists.