netdev - Re: Route fallback issue

Message-ID: <544875a1-68e6-a8f3-4eb7-44f053605c3e@spamtrap.tnetconsulting.net>
Date:   Thu, 21 Jun 2018 15:08:03 -0600
From:   Grant Taylor <gtaylor@...tconsulting.net>
To:     Julian Anastasov <ja@....bg>
Cc:     Akshat Kakkar <akshat.1984@...il.com>,
        netdev <netdev@...r.kernel.org>,
        cronolog+lartc <cronolog+lartc@...glemail.com>,
        lartc <lartc@...r.kernel.org>,
        Erik Auerswald <auerswal@...x-ag.uni-kl.de>
Subject: Re: Route fallback issue

On 06/21/2018 01:57 PM, Julian Anastasov wrote:
> Hello,

Hi.

> I think so

Okay.

I'll do some more digging.

> You can search on net. I have some old docs on these issues, they should 
> be actual:
> 
> http://ja.ssi.bg/dgd-usage.txt

"DGD" or "Dead Gateway Detection" sounds very familiar.  I referenced it 
in an earlier reply.

I distinctly remember DGD not behaving satisfactorily years ago, where 
"unsatisfactorily" meant something like 90 seconds (or more) to recover. 
That actually matches what I was getting without the 
ignore_routes_with_linkdown=1 setting that David A. mentioned.

With ignore_routes_with_linkdown=1 things behaved much better.
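
For anyone wanting to check this on their own box, here is a minimal 
sketch that reads the current value of the knob.  The per-interface path 
under /proc/sys/net/ipv4/conf/ is my assumption about where the setting 
is exposed on a given kernel, and linkdown_knob is just a helper name 
made up for illustration.

```shell
#!/bin/sh
# linkdown_knob is a hypothetical helper, not part of any tool.
# Prints the current ignore_routes_with_linkdown value for an interface
# ("all" by default), or "unavailable" if this kernel does not expose it.
linkdown_knob() {
    f="/proc/sys/net/ipv4/conf/${1:-all}/ignore_routes_with_linkdown"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "unavailable"
    fi
}

linkdown_knob all
```

Turning it on would then be something like 
"sysctl -w net.ipv4.conf.all.ignore_routes_with_linkdown=1" as root.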

> Not true. net/ipv4/fib_semantics.c:fib_select_path() calls 
> fib_select_default() only when prefixlen = 0 (default route).

Okay....  My testing last night disagrees with you.  Specifically, I was 
able to add alternate routes to the same prefix, 192.0.2.128/26. 
There was no default gateway configured in any of the NetNSs.  So 
everything was using routes for locally attached networks or the two 
added via "ip route append".

What am I misinterpreting?  Or where are we otherwise talking past each 
other?

> Otherwise, only the first route will be considered.

"only the first route" almost sounds like something akin to Equal Cost 
Multi Path.

I was not expecting "alternative routes" to use more than one route at a 
time, equally or otherwise.  I was wanting for the kernel to fall back 
to an alternate route / gateway / path in the event that the one that 
was being used became unusable / unreachable.

So what should "Alternative Routes" do?  How does this compare / 
contrast to E.C.M.P. or D.G.D.?

> fib_select_default() is the function that decides which nexthop 
> is reachable and whether to contact it. It uses the ARP state via 
> fib_detect_death(). That is all code that is behind this feature called 
> "alternative routes": the kernel selects one based on nexthop's ARP 
> state.

Please confirm that you aren't entering / referring to E.C.M.P. 
territory when you say "nexthop".  I think that you are not, but I want 
to ask and be sure, particularly seeing as how things are very closely 
related.

It sounds like you're referring to literally the router that is the next 
hop in the path.  I.e. the device on the other end of the wire.

I'll have to find, read, and try to grok the code to have a better idea. 
That being said, it looks like (based on the name) 
fib_select_default() deals with the default route.  The testing I did 
last night, and its positive results, indicate that the kernel did what 
I wanted it to do.  (See above about D.G.D. vs E.C.M.P.)

So, it seems as if something about alternative routes worked using 
non-default routes.  I have no way of knowing if it was the code that 
we're talking about, or something else, that produced the results.  The 
test as I ran it (specific, non-default prefixes, with routes being 
appended and no other routes present) worked the way that I would have 
expected a feature that uses alternative routes (or historically D.G.D.) 
to work.

The following ping works just fine as I bounce interfaces on NS1.

ns2# ping -I 192.0.2.254 192.0.2.129

I can confirm that traffic is moving back and forth between the vEth 
links between the NetNSs.  Granted, the traffic sticks to one vEth 
interface until it goes away.

I can shut down ns2a on NS1 so that ns1a sees loss of link but stays 
up on NS2, and traffic moves to vEth-B.

I can then open up ns2a on NS1 so that ns1a sees link on NS2, and 
re-append the route on NS1.

I can then shut down ns2b on NS1 so that ns1b sees loss of link but 
stays up on NS2, and traffic moves to vEth-A.

I can then open up ns2b on NS1 so that ns1b sees link on NS2, and 
re-append the route on NS1.

NS2 behaves exactly as I would hope.  Traffic will move from the down 
interface to the remaining up interface.  Back and forth, no problem.

I don't know where the disconnect is, but I feel like there is one.

> Routes with different metric are considered only when the routes with 
> lower metric are removed.

I agree with the statement.  What I question is where metric came into 
play here.  All of the routes had the same (default) metric.  None of 
the routes I tested had different metrics.

ns1# ip route show
192.0.2.0/26 dev ns2a proto kernel scope link src 192.0.2.1
192.0.2.64/26 dev ns2b proto kernel scope link src 192.0.2.65
192.0.2.128/26 dev dummy0 proto kernel scope link src 192.0.2.129
192.0.2.192/26 via 192.0.2.62 dev ns2a
192.0.2.192/26 via 192.0.2.126 dev ns2b

ns2# ip route show
192.0.2.0/26 dev ns1a proto kernel scope link src 192.0.2.62
192.0.2.64/26 dev ns1b proto kernel scope link src 192.0.2.126
192.0.2.128/26 via 192.0.2.65 dev ns1b
192.0.2.128/26 via 192.0.2.1 dev ns1a
192.0.2.192/26 dev dummy0 proto kernel scope link src 192.0.2.254
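
For reference, the two 192.0.2.128/26 entries in the NS2 table above 
could be recreated with something like the following sketch.  The order 
(ns1b first, ns1a appended) is an assumption taken from the "ip route 
show" output, and the run / add_alternative_routes helpers are made up 
for illustration.  It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: prints the commands instead of executing them.
# Set DRY_RUN=0 (as root, inside the NS2 namespace) to apply them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# The first route to the prefix goes in with "add"; every further
# alternative to the same prefix has to go in with "append".
add_alternative_routes() {
    run ip route add    192.0.2.128/26 via 192.0.2.65 dev ns1b
    run ip route append 192.0.2.128/26 via 192.0.2.1  dev ns1a
}

add_alternative_routes
```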

> IIRC, this flag invalidates nexthops depending on the link state. If 
> your link is always UP it does not help much.

That's what I gathered.  So things like DSL & cable modems or other L2 
bridging devices might not drop the link when their circuit drops.

This is also why I asked the follow up questions to David's email.

I want to do some testing to see if fib_multipath_use_neigh alters this 
behavior at all.  I'm hoping that it will invalidate an alternate route 
if the MAC is not resolvable even if the physical link stays up.
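
While testing that, the neighbour state of each gateway can be watched 
from user space.  Here is a small sketch that pulls the state out of 
"ip neigh show" output.  It assumes the state is the last field on each 
line, and neigh_state is a made-up helper name.

```shell
#!/bin/sh
# neigh_state GATEWAY: reads "ip neigh show" output on stdin and prints
# the neighbour state (REACHABLE, STALE, FAILED, ...) for GATEWAY.
# Assumes the state is the last whitespace-separated field of the line.
neigh_state() {
    awk -v gw="$1" '$1 == gw { print $NF }'
}

# Typical use (gateway address taken from the NS2 routing table):
#   ip neigh show | neigh_state 192.0.2.1
```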

Sure, the ARP cache may have a 30 ~ 120 second timeout before triggering 
this behavior.  But having that timeout and starting to use an 
alternative route is considerably better than not using an alternative 
route.

> If you rely on user space tool, you can check the state of the desired 
> hops: device link state, your gateway to ISP, one or more gateways in the 
> ISP network which you consider permanent part of the path via this ISP.

This is what I have thought about doing previously.
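
Roughly what I had in mind is something like the sketch below: probe 
each gateway, withdraw its route when the probe fails, and re-append it 
when the probe succeeds again.  The helper names, the single-ping probe 
and the one second timeout are all assumptions for illustration, and it 
only prints the route changes unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical watchdog sketch; helper names and timings are assumptions.
PREFIX=192.0.2.128/26

# Dry run by default: print route changes instead of applying them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# probe GW DEV: succeeds if GW answers a single ping within one second.
probe() {
    ping -c 1 -W 1 -I "$2" "$1" >/dev/null 2>&1
}

# route_present GW DEV: succeeds if the route via GW is installed.
route_present() {
    ip route show "$PREFIX" via "$1" dev "$2" 2>/dev/null | grep -q .
}

# check_gateway GW DEV: withdraw the route when GW is dead,
# re-append it when GW is back.
check_gateway() {
    if probe "$1" "$2"; then
        route_present "$1" "$2" ||
            run ip route append "$PREFIX" via "$1" dev "$2"
    else
        if route_present "$1" "$2"; then
            run ip route del "$PREFIX" via "$1" dev "$2"
        fi
    fi
}
```

On NS2 this could be driven from a loop or cron, calling 
"check_gateway 192.0.2.1 ns1a" and "check_gateway 192.0.2.65 ns1b" every 
few seconds.  A real version would probably also want to probe something 
further along the path, as you suggest.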

> First route can be created with 'add' but all next alternative routes 
> can be added only with "append". If you successfully add them with 
> "add" it means they are not alternatives to the first one, they are not 
> considered at all.

ACK



-- 
Grant. . . .
unix || die

