Message-ID: <6de1bf9e-f492-8dec-22c3-d1a1c6940006@spamtrap.tnetconsulting.net>
Date:   Mon, 25 Jun 2018 14:07:58 -0600
From:   Grant Taylor <gtaylor@...tconsulting.net>
To:     Julian Anastasov <ja@....bg>
Cc:     Akshat Kakkar <akshat.1984@...il.com>,
        netdev <netdev@...r.kernel.org>,
        cronolog+lartc <cronolog+lartc@...glemail.com>,
        lartc <lartc@...r.kernel.org>,
        Erik Auerswald <auerswal@...x-ag.uni-kl.de>
Subject: Re: Route fallback issue
On 06/25/2018 12:50 PM, Julian Anastasov wrote:
> Hello,
Hi Julian,
> Yes, ARP state for unreachable GWs may be updated slowly; there is 
> timely feedback only for the reachable state.
Fair.
Most of the installations where I needed D.G.D. to work would be okay 
with a < 5 minute timeout.  Obviously they would like faster, but 
automation is a LOT better than waiting on manual intervention.
IMHO < 30 seconds is great.  < 90 seconds is acceptable.  < 300 seconds 
leaves some room for improvement.
> You can create the two routes, of course. But only the default routes 
> are alternative.
Are you saying that the functionality I'm describing only works for 
default gateways or that the term "alternative route" only applies to 
default gateways?
The testing that I did indicated that alternative routes worked for 
specific prefixes too.
I tested with multiple network namespaces (NetNSs) that had only directly 
attached routes plus routes appended to a destination prefix; there was no 
default gateway / route of last resort.
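For anyone reproducing this, a namespace test bed of that shape could be generated roughly as below. This is a sketch, not the exact setup used above: the namespace name, the interface names and the RFC 5737 addresses are all hypothetical, and creating and addressing the veth interfaces is omitted. The script only prints the iproute2 commands (dry run).

```python
import shlex

# Hypothetical test-bed values (not from the thread): documentation
# prefixes from RFC 5737 and made-up namespace/interface names.
NS = "router"
DEST = "198.51.100.0/24"
GATEWAYS = [("192.0.2.1", "veth0"), ("203.0.113.1", "veth1")]


def build_commands(ns, dest, gateways):
    """Return iproute2 commands that add one route to `dest` and append
    one alternative route per extra gateway, with no default route,
    inside network namespace `ns`. Interface setup is not included."""
    cmds = [["ip", "netns", "add", ns]]
    first_gw, first_dev = gateways[0]
    cmds.append(["ip", "netns", "exec", ns, "ip", "route", "add", dest,
                 "via", first_gw, "dev", first_dev])
    for gw, dev in gateways[1:]:
        # "append" adds another route to the same prefix instead of
        # replacing the existing one.
        cmds.append(["ip", "netns", "exec", ns, "ip", "route", "append",
                     dest, "via", gw, "dev", dev])
    return cmds


if __name__ == "__main__":
    # Dry run: print the commands rather than executing them.
    for cmd in build_commands(NS, DEST, GATEWAYS):
        print(shlex.join(cmd))
```

Printing instead of executing keeps the sketch harmless; piping the output to a root shell would apply it.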
The behavior seemed to be different when ignore_routes_with_linkdown was 
set versus unset.  Specifically, ignore_routes_with_linkdown seemed to 
help considerably.
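Toggling that setting between test runs is equivalent to `sysctl -w net.ipv4.conf.<scope>.ignore_routes_with_linkdown=<0|1>`. A minimal sketch that writes the /proc file directly is below; the helper names and the `root` parameter (a test hook for pointing at a directory other than /proc/sys) are hypothetical. It has to run inside the namespace under test (e.g. via `ip netns exec`) because the /proc/sys/net tree is per network namespace. Whether setting "all" is sufficient or each interface needs the value should be checked against ip-sysctl.txt for the kernel in use.

```python
from pathlib import Path

# Assumption: IPv4 per-interface sysctls live under
# /proc/sys/net/ipv4/conf/<scope>/, where <scope> is "all", "default",
# or an interface name.


def sysctl_path(scope, name="ignore_routes_with_linkdown", root="/proc/sys"):
    """Map an IPv4 conf sysctl for `scope` to its file path."""
    return Path(root) / "net" / "ipv4" / "conf" / scope / name


def set_ignore_linkdown(scope, enabled, root="/proc/sys"):
    """Write 1 or 0; needs root privileges when root is /proc/sys."""
    sysctl_path(scope, root=root).write_text("1\n" if enabled else "0\n")


def get_ignore_linkdown(scope, root="/proc/sys"):
    """Return True if the setting is currently 1 for `scope`."""
    return sysctl_path(scope, root=root).read_text().strip() == "1"
```

Building the path directly (rather than a dotted sysctl name) also sidesteps the ambiguity of interface names that contain dots, such as VLAN interfaces like eth0.100.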
That is why I question the requirement for a "default" route versus a 
route to a specific prefix.
Can you explain why I saw the behavior difference with 
ignore_routes_with_linkdown if it only applies to the default route?
> The alternative routes work in this way:
> 
> - on lookup, routes are walked in order - as listed in table
> 
> - as long as route contains reachable gateway (ARP state), only this 
> route is used
> 
> - if some gateway becomes unreachable (ARP state), next alternative 
> routes are tried
> 
> - if ARP entry is expired (missing), this gateway can be probed if the 
> route is before the currently used route. This is what happens initially 
> when no ARP state is present for the GWs. It is bad luck if the probed 
> GW is actually unreachable.
> 
> - active probing by user space (ping GWs) can only help to keep the ARP 
> state present for the used gateways. This way, if the ARP entry for a GW 
> is missing, the kernel will not risk selecting an unavailable route just 
> to probe the GW.
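The selection rules in the quoted list can be modeled as below. This is a simplified model of the description in this thread, not the kernel's actual code: ARP state is collapsed into three hypothetical values, and the fallback when nothing is usable is only a placeholder for the kernel's own last-resort handling.

```python
from enum import Enum


class Arp(Enum):
    REACHABLE = "reachable"
    FAILED = "failed"      # gateway known to be unreachable
    MISSING = "missing"    # no ARP entry (never resolved, or expired)


def select_route(routes, arp, current=None):
    """Pick the index of the alternative route to use (simplified model).

    routes  -- gateway addresses, in the order listed in the table
    arp     -- mapping gateway -> Arp state (absent means MISSING)
    current -- index of the route currently in use, or None

    Returns (index, probing); probing is True when the chosen gateway has
    no ARP entry, so sending traffic to it doubles as an ARP probe.
    """
    for i, gw in enumerate(routes):
        state = arp.get(gw, Arp.MISSING)
        if state is Arp.REACHABLE:
            return i, False          # first reachable gateway wins
        if state is Arp.MISSING and (current is None or i < current):
            return i, True           # probe a gateway listed earlier
        # FAILED, or MISSING but not ahead of the current route: skip it
    if current is not None:
        return current, False        # no better choice found; keep current
    return None, False               # nothing usable at all
```

With no ARP state at all, `select_route(["192.0.2.1", "192.0.2.2"], {})` returns `(0, True)`: the first gateway is tried blind, which is the "bad luck" case described above if that gateway is actually down.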
This all makes sense.
Please confirm whether "gateway" in this context means the "/default/ 
gateway" or not.  I ask because "gateway" can arguably be used to describe 
the next hop for a route, or gateway, to a prefix.  Further, the 
"/default/ (gateway,router)" is the gateway or route of last resort.  To 
me, that means "gateway" can refer to any route in this context.
> nexthop is the GW in the route
Thank you for confirming.
> Yes, the kernel avoids alternative routes with unreachable GWs
Fair enough.
> The multipath route uses all its alive nexthops at the same time... But 
> you may need active probing by user space in the same way; otherwise an 
> unavailable GW can be selected.
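To illustrate why probing matters for multipath too, here is a sketch of per-flow nexthop selection that only considers nexthops believed alive. It is an assumption-laden simplification: the kernel uses its own flow hash (its inputs are governed by the fib_multipath_hash_policy sysctl) and, as far as I know, a hash-threshold mapping rather than the modulo used here. The point it shows is that a dead gateway still counted as "alive" keeps receiving its share of flows.

```python
import hashlib


def pick_nexthop(flow, nexthops, alive):
    """Choose a nexthop for `flow` among the alive nexthops only.

    flow     -- (src, dst) address pair, standing in for the L3 hash input
    nexthops -- gateway addresses of the multipath route
    alive    -- set of gateways currently considered usable
    """
    candidates = [nh for nh in nexthops if nh in alive]
    if not candidates:
        return None
    # Stable hash: a given flow keeps the same nexthop as long as the
    # set of alive nexthops does not change.
    digest = hashlib.sha256("|".join(flow).encode()).digest()
    return candidates[int.from_bytes(digest[:4], "big") % len(candidates)]
```

Removing a gateway from `alive` only happens once something (ARP failure, link state, or user-space probing) marks it unusable; until then, the flows hashed to it are lost.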
I assume that the dead ECMP NEXTHOP is also subject to similar timeouts 
as alternative routes.  Correct?
> Yes, if you prefer, you may run PING every second to avoid such delays...
Agreed.
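A minimal user-space prober along the lines of "run PING every second" might look like this. It assumes iputils `ping` (`-c` count, `-W` reply timeout in seconds); the function names are hypothetical. The kernel does not consume ping's exit status; what matters is that the echo requests force ARP resolution toward each gateway, keeping the neighbor state current.

```python
import subprocess
import time


def ping_once(gw, timeout_s=1):
    """Send one ICMP echo with iputils ping; True if a reply came back."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), gw],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def probe_gateways(gateways, probe=ping_once, interval_s=1.0, rounds=None,
                   sleep=time.sleep):
    """Probe every gateway once per interval to keep ARP state fresh.

    Returns the last observed reachability per gateway. rounds=None means
    run forever (e.g. as a daemon); `probe` and `sleep` are injectable so
    the loop can be exercised without a network.
    """
    status = {}
    n = 0
    while rounds is None or n < rounds:
        for gw in gateways:
            status[gw] = probe(gw)
        n += 1
        if rounds is None or n < rounds:
            sleep(interval_s)   # no sleep after the final round
    return status
```

Probing sequentially keeps the sketch simple; with many gateways and a 1 second timeout each, one round can exceed the interval, so a real deployment might probe in parallel.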
I'm trying to make sure I understand basic functionality before I do 
things to modify it.
-- 
Grant. . . .
unix || die