[Firehol-support] better understanding link-balancer and PBR
spike at drba.org
Wed Dec 7 18:14:44 GMT 2016
Thanks for your thoughts Costa, this kind of readily available insight and
help along with great SW makes me really happy I chose firehol.
inline below (most of my considerations come from reading
On Wed, Dec 7, 2016 at 3:12 AM Tsaousis, Costa <costa at tsaousis.gr> wrote:
> It simplifies routing significantly. Without this inheritance, policy
> based routing would be a lot more complicated. Imagine it. You have your
> static routes and 2 upstream providers. How would you say that lan server1
> is to be routed via ISP2, without losing your static routes?
I'm not sure I see the point still. If the rules were not copied over they
would still exist in main. Since PBR follows the rules in priority order
and continues to the next rule if a match is not found, with l-b's default
behavior, a static route found in main would win over the default route
pointing to nexthop and the other table, no?
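To make sure we're talking about the same thing, here is a minimal sketch of the rule fall-through I mean (table names, priorities and addresses are placeholders, not firehol's actual output):

```shell
# Illustrative rule list, roughly what "ip rule show" might print:
#   0:      from all lookup local
#   100:    from 192.168.1.10 lookup table1   # lan server1 -> ISP2's table
#   32766:  from all lookup main
#   32767:  from all lookup default
ip rule add from 192.168.1.10 lookup table1 pref 100
ip route add default via 198.51.100.1 table table1

# The catch, I believe: if table1 carries a default route, its lookup
# always succeeds, so the kernel never falls through to main -- the
# static routes in main would be bypassed unless they are also copied
# into table1. That may be exactly why the inheritance exists.
```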
> Q2) l-b generates a nexthop default route using the GWs I configured as
> default. When a packet encounters that route, does it go back to look at
> the rules and then match Table1 for GW1 or Table2 for GW2, depending on
> the nexthop selected? If not, then what are those tables set up for? The main table
> would already know how to reach those destinations since they are local.
> This is done with policy based routing. Check: ip rule show or the policy
> section in link-balancer.conf
maybe I didn't ask this clearly, lemme try again. I'm wondering whether the
kernel choosing the default nexthop route in main triggers another pass
through the rules or not. Does that make more sense?
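For reference, the nexthop default route in question is a single multipath route, roughly like this (gateways, devices and weights are placeholders):

```shell
# One multipath route: the kernel picks a nexthop as part of this single
# route lookup. As far as I understand, that selection happens inside the
# lookup of the table that matched -- it does not restart the rule walk --
# but I'd like confirmation.
ip route replace default \
    nexthop via 203.0.113.1 dev eth1 weight 1 \
    nexthop via 198.51.100.1 dev eth2 weight 2
```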
> Q3) my understanding is that routes are cached, so even after a link has
> gone down a client will still make the same choice in terms of routing a
> certain IP. Is that correct? i.e. it won't look at the rules or tables and
> will just pick the cached route. So for example, if 2 GWs were up and
> packets were routed through GW1, with Table1 having GW1 as its default
> route, and then GW1 went down, subsequent packets would still route through
> GW1 until the cached route expired. Is that correct? If that's true, then
> what's the point of changing the default route in Table1 to use GW2 when
> the rule that pointed to GW1 is removed anyway?
> hm... I don't know how the routing cache works exactly. I know, however,
> that in all cases I have encountered so far, my problem was only the
> iptables connection tracker, especially when NAT is involved or CONNMARK
> is used. I had to run conntrack to delete all the entries of the failed
> gateway, to prevent long timeouts.
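If I understand the approach, the flush would look something like this (flags from conntrack-tools; the WAN address is a placeholder):

```shell
# For SNATed flows the reply destination is our WAN IP on the dead link,
# so delete the tracked connections that still point at it:
conntrack -D --reply-dst 203.0.113.10

# Or, more bluntly, flush the whole table and let connections
# re-establish over the surviving gateway:
conntrack -F
```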
ah, interesting point. I found this on route caching, which was a good read
even though some of the info is deprecated in newer kernels.
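(For what it's worth, the deprecation I mean is that the IPv4 routing cache was removed in kernel 3.6; a couple of commands that are handy either way:)

```shell
# On pre-3.6 kernels the IPv4 route cache could be flushed explicitly:
ip route flush cache

# On any kernel, this shows which route a destination resolves to right
# now, which helps when checking what PBR actually decided:
ip route get 8.8.8.8
```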
> This ping-pong case is common if the check depends on the presence or not
> of routes.
oh, good to know, but I honestly don't see why. If I have 2 GWs and one
fails, why would the detection ping-pong between FAILED and OK? it seems it
should stay failed, no?
I found this in the docs for L-B:
*Link Balancer will automatically either use a fallback gateway or copy the
default-gateway of the origin table to the new table, so that traffic will
continue to be served by the routing table even though all its gateways went down.
Of course, when the interface is restored, Link Balancer will restore the
proper default gateway for this interface.*
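My reading of that docs quote, expressed as commands (table names and gateways are placeholders, not what L-B literally runs):

```shell
# When GW1 (the default gateway of table1) fails, copy in a fallback:
ip route replace default via 198.51.100.1 table table1   # GW2 as fallback

# When the interface is restored, put the proper gateway back:
ip route replace default via 203.0.113.1 table table1    # restore GW1
```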
this seems to be the problem with the ping-pong to me: if GW1 failed and
Table1's default GW (GW1) is replaced by GW2, then obviously the next run of
L-B would succeed, no? Being a default route, even if the ping selects the
source address of the dead GW, it'd still go through. Am I misunderstanding something?