[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1515691504.1318.9.camel@cohaesio.com>
Date: Thu, 11 Jan 2018 17:25:16 +0000
From: "Anders K. Pedersen | Cohaesio" <akp@...aesio.com>
To: "weiwan@...gle.com" <weiwan@...gle.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"netfilter-devel@...r.kernel.org" <netfilter-devel@...r.kernel.org>
Subject: Re: [bisected] Forwarded packets occasionally has loopback output
interface in Netfilter
On tir, 2017-12-26 at 12:05 +0100, Anders K. Pedersen | Cohaesio wrote:
> Hello,
>
> On one of our border routers, Netfilter is occasionally logging packets
> with "OUT=lo" (output interface lo) even though the packet should be
> going out via a regular interface. This behavior is present on Linux
> 4.13.0 to 4.14.9, and a bisection of the problem points to
>
> [95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call dst_dev_put() properly
>
> as the first bad commit. This commit adds dst_dev_put() calls before
> some dst_release() calls, and dst_dev_put() does
>
> dst->dev = dev_net(dst->dev)->loopback_dev;
>
> (among other things), which fits the problem we're seeing.
>
> The essential part of our nftables rule set that shows this behavior is
>
> chain forward {
> type filter hook forward priority 0;
>
> meta oif { $internal_interfaces } accept
>
> meta oif lo ip daddr != 127.0.0.0/8 \
> log group 0 snaplen 80 prefix "oif-lo" counter
>
> ip saddr { $our_ip_series } \
> flow table acct_out \
> { meta oif . rt nexthop . ip saddr timeout 12m counter } \
> accept
>
> log group 0 snaplen 80 prefix "DROP" counter drop
> }
>
> The router only does stateless packet filtering and no redirection or
> rewriting of the packets (connection tracking, NAT, ipvs etc. are not
> even compiled for this kernel).
>
> As a result of this problem we see packets that should be going to an
> internal interface (and thus accepted by the first rule above) being
> logged and dropped by the last rule. Some examples:
>
> Dec 22 11:57:02 cix4 oif-lo IN=eth10 OUT=lo MAC=90:e2:ba:5c:b6:95:10:f3:11:38:06:77:08:00 SRC=81.170.163.118 DST=212.97.158.33 LEN=1500 TOS=00 PREC=0x00 TTL=116 ID=25932 DF PROTO=TCP SPT=35118 DPT=8443 SEQ=604358330 ACK=1182278705 WINDOW=3295 ACK URGP=0 MARK=0
> Dec 22 11:57:02 cix4 DROP IN=eth10 OUT=lo MAC=90:e2:ba:5c:b6:95:10:f3:11:38:06:77:08:00 SRC=81.170.163.118 DST=212.97.158.33 LEN=1500 TOS=00 PREC=0x00 TTL=116 ID=25932 DF PROTO=TCP SPT=35118 DPT=8443 SEQ=604358330 ACK=1182278705 WINDOW=3295 ACK URGP=0 MARK=0
>
> Dec 22 12:47:07 cix4 oif-lo IN=eth10 OUT=lo MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.30.18 DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=118 ID=10370 DF PROTO=TCP SPT=443 DPT=44988 SEQ=1141545913 ACK=3844573103 WINDOW=65535 ACK PSH URGP=0 MARK=0
> Dec 22 12:47:07 cix4 DROP IN=eth10 OUT=lo MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.30.18 DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=118 ID=10370 DF PROTO=TCP SPT=443 DPT=44988 SEQ=1141545913 ACK=3844573103 WINDOW=65535 ACK PSH URGP=0 MARK=0
>
> Dec 22 12:53:56 cix4 oif-lo IN=eth10 OUT=lo MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.12.34 DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=115 ID=27728 DF PROTO=TCP SPT=443 DPT=39724 SEQ=3797156404 ACK=3944234612 WINDOW=65535 ACK PSH URGP=0 MARK=0
> Dec 22 12:53:56 cix4 DROP IN=eth10 OUT=lo MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.12.34 DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=115 ID=27728 DF PROTO=TCP SPT=443 DPT=39724 SEQ=3797156404 ACK=3944234612 WINDOW=65535 ACK PSH URGP=0 MARK=0
>
> It also happens for outbound traffic, where the packets are logged and
> counted in the acct_out flow table with "meta oif" = "lo", but a
> correct "rt nexthop" - an example:
>
> Dec 22 12:29:13 cix4 oif-lo IN=team0.20 OUT=lo MAC=3c:fd:fe:15:db:a8:00:24:a8:ff:f0:00:08:00 SRC=212.97.129.25 DST=95.166.119.129 LEN=40 TOS=00 PREC=0x00 TTL=62 ID=19481 DF PROTO=TCP SPT=443 DPT=52560 SEQ=3034827396 ACK=2862814901 WINDOW=12618 ACK URGP=0 MARK=0
>
> # nft list flow table filter acct_out|tr ',' '\n'|grep lo
> flow table acct_out {
> "lo" . 94.101.208.217 . 212.97.129.25 expires 3m17s : counter packets 1 bytes 40
>
> I don't know if these packets are actually sent out on the correct
> outbound interface thanks to the proper nexthop (the MAC= information
> in the Netfilter log is from the received packet and thus not useful
> here).
>
> I tried running a tcpdump on the lo interface to see if these packets
> would show up there, but during the three days I had it running, it
> only logged one such packet, while Netfilter logs 20+ outbound packets
> every day, and the one packet logged by tcpdump was *not* logged by
> Netfilter.
Further testing of the individual parts of the first bad commit shows
that the five first additions of the dst_dev_put() call doesn't trigger
the problem, while the last one does (also without the first five), so
the problematic part is:
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 3dee004..d986d80 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1369,6 +1372,7 @@ static bool rt_cache_route(struct fib_nh *nh, struct rtable *rt)
prev = cmpxchg(p, orig, rt);
if (prev == orig) {
if (orig) {
+ dst_dev_put(&orig->dst);
dst_release(&orig->dst);
rt_free(orig);
}
Any help with fixing this would be much appreciated. If there's
anything I should try to debug this further, please let me know.
Regards,
Anders K. Pedersen
Powered by blists - more mailing lists