Message-ID: <MW2PR2101MB0892FC0F67BD25661CDCE149BF529@MW2PR2101MB0892.namprd21.prod.outlook.com>
Date: Wed, 12 May 2021 22:17:22 +0000
From: Dexuan Cui <decui@...rosoft.com>
To: "netfilter-devel@...r.kernel.org" <netfilter-devel@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC: Stephen Hemminger <sthemmin@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: netfilter: iptables-restore: setsockopt(3, SOL_IP,
IPT_SO_SET_REPLACE, "security...", ...) return -EAGAIN
Hi,
I'm debugging an iptables-restore failure that happens about 5% of the
time when I repeatedly stop and start the Linux VM. The VM has only 1
CPU and the kernel version is 4.15.0-1098-azure, but I suspect the issue
may also exist in the mainline Linux kernel.
When the failure happens, it's always caused by line 27 of the rule file:
1 # Generated by iptables-save v1.6.0 on Fri Apr 23 09:22:59 2021
2 *raw
3 :PREROUTING ACCEPT [0:0]
4 :OUTPUT ACCEPT [0:0]
5 -A PREROUTING ! -s 168.63.129.16/32 -p tcp -j NOTRACK
6 -A OUTPUT ! -d 168.63.129.16/32 -p tcp -j NOTRACK
7 COMMIT
8 # Completed on Fri Apr 23 09:22:59 2021
9 # Generated by iptables-save v1.6.0 on Fri Apr 23 09:22:59 2021
10 *filter
11 :INPUT ACCEPT [2407:79190058]
12 :FORWARD ACCEPT [0:0]
13 :OUTPUT ACCEPT [1648:2190051]
14 -A OUTPUT -d 169.254.169.254/32 -m owner --uid-owner 33 -j DROP
15 COMMIT
16 # Completed on Fri Apr 23 09:22:59 2021
17 # Generated by iptables-save v1.6.0 on Fri Apr 23 09:22:59 2021
18 *security
19 :INPUT ACCEPT [2345:79155398]
20 :FORWARD ACCEPT [0:0]
21 :OUTPUT ACCEPT [1504:2129015]
22 -A OUTPUT -d 168.63.129.16/32 -p tcp -m owner --uid-owner 0 -j ACCEPT
23 -A OUTPUT -d 168.63.129.16/32 -p tcp -m conntrack --ctstate INVALID,NEW -j DROP
24 -A OUTPUT -d 168.63.129.16/32 -p tcp -m owner --uid-owner 0 -j ACCEPT
25 -A OUTPUT -d 168.63.129.16/32 -p tcp -m conntrack --ctstate INVALID,NEW -j DROP
26 -A OUTPUT -d 168.63.129.16/32 -p tcp -m conntrack --ctstate INVALID,NEW -j DROP
27 COMMIT
The related part of the strace log is:
socket(PF_INET, SOCK_RAW, IPPROTO_RAW) = 3
getsockopt(3, SOL_IP, IPT_SO_GET_INFO, "security\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., [84]) = 0
getsockopt(3, SOL_IP, IPT_SO_GET_ENTRIES, "security\0\357B\16Z\177\0\0Pg\355\0\0\0\0\0Pg\355\0\0\0\0\0"..., [880]) = 0
setsockopt(3, SOL_IP, IPT_SO_SET_REPLACE, "security\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2200) = -1 EAGAIN (Resource temporarily unavailable)
close(3) = 0
write(2, "iptables-restore: line 27 failed"..., 33) = 33
The -EAGAIN error comes from line 1240 of xt_replace_table():
do_ipt_set_ctl
  do_replace
    __do_replace
      xt_replace_table
1216 xt_replace_table(struct xt_table *table,
1217                  unsigned int num_counters,
1218                  struct xt_table_info *newinfo,
1219                  int *error)
1220 {
1221         struct xt_table_info *private;
1222         unsigned int cpu;
1223         int ret;
1224
1225         ret = xt_jumpstack_alloc(newinfo);
1226         if (ret < 0) {
1227                 *error = ret;
1228                 return NULL;
1229         }
1230
1231         /* Do the substitution. */
1232         local_bh_disable();
1233         private = table->private;
1234
1235         /* Check inside lock: is the old number correct? */
1236         if (num_counters != private->number) {
1237                 pr_debug("num_counters != table->private->number (%u/%u)\n",
1238                          num_counters, private->number);
1239                 local_bh_enable();
1240                 *error = -EAGAIN;
1241                 return NULL;
1242         }
When the function returns -EAGAIN, 'num_counters' is 5 while
'private->number' is 6.
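To make the failing check concrete, here is a minimal userspace sketch of
just the comparison above. It is not the kernel code: the names
model_table and model_replace are hypothetical, and locking, counters and
the real table swap are left out. It only shows that a caller holding a
stale entry count is rejected with -EAGAIN when the live table has a
different number of entries.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's table state. */
struct model_table {
	unsigned int number;	/* plays the role of private->number */
};

/*
 * Models only the count check in xt_replace_table(): returns 0 and
 * installs new_number on success, or -EAGAIN if the caller's snapshot
 * (num_counters) no longer matches the live table.
 */
int model_replace(struct model_table *t, unsigned int num_counters,
		  unsigned int new_number)
{
	assert(t != NULL);

	if (num_counters != t->number)
		return -EAGAIN;	/* stale snapshot: table is left untouched */

	t->number = new_number;
	return 0;
}
```

With t.number set to 6, model_replace(&t, 5, 7) returns -EAGAIN and
leaves the table unchanged, which matches the values I see, while
model_replace(&t, 6, 7) succeeds.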
If I re-run iptables-restore after the failure, it succeeds.
I checked xt_replace_table() in a recent mainline kernel, and the
function looks the same.
It looks like there is a race condition between iptables-restore calling
getsockopt() to get the number of table entries and iptables calling
setsockopt() to replace the entries? That would mean some other program
is concurrently calling getsockopt()/setsockopt(). However, this does not
seem to be the case according to the messages I print via trace_printk()
around do_replace() in do_ipt_set_ctl(). When the -EAGAIN error happens,
no other program is calling do_replace(). The table entry number was
changed to 5 by another program, 'iptables', about 1.3 milliseconds
earlier. Then this program, 'iptables-restore', calls setsockopt(), and
the kernel sees 'num_counters' being 5 and 'private->number' being 6
(how can this happen??). The next setsockopt() call for the same
'security' table happens about 1 minute later, with both numbers being 6.
Can you please shed some light on the issue? Thanks!
BTW, iptables does have a retry mechanism for getsockopt():
2f93205b375e ("Retry ruleset dump when kernel returns EAGAIN.")
(https://git.netfilter.org/iptables/commit/libiptc?id=2f93205b375e&context=10&ignorews=0&dt=0)
But it looks like this is not enough? E.g. here getsockopt() returns 0,
but setsockopt() returns -EAGAIN.
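For reference, this is roughly what I mean by retrying the whole
get/modify/set sequence instead of only the dump. It is a sketch under my
own assumptions, not how libiptc is structured: commit_with_retry,
commit_fn and flaky_attempt are hypothetical names, and the callback
stands for one complete getsockopt() + setsockopt() attempt.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical callback: performs one full snapshot + replace attempt
 * and returns 0 on success or a negative errno value on failure.
 */
typedef int (*commit_fn)(void *ctx);

/* Repeat the whole attempt while it fails with -EAGAIN. */
int commit_with_retry(commit_fn attempt, void *ctx, int max_attempts)
{
	int ret = -EAGAIN;

	assert(attempt != NULL && max_attempts > 0);

	for (int i = 0; i < max_attempts; i++) {
		ret = attempt(ctx);
		if (ret != -EAGAIN)
			return ret;	/* success or a non-transient error */
	}
	return ret;			/* still -EAGAIN after all attempts */
}

/*
 * Illustration only: fails with -EAGAIN until the counter that ctx
 * points to reaches zero, then succeeds.
 */
int flaky_attempt(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return 0;
}
```

A retry like this would only hide the symptom; it would not explain why
the two counts differ in the first place.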
Thanks,
Dexuan