lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20140709024821.6eb07566@samsung-9>
Date:	Wed, 9 Jul 2014 02:48:21 -0700
From:	Stephen Hemminger <stephen@...workplumber.org>
To:	netdev@...r.kernel.org
Subject: Fw: [Bug 79681] New: Out of order data packets triggered by ARP
 cache state update



Begin forwarded message:

Date: Tue, 8 Jul 2014 13:08:51 -0700
From: "bugzilla-daemon@...zilla.kernel.org" <bugzilla-daemon@...zilla.kernel.org>
To: "stephen@...workplumber.org" <stephen@...workplumber.org>
Subject: [Bug 79681] New: Out of order data packets triggered by ARP cache state update


https://bugzilla.kernel.org/show_bug.cgi?id=79681

            Bug ID: 79681
           Summary: Out of order data packets triggered by ARP cache state
                    update
           Product: Networking
           Version: 2.5
    Kernel Version: 3.15
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: IPV4
          Assignee: shemminger@...ux-foundation.org
          Reporter: freeman.wang@...adcom.com
        Regression: No

Hi

We were testing high speed UDP throughput using iperf and found some out of
order packets which always happen immediately after an ARP exchange. Going
through the code and found that it is due to the wrong sequence of neighbour
state update and arq_queue flush.

At line 1161, the state is updated to the new state. And in line 1201, the
write lock is released. That is the reason why we saw this kind of sequence:

p1, p4, p2, p3, p5

If I did not misunderstand it, the arq_queue length is 3 by default. When the
ARP response is received, it will update the state and dequeue p1 first, then
the lock is dropped, so the thread trying to send p4 can squeeze in and send it
out since the state is 'connected'.

I think the solution should be moving the state update till the arp_queue is
fully drained.

Thanks
Freeman

1060 /* Generic update routine.
1061    -- lladdr is new lladdr or NULL, if it is not supplied.
1062    -- new    is new state.
1063    -- flags
1064         NEIGH_UPDATE_F_OVERRIDE allows to override existing lladdr,
1065                                 if it is different.
1066         NEIGH_UPDATE_F_WEAK_OVERRIDE will suspect existing "connected"
1067                                 lladdr instead of overriding it
1068                                 if it is different.
1069                                 It also allows to retain current state
1070                                 if lladdr is unchanged.
1071         NEIGH_UPDATE_F_ADMIN    means that the change is administrative.
1072
1073         NEIGH_UPDATE_F_OVERRIDE_ISROUTER allows to override existing
1074                                 NTF_ROUTER flag.
1075         NEIGH_UPDATE_F_ISROUTER indicates if the neighbour is known as
1076                                 a router.
1077
1078    Caller MUST hold reference count on the entry.
1079  */
1080
1081 int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
1082                  u32 flags)
1083 {
1084         u8 old;
1085         int err;
1086         int notify = 0;
1087         struct net_device *dev;
1088         int update_isrouter = 0;
1089
1090         write_lock_bh(&neigh->lock);
1091
1092         dev    = neigh->dev;
1093         old    = neigh->nud_state;
1094         err    = -EPERM;
1095
1096         if (!(flags & NEIGH_UPDATE_F_ADMIN) &&
1097             (old & (NUD_NOARP | NUD_PERMANENT)))
1098                 goto out;
1099
1100         if (!(new & NUD_VALID)) {
1101                 neigh_del_timer(neigh);
1102                 if (old & NUD_CONNECTED)
1103                         neigh_suspect(neigh);
1104                 neigh->nud_state = new;
1105                 err = 0;
1106                 notify = old & NUD_VALID;
1107                 if ((old & (NUD_INCOMPLETE | NUD_PROBE)) &&
1108                     (new & NUD_FAILED)) {
1109                         neigh_invalidate(neigh);
1110                         notify = 1;
1111                 }
1112                 goto out;
1113         }
1114
1115         /* Compare new lladdr with cached one */
1116         if (!dev->addr_len) {
1117                 /* First case: device needs no address. */
1118                 lladdr = neigh->ha;
1119         } else if (lladdr) {
1120                 /* The second case: if something is already cached
1121                    and a new address is proposed:
1122                    - compare new & old
1123                    - if they are different, check override flag
1124                  */
1125                 if ((old & NUD_VALID) &&
1126                     !memcmp(lladdr, neigh->ha, dev->addr_len))
1127                         lladdr = neigh->ha;
1128         } else {
1129                 /* No address is supplied; if we know something,
1130                    use it, otherwise discard the request.
1131                  */
1132                 err = -EINVAL;
1133                 if (!(old & NUD_VALID))
1134                         goto out;
1135                 lladdr = neigh->ha;
1136         }
1137
1138         if (new & NUD_CONNECTED)
1139                 neigh->confirmed = jiffies;
1140         neigh->updated = jiffies;
1141
1142         /* If entry was valid and address is not changed,
1143            do not change entry state, if new one is STALE.
1144          */
1145         err = 0;
1146         update_isrouter = flags & NEIGH_UPDATE_F_OVERRIDE_ISROUTER;
1147         if (old & NUD_VALID) {
1148                 if (lladdr != neigh->ha && !(flags &
NEIGH_UPDATE_F_OVERRIDE)) {
1149                         update_isrouter = 0;
1150                         if ((flags & NEIGH_UPDATE_F_WEAK_OVERRIDE) &&
1151                             (old & NUD_CONNECTED)) {
1152                                 lladdr = neigh->ha;
1153                                 new = NUD_STALE;
1154                         } else
1155                                 goto out;
1156                 } else {
1157                         if (lladdr == neigh->ha && new == NUD_STALE &&
1158                             ((flags & NEIGH_UPDATE_F_WEAK_OVERRIDE) ||
1159                              (old & NUD_CONNECTED))
1160                             )
1161                                 new = old;
1162                 }
1163         }
1164
1165         if (new != old) {
1166                 neigh_del_timer(neigh);
1167                 if (new & NUD_IN_TIMER)
1168                         neigh_add_timer(neigh, (jiffies +
1169                                                 ((new & NUD_REACHABLE) ?
1170
neigh->parms->reachable_time :
1171                                                  0)));
1172                 neigh->nud_state = new;
1173                 notify = 1;
1174         }
1175
1176         if (lladdr != neigh->ha) {
1177                 write_seqlock(&neigh->ha_lock);
1178                 memcpy(&neigh->ha, lladdr, dev->addr_len);
1179                 write_sequnlock(&neigh->ha_lock);
1180                 neigh_update_hhs(neigh);
1181                 if (!(new & NUD_CONNECTED))
1182                         neigh->confirmed = jiffies -
1183                                       (NEIGH_VAR(neigh->parms,
BASE_REACHABLE_TIME) << 1);
1184                 notify = 1;
1185         }
1186         if (new == old)
1187                 goto out;
1188         if (new & NUD_CONNECTED)
1189                 neigh_connect(neigh);
1190         else
1191                 neigh_suspect(neigh);
1192         if (!(old & NUD_VALID)) {
1193                 struct sk_buff *skb;
1194
1195                 /* Again: avoid dead loop if something went wrong */
1196
1197                 while (neigh->nud_state & NUD_VALID &&
1198                        (skb = __skb_dequeue(&neigh->arp_queue)) != NULL) {
1199                         struct dst_entry *dst = skb_dst(skb);
1200                         struct neighbour *n2, *n1 = neigh;
1201                         write_unlock_bh(&neigh->lock);
1202
1203                         rcu_read_lock();
1204
1205                         /* Why not just use 'neigh' as-is?  The problem is
that
1206                          * things such as shaper, eql, and sch_teql can
end up
1207                          * using alternative, different, neigh objects to
output
1208                          * the packet in the output path.  So what we need
to do
1209                          * here is re-lookup the top-level neigh in the
path so
1210                          * we can reinject the packet there.
1211                          */
1212                         n2 = NULL;
1213                         if (dst) {
1214                                 n2 = dst_neigh_lookup_skb(dst, skb);
1215                                 if (n2)
1216                                         n1 = n2;
1217                         }
1218                         n1->output(n1, skb);
1219                         if (n2)
1220                                 neigh_release(n2);
1221                         rcu_read_unlock();
1222
1223                         write_lock_bh(&neigh->lock);
1224                 }
1225                 __skb_queue_purge(&neigh->arp_queue);
1226                 neigh->arp_queue_len_bytes = 0;
1227         }
1228 out:
1229         if (update_isrouter) {
1230                 neigh->flags = (flags & NEIGH_UPDATE_F_ISROUTER) ?
1231                         (neigh->flags | NTF_ROUTER) :
1232                         (neigh->flags & ~NTF_ROUTER);
1233         }
1234         write_unlock_bh(&neigh->lock);
1235
1236         if (notify)
1237                 neigh_update_notify(neigh);
1238
1239         return err;
1240 }
1241 EXPORT_SYMBOL(neigh_update);

--
You are receiving this mail because:
You are the assignee for the bug.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ