lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <b1d8722e-5660-c38e-848f-3220d642889d@huawei.com>
Date:   Sun, 29 Jan 2023 11:08:38 +0800
From:   Zhang Changzhong <zhangchangzhong@...wei.com>
To:     Network Development <netdev@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>
CC:     Julian Anastasov <ja@....bg>,
        "David S. Miller" <davem@...emloft.net>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        David Ahern <dsahern@...nel.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        "Denis V. Lunev" <den@...nvz.org>,
        Nikolay Aleksandrov <razor@...ckwall.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Zhang Changzhong <zhangchangzhong@...wei.com>,
        YueHaibing <yuehaibing@...wei.com>
Subject: [Question] neighbor entry doesn't switch to the STALE state after the
 reachable timer expires

Hi,

We got the following weird neighbor cache entry on a machine that's been running for over a year:
172.16.1.18 dev bond0 lladdr 0a:0e:0f:01:12:01 ref 1 used 350521/15994171/350520 probes 4 REACHABLE

350520 seconds have elapsed since this entry was last updated, but it is still in the REACHABLE
state (base_reachable_time_ms is 30000), preventing lladdr from being updated through probe.

After some analysis, we found a scenario that may cause such a neighbor entry:

          Entry used          	  DELAY_PROBE_TIME expired
NUD_STALE ------------> NUD_DELAY ------------------------> NUD_PROBE
                            |
                            | DELAY_PROBE_TIME not expired
                            v
                      NUD_REACHABLE

The neigh_timer_handler() use time_before_eq() to compare 'now' with 'neigh->confirmed +
NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME)', but time_before_eq() only works if delta < ULONG_MAX/2.

This means that if an entry stays in the NUD_STALE state for more than ULONG_MAX/2 ticks, it enters
the NUD_RACHABLE state directly when it is used again and cannot be switched to the NUD_STALE state
(the timer is set too long).

On 64-bit machines, ULONG_MAX/2 ticks are a extremely long time, but in my case (32-bit machine and
kernel compiled with CONFIG_HZ=250), ULONG_MAX/2 ticks are about 99.42 days, which is possible in
reality.

Does anyone have a good idea to solve this problem? Or are there other scenarios that might cause
such a neighbor entry?

-----
Best Regards,
Changzhong Zhang

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ