Message-ID: <d95e4afc-3a95-a3a7-edd9-1bd13cdec90@ssi.bg>
Date:   Tue, 31 Jan 2023 18:13:50 +0200 (EET)
From:   Julian Anastasov <ja@....bg>
To:     Zhang Changzhong <zhangchangzhong@...wei.com>
cc:     Network Development <netdev@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        David Ahern <dsahern@...nel.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        "Denis V. Lunev" <den@...nvz.org>,
        Nikolay Aleksandrov <razor@...ckwall.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        YueHaibing <yuehaibing@...wei.com>
Subject: Re: [Question] neighbor entry doesn't switch to the STALE state
 after the reachable timer expires


	Hello,

On Tue, 31 Jan 2023, Zhang Changzhong wrote:

> Thanks for your fix!
> 
> I reproduced this problem by directly modifying the confirmed time of
> the neigh entry in STALE state. Your change can fix the problem in this
> scenario.

	I'm glad that you have a way to test it :) Thanks!

> Just curious, why did you choose 'jiffies - MAX_JIFFY_OFFSET + 86400 * HZ'
> as the value of 'mint'?

	It is rather arbitrary :) Probably just 'jiffies - MAX_JIFFY_OFFSET'
is enough, or something depending on HZ/USER_HZ. I added 1 day so that
the timer can advance without leaving the confirmed time behind the
jiffies - MAX_JIFFY_OFFSET zone, but it is not needed.

	The limits in play here:

- the HZ/USER_HZ difference: jiffies_to_clock_t reports the 3 times
(used, confirmed, updated) to user space, so we want to be able to
display values as large as possible. Any HZ > 100 with USER_HZ=100
works with jiffies - MAX_JIFFY_OFFSET. HZ=100 does not work.

- users can set large values for the sysctl vars, which can keep the
timer running for a long time, so the confirmed time can become
outdated before neigh_add_timer() is called to correct it

	If we choose mint = jiffies - MAX_JIFFY_OFFSET,
for 32-bit we will have:

Past                     Future
++++++++++++++++++++++++++++++++++++++++++++++++++++
|  49  days   |  49 days  |         99 days        |
++++++++++++++^+++++++++++^+++++++++++++++++++++++++
              ^           ^
DELAY+PROBE   |           |
            mint         now

- used/confirmed times should be at most 49 days behind jiffies, but
we have another 49 days in which the timer can run without correcting
them, so they can drift up to 99 days into the past before they
appear to be in the future and trigger the problem

- as we avoid the checks in neigh_timer_handler to save CPU cycles,
one needs crazy sysctl settings to keep the timer in the DELAY+PROBE
states for 49 days. With default settings, it is no more than
half a minute. In this case even
mint = jiffies - LONG_MAX + 86400 * HZ should work.

- the REACHABLE state is extended while the confirmed time advances,
otherwise PROBE will need an ARP reply, and the times are rechecked
in neigh_add_timer when entering REACHABLE again
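
A minimal user-space sketch of the wraparound effect discussed above,
modelling a 32-bit jiffies counter with uint32_t. time_after32() and
time_in_range32() follow the idea of the kernel's time_after() and
time_in_range() macros. clamp_stamp() is a hypothetical helper that
only illustrates clamping a timestamp to
mint = jiffies - MAX_JIFFY_OFFSET; it is not the actual patch:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_JIFFY_OFFSET_32 ((uint32_t)((INT32_MAX >> 1) - 1))

/* Wrap-safe "a is after b": the signed difference decides the order. */
static int time_after32(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Wrap-safe check that a lies within [lo, hi]. */
static int time_in_range32(uint32_t a, uint32_t lo, uint32_t hi)
{
	return !time_after32(lo, a) && !time_after32(a, hi);
}

/* Hypothetical: keep stamp if it is within [mint, now], else use mint. */
static uint32_t clamp_stamp(uint32_t stamp, uint32_t now)
{
	uint32_t mint = now - MAX_JIFFY_OFFSET_32;

	assert(MAX_JIFFY_OFFSET_32 < (uint32_t)INT32_MAX);
	return time_in_range32(stamp, mint, now) ? stamp : mint;
}
```

A stamp that is more than 2^31 jiffies old compares as being after
now, which is the "going in the future" case. Clamping it to mint
before that point keeps the comparison valid.
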

Regards

--
Julian Anastasov <ja@....bg>
