netdev - Re: Urgent Bug Report Kernel crash 6.5.2

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date: Thu, 21 Sep 2023 11:13:55 +0300
From: Martin Zaharinov <micron10@...il.com>
To: Bagas Sanjaya <bagasdotme@...il.com>
Cc: Eric Dumazet <edumazet@...gle.com>,
 Paolo Abeni <pabeni@...hat.com>,
 netdev <netdev@...r.kernel.org>,
 patchwork-bot+netdevbpf@...nel.org,
 Jakub Kicinski <kuba@...nel.org>,
 Stephen Hemminger <stephen@...workplumber.org>,
 kuba+netdrv@...nel.org,
 dsahern@...il.com,
 Florian Westphal <fw@...len.de>,
 Pablo Neira Ayuso <pablo@...filter.org>
Subject: Re: Urgent Bug Report Kernel crash 6.5.2

Hi Bagas,


Its not easy to make this on production, have too many users on it.

i make checks and find with kernel 6.3.12-6.5.13 all is fine.
on first machine that i have with kernel 6.4 and still work run kernel 6.4.2 and have problem.

in my investigation problem is start after migration to kernel 6.4.x 

in 6.4 kernel is add rcuref : 

https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.4 

commit bc9d3a9f2afca189a6ae40225b6985e3c775375e
Author: Thomas Gleixner <tglx@...utronix.de>
Date: Thu Mar 23 21:55:32 2023 +0100

net: dst: Switch to rcuref_t reference counting

Under high contention dst_entry::__refcnt becomes a significant bottleneck.

atomic_inc_not_zero() is implemented with a cmpxchg() loop, which goes into
high retry rates on contention.

Switch the reference count to rcuref_t which results in a significant
performance gain. Rename the reference count member to __rcuref to reflect
the change.

The gain depends on the micro-architecture and the number of concurrent
operations and has been measured in the range of +25% to +130% with a
localhost memtier/memcached benchmark which amplifies the problem
massively.

Running the memtier/memcached benchmark over a real (1Gb) network
connection the conversion on top of the false sharing fix for struct
dst_entry::__refcnt results in a total gain in the 2%-5% range over the
upstream baseline.

Reported-by: Wangyang Guo <wangyang.guo@...el.com>
Reported-by: Arjan Van De Ven <arjan.van.de.ven@...el.com>
Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
Link: https://lore.kernel.org/r/20230307125538.989175656@linutronix.de
Link: https://lore.kernel.org/r/20230323102800.215027837@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@...nel.org>


and i think problem is here : 

--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -66,7 +66,7 @@ void dst_init(struct dst_entry *dst, str
dst->tclassid = 0;
#endif
dst->lwtstate = NULL;
- atomic_set(&dst->__refcnt, initial_ref);
+ rcuref_init(&dst->__refcnt, initial_ref);
dst->__use = 0;
dst->lastuse = jiffies;
dst->flags = flags;
@@ -162,31 +162,15 @@ EXPORT_SYMBOL(dst_dev_put);

void dst_release(struct dst_entry *dst)
{
- if (dst) {
- int newrefcnt;
-
- newrefcnt = atomic_dec_return(&dst->__refcnt);
- if (WARN_ONCE(newrefcnt < 0, "dst_release underflow"))
- net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
- __func__, dst, newrefcnt);
- if (!newrefcnt)
- call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu);
- }
+ if (dst && rcuref_put(&dst->__refcnt))
+ call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu);
}
EXPORT_SYMBOL(dst_release);

void dst_release_immediate(struct dst_entry *dst)
{
- if (dst) {
- int newrefcnt;
-
- newrefcnt = atomic_dec_return(&dst->__refcnt);
- if (WARN_ONCE(newrefcnt < 0, "dst_release_immediate underflow"))
- net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
- __func__, dst, newrefcnt);
- if (!newrefcnt)
- dst_destroy(dst);
- }
+ if (dst && rcuref_put(&dst->__refcnt))
+ dst_destroy(dst);
}
EXPORT_SYMBOL(dst_release_immediate);


but this is my thinking


Martin


> On 21 Sep 2023, at 10:50, Bagas Sanjaya <bagasdotme@...il.com> wrote:
> 
> On 20/09/2023 14:32, Martin Zaharinov wrote:
>> I will make this yes .
>> 
>> And will wait if any find fix in future release.
>> 
> 
> Please don't top-post; reply inline with appropriate context instead.
> 
> Martin, what prevents you from doing bisection as Eric requested again?
> If you only have production systems, why can't you afford to have
> testing ones? Why not turning one of your prod machines to be testing
> and bisect from there?
> 
> Sorry for inconvenience.
> 
> -- 
> An old man doll... just what I always wanted! - Clara
>