Message-ID: <CANn89iJuEVe72bPmEftyEJHLzzN=QNR2yueFjTxYXCEpS5S8HQ@mail.gmail.com>
Date: Thu, 20 Apr 2023 13:16:54 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Jesper Dangaard Brouer <brouer@...hat.com>
Cc: netdev@...r.kernel.org, pabeni@...hat.com, kuba@...nel.org,
hawk@...nel.org, davem@...emloft.net, lorenzo@...nel.org
Subject: Re: [PATCH net-next V1] net: flush sd->defer_list on unregister_netdevice
On Thu, Apr 20, 2023 at 12:06 PM Jesper Dangaard Brouer
<brouer@...hat.com> wrote:
>
> When removing a net_device (that use NAPI), the sd->defer_list
Why "(that use NAPI)" is relevant ?
> system can still hold on to SKBs that have a dst_entry which can
> have a netdev_hold reference.
This would be quite a bug, really. What makes you think this?
In order to validate this, please post a stack trace if you get a
warning from this debug patch.
(Make sure to build with CONFIG_DEBUG_NET=y.)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 768f9d04911fb16a40e057aec7bfa2381a40d7a7..56c79bf922ab4045984513de13cc946b84fd3675 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -6828,6 +6828,7 @@ nodefer: __kfree_skb(skb);
 		return;
 	}
 
+	DEBUG_NET_WARN_ON_ONCE(skb_dst(skb));
 	sd = &per_cpu(softnet_data, cpu);
 	defer_max = READ_ONCE(sysctl_skb_defer_max);
 	if (READ_ONCE(sd->defer_count) >= defer_max)
>
> Choose simple solution of flushing the softnet_data defer_list
> system as part of unregister_netdevice flush_all_backlogs().
I do not know if adding two conditional tests in the fast path is
worth the pain.
I was thinking of simplifying the rules for defer_lock acquisition anyway [1].
If you plan a V2, I would advise blocking BH before calling
skb_defer_free_flush(); see the sketch after [1] below.
[1]
diff --git a/net/core/dev.c b/net/core/dev.c
index 3fc4dba71f9dd250c59c0a070566791f0cd27ec4..be68b8f7fe16658f3304696bf3c83df4c8af51c7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6631,11 +6631,11 @@ static void skb_defer_free_flush(struct softnet_data *sd)
 	if (!READ_ONCE(sd->defer_list))
 		return;
 
-	spin_lock_irq(&sd->defer_lock);
+	spin_lock(&sd->defer_lock);
 	skb = sd->defer_list;
 	sd->defer_list = NULL;
 	sd->defer_count = 0;
-	spin_unlock_irq(&sd->defer_lock);
+	spin_unlock(&sd->defer_lock);
 
 	while (skb != NULL) {
 		next = skb->next;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 768f9d04911fb16a40e057aec7bfa2381a40d7a7..27f6e0042dfe0cbbb362d06b540b3ca1567459af 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -6817,7 +6817,6 @@ void skb_attempt_defer_free(struct sk_buff *skb)
 {
 	int cpu = skb->alloc_cpu;
 	struct softnet_data *sd;
-	unsigned long flags;
 	unsigned int defer_max;
 	bool kick;
 
@@ -6833,7 +6832,10 @@ nodefer: __kfree_skb(skb);
 	if (READ_ONCE(sd->defer_count) >= defer_max)
 		goto nodefer;
 
-	spin_lock_irqsave(&sd->defer_lock, flags);
+	/* Strictly speaking, we do not need to block BH to acquire
+	 * this spinlock, but lockdep disagrees (unless we add classes)
+	 */
+	spin_lock_bh(&sd->defer_lock);
 	/* Send an IPI every time queue reaches half capacity. */
 	kick = sd->defer_count == (defer_max >> 1);
 	/* Paired with the READ_ONCE() few lines above */
@@ -6842,7 +6844,7 @@ nodefer: __kfree_skb(skb);
 	skb->next = sd->defer_list;
 	/* Paired with READ_ONCE() in skb_defer_free_flush() */
 	WRITE_ONCE(sd->defer_list, skb);
-	spin_unlock_irqrestore(&sd->defer_lock, flags);
+	spin_unlock_bh(&sd->defer_lock);
 
 	/* Make sure to trigger NET_RX_SOFTIRQ on the remote CPU
 	 * if we are unlucky enough (this seems very unlikely).
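
To make the BH advice concrete, a V2 could wrap the flush roughly like
this (an untested sketch on top of [1]; the helper name and its exact
placement in flush_all_backlogs() are only placeholders):

/* Hypothetical sketch only: flush one CPU's defer_list from process
 * context (e.g. from flush_all_backlogs()). Blocking BH keeps the
 * plain spin_lock() in skb_defer_free_flush() consistent with the
 * softirq users of sd->defer_lock, and gives napi_consume_skb() a
 * BH-safe context to free the skbs in.
 */
static void skb_defer_free_flush_cpu(int cpu)
{
	struct softnet_data *sd = &per_cpu(softnet_data, cpu);

	local_bh_disable();
	skb_defer_free_flush(sd);
	local_bh_enable();
}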