lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <cfe53cb984dfd42f8b835d5e05b0393ba0352a7c.camel@sipsolutions.net>
Date:   Wed, 21 Jun 2023 22:27:36 +0200
From:   Johannes Berg <johannes@...solutions.net>
To:     Jakub Kicinski <kuba@...nel.org>,
        syzbot <syzbot+a7d200a347f912723e5c@...kaller.appspotmail.com>
Cc:     bpf@...r.kernel.org, davem@...emloft.net, edumazet@...gle.com,
        linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
        pabeni@...hat.com, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [net?] possible deadlock in netlink_set_err

On Wed, 2023-06-21 at 12:42 -0700, Jakub Kicinski wrote:
> Hi Johannes,
> 
> Doesn't seem like netlink_set_err() wants to be called from just any
> context. Should we convert nl_table_lock to alwasy be _bh ?

So as I was writing this Eric responded too :) It does seem _bh wouldn't
be sufficient then, the lockdep report below also mentions IRQ
disabling.


I'm not entirely sure this is needed, and I'm also not sure we really
need to fix it immediately, it's a very old bug and one that's going to
be very difficult to actually hit a deadlock on in practice. The "CPU1"
part of the report is basically almost never happening.

This is why syzbot couldn't reproduce it, this code will always execute:

        spin_lock_irqsave(&local->queue_stop_reason_lock, flags);
        for (i = 0; i < IEEE80211_MAX_QUEUES; i++) {
                skb_queue_walk_safe(&local->pending[i], skb, tmp) {
                        struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
                        if (info->control.vif == &sdata->vif) {
                                __skb_unlink(skb, &local->pending[i]);
                                ieee80211_free_txskb(&local->hw, skb);
                        }
                }
        }
        spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags);


However, pretty much *all* of the time there will be no SKBs on this
pending list that have a report to send out to userspace in
ieee80211_free_txskb -> ieee80211_report_used_skb:

...
        } else if (info->ack_frame_id) {
                ieee80211_report_ack_skb(local, skb, acked, dropped,
                                         ack_hwtstamp);


and really, ack_frame_id is rarely set, and in addition it's pretty
unlikely that a frame would still be on the queue when the interface is
set down. That's also not a very common operation (it's not like you try
to do that many times per second), so ...


Another approach might be to just disentangle that loop I pasted above
out from the queue_stop_reason_lock, we can move the SKBs to a separate
list before freeing them.


That said, it does stand to reason that if nlmsg_multicast() or
netlink_broadcast() can be called with a gfp_t argument then it should
be possible to call it with IRQs disabled or under a lock such as this
example, so perhaps Eric's patch really is the right thing to do here,
to avoid this potential pitfall in the future.

johannes

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ