netdev - Re: [PATCH net] netlink: fix potential deadlock in netlink_set

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZJQT4/SZJN7qGUHI@nanopsycho>
Date: Thu, 22 Jun 2023 11:26:59 +0200
From: Jiri Pirko <jiri@...nulli.us>
To: Eric Dumazet <edumazet@...gle.com>
Cc: "David S . Miller" <davem@...emloft.net>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	netdev@...r.kernel.org, eric.dumazet@...il.com,
	syzbot+a7d200a347f912723e5c@...kaller.appspotmail.com,
	Johannes Berg <johannes.berg@...el.com>
Subject: Re: [PATCH net] netlink: fix potential deadlock in netlink_set_err()

Thu, Jun 22, 2023 at 10:42:34AM CEST, edumazet@...gle.com wrote:
>On Thu, Jun 22, 2023 at 10:29 AM Jiri Pirko <jiri@...nulli.us> wrote:
>>
>> Thu, Jun 22, 2023 at 10:14:56AM CEST, edumazet@...gle.com wrote:
>> >On Thu, Jun 22, 2023 at 10:04 AM Jiri Pirko <jiri@...nulli.us> wrote:
>> >>
>> >> Wed, Jun 21, 2023 at 05:43:37PM CEST, edumazet@...gle.com wrote:
>> >> >syzbot reported a possible deadlock in netlink_set_err() [1]
>> >> >
>> >> >A similar issue was fixed in commit 1d482e666b8e ("netlink: disable IRQs
>> >> >for netlink_lock_table()") in netlink_lock_table()
>> >> >
>> >> >This patch adds IRQ safety to netlink_set_err() and __netlink_diag_dump()
>> >> >which were not covered by cited commit.
>> >> >
>> >> >[1]
>> >> >
>> >> >WARNING: possible irq lock inversion dependency detected
>> >> >6.4.0-rc6-syzkaller-00240-g4e9f0ec38852 #0 Not tainted
>> >> >
>> >> >syz-executor.2/23011 just changed the state of lock:
>> >> >ffffffff8e1a7a58 (nl_table_lock){.+.?}-{2:2}, at: netlink_set_err+0x2e/0x3a0 net/netlink/af_netlink.c:1612
>> >> >but this lock was taken by another, SOFTIRQ-safe lock in the past:
>> >> > (&local->queue_stop_reason_lock){..-.}-{2:2}
>> >> >
>> >> >and interrupts could create inverse lock ordering between them.
>> >> >
>> >> >other info that might help us debug this:
>> >> > Possible interrupt unsafe locking scenario:
>> >> >
>> >> >       CPU0                    CPU1
>> >> >       ----                    ----
>> >> >  lock(nl_table_lock);
>> >> >                               local_irq_disable();
>> >> >                               lock(&local->queue_stop_reason_lock);
>> >> >                               lock(nl_table_lock);
>> >> >  <Interrupt>
>> >> >    lock(&local->queue_stop_reason_lock);
>> >> >
>> >> > *** DEADLOCK ***
>> >> >
>> >> >Fixes: 1d482e666b8e ("netlink: disable IRQs for netlink_lock_table()")
>> >>
>> >> I don't think that this "fixes" tag is correct. The referenced commit
>> >> is a fix to the same issue on a different codepath, not the one who
>> >> actually introduced the issue.
>> >>
>> >> The code itself looks fine to me.
>> >
>> >Note that the 1d482e666b8e had no Fixes: tag, otherwise I would have taken it.
>>
>> I'm aware it didn't. But that does not implicate this patch should have
>> that commit as a "Fixes:" tag. Either have the correct one pointing out
>> which commit introduced the issue or omit the "Fixes:" tag entirely.
>> That's my point.
>
>My point is that the cited commit should have fixed all points
>where the nl_table_lock was read locked.

Yeah, it was incomplete. I agree. I don't argue with that.

>
>When we do locking changes, we have to look at the whole picture,
>not the precise point where lockdep complained.
>
>For instance, this is the reason this patch also changes  __netlink_diag_dump(),
>even if the report had nothing to do about it yet.
>
>So this Fixes: tag is fine, thank you.

Then we have to agree to disagree I guess. It is not fine.

Quoting from Documentation/process/handling-regressions.rst:
  Add a "Fixes:" tag to specify the commit causing the regression.

Quoting from Documentation/process/submitting-patches.rst:
  A Fixes: tag indicates that the patch fixes an issue in a previous commit. It
  is used to make it easy to determine where a bug originated, which can help
  review a bug fix.

1d482e666b8e is not causing any regression (to be known of), definitelly
not the one this patch is fixing.

Misusing "Fixes" tag like this only adds unnecessary confusion.
That is my point from the beginning. I don't understand your resistance
to be honest.