netdev - Re: [PATCH net] netlink: fix potential deadlock in netlink_set

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZJQwFWwT1ahFypTq@nanopsycho>
Date: Thu, 22 Jun 2023 13:27:17 +0200
From: Jiri Pirko <jiri@...nulli.us>
To: Eric Dumazet <edumazet@...gle.com>
Cc: "David S . Miller" <davem@...emloft.net>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	netdev@...r.kernel.org, eric.dumazet@...il.com,
	syzbot+a7d200a347f912723e5c@...kaller.appspotmail.com,
	Johannes Berg <johannes.berg@...el.com>
Subject: Re: [PATCH net] netlink: fix potential deadlock in netlink_set_err()

Thu, Jun 22, 2023 at 12:10:19PM CEST, edumazet@...gle.com wrote:
>On Thu, Jun 22, 2023 at 11:27 AM Jiri Pirko <jiri@...nulli.us> wrote:
>>
>> Thu, Jun 22, 2023 at 10:42:34AM CEST, edumazet@...gle.com wrote:
>> >On Thu, Jun 22, 2023 at 10:29 AM Jiri Pirko <jiri@...nulli.us> wrote:
>> >>
>> >> Thu, Jun 22, 2023 at 10:14:56AM CEST, edumazet@...gle.com wrote:
>> >> >On Thu, Jun 22, 2023 at 10:04 AM Jiri Pirko <jiri@...nulli.us> wrote:
>> >> >>
>> >> >> Wed, Jun 21, 2023 at 05:43:37PM CEST, edumazet@...gle.com wrote:
>> >> >> >syzbot reported a possible deadlock in netlink_set_err() [1]
>> >> >> >
>> >> >> >A similar issue was fixed in commit 1d482e666b8e ("netlink: disable IRQs
>> >> >> >for netlink_lock_table()") in netlink_lock_table()
>> >> >> >
>> >> >> >This patch adds IRQ safety to netlink_set_err() and __netlink_diag_dump()
>> >> >> >which were not covered by cited commit.
>> >> >> >
>> >> >> >[1]
>> >> >> >
>> >> >> >WARNING: possible irq lock inversion dependency detected
>> >> >> >6.4.0-rc6-syzkaller-00240-g4e9f0ec38852 #0 Not tainted
>> >> >> >
>> >> >> >syz-executor.2/23011 just changed the state of lock:
>> >> >> >ffffffff8e1a7a58 (nl_table_lock){.+.?}-{2:2}, at: netlink_set_err+0x2e/0x3a0 net/netlink/af_netlink.c:1612
>> >> >> >but this lock was taken by another, SOFTIRQ-safe lock in the past:
>> >> >> > (&local->queue_stop_reason_lock){..-.}-{2:2}
>> >> >> >
>> >> >> >and interrupts could create inverse lock ordering between them.
>> >> >> >
>> >> >> >other info that might help us debug this:
>> >> >> > Possible interrupt unsafe locking scenario:
>> >> >> >
>> >> >> >       CPU0                    CPU1
>> >> >> >       ----                    ----
>> >> >> >  lock(nl_table_lock);
>> >> >> >                               local_irq_disable();
>> >> >> >                               lock(&local->queue_stop_reason_lock);
>> >> >> >                               lock(nl_table_lock);
>> >> >> >  <Interrupt>
>> >> >> >    lock(&local->queue_stop_reason_lock);
>> >> >> >
>> >> >> > *** DEADLOCK ***
>> >> >> >
>> >> >> >Fixes: 1d482e666b8e ("netlink: disable IRQs for netlink_lock_table()")
>> >> >>
>> >> >> I don't think that this "fixes" tag is correct. The referenced commit
>> >> >> is a fix to the same issue on a different codepath, not the one who
>> >> >> actually introduced the issue.
>> >> >>
>> >> >> The code itself looks fine to me.
>> >> >
>> >> >Note that the 1d482e666b8e had no Fixes: tag, otherwise I would have taken it.
>> >>
>> >> I'm aware it didn't. But that does not implicate this patch should have
>> >> that commit as a "Fixes:" tag. Either have the correct one pointing out
>> >> which commit introduced the issue or omit the "Fixes:" tag entirely.
>> >> That's my point.
>> >
>> >My point is that the cited commit should have fixed all points
>> >where the nl_table_lock was read locked.
>>
>> Yeah, it was incomplete. I agree. I don't argue with that.
>>
>> >
>> >When we do locking changes, we have to look at the whole picture,
>> >not the precise point where lockdep complained.
>> >
>> >For instance, this is the reason this patch also changes  __netlink_diag_dump(),
>> >even if the report had nothing to do about it yet.
>> >
>> >So this Fixes: tag is fine, thank you.
>>
>> Then we have to agree to disagree I guess. It is not fine.
>>
>> Quoting from Documentation/process/handling-regressions.rst:
>>   Add a "Fixes:" tag to specify the commit causing the regression.
>>
>> Quoting from Documentation/process/submitting-patches.rst:
>>   A Fixes: tag indicates that the patch fixes an issue in a previous commit. It
>>   is used to make it easy to determine where a bug originated, which can help
>>   review a bug fix.
>
>The previous commit definitely had an issue, because it was not complete.

Is it the originator of the bug? No.


>
>We have many other cases of commits that complete the work started earlier.

The fact something was done wrongly in the past should not serve
as an argument to continue.


>
>I will continue doing so, and I will continue writing changelogs that
>displease you.

Awesome. Basically you say you don't care about the rules that others
are required to stick to. Making your statement personal is a bit
offending I have to say.


>
>>
>> 1d482e666b8e is not causing any regression (to be known of), definitelly
>> not the one this patch is fixing.
>
>>
>> Misusing "Fixes" tag like this only adds unnecessary confusion.
>> That is my point from the beginning. I don't understand your resistance
>> to be honest.
>
>To be honest, I do not understand you either, I suggest we move on,
>we obviously are not on the same page.

Yeah, well rules are here for very good reasons and should apply
to anyone. It's a matter of principles. Who else than one of the
maintainers should obey them and require people to obey them?

Just my 2 cents. I'm done with this thread.