netdev - Re: [PATCH net] netlink: fix potential deadlock in netlink_set

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANn89iLio4+ik5qAQwd7SKr8ihS+y7fvkYYc=ZuuGqJ4BVfgdQ@mail.gmail.com>
Date: Thu, 22 Jun 2023 12:10:19 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Jiri Pirko <jiri@...nulli.us>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, 
	Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org, eric.dumazet@...il.com, 
	syzbot+a7d200a347f912723e5c@...kaller.appspotmail.com, 
	Johannes Berg <johannes.berg@...el.com>
Subject: Re: [PATCH net] netlink: fix potential deadlock in netlink_set_err()

On Thu, Jun 22, 2023 at 11:27 AM Jiri Pirko <jiri@...nulli.us> wrote:
>
> Thu, Jun 22, 2023 at 10:42:34AM CEST, edumazet@...gle.com wrote:
> >On Thu, Jun 22, 2023 at 10:29 AM Jiri Pirko <jiri@...nulli.us> wrote:
> >>
> >> Thu, Jun 22, 2023 at 10:14:56AM CEST, edumazet@...gle.com wrote:
> >> >On Thu, Jun 22, 2023 at 10:04 AM Jiri Pirko <jiri@...nulli.us> wrote:
> >> >>
> >> >> Wed, Jun 21, 2023 at 05:43:37PM CEST, edumazet@...gle.com wrote:
> >> >> >syzbot reported a possible deadlock in netlink_set_err() [1]
> >> >> >
> >> >> >A similar issue was fixed in commit 1d482e666b8e ("netlink: disable IRQs
> >> >> >for netlink_lock_table()") in netlink_lock_table()
> >> >> >
> >> >> >This patch adds IRQ safety to netlink_set_err() and __netlink_diag_dump()
> >> >> >which were not covered by cited commit.
> >> >> >
> >> >> >[1]
> >> >> >
> >> >> >WARNING: possible irq lock inversion dependency detected
> >> >> >6.4.0-rc6-syzkaller-00240-g4e9f0ec38852 #0 Not tainted
> >> >> >
> >> >> >syz-executor.2/23011 just changed the state of lock:
> >> >> >ffffffff8e1a7a58 (nl_table_lock){.+.?}-{2:2}, at: netlink_set_err+0x2e/0x3a0 net/netlink/af_netlink.c:1612
> >> >> >but this lock was taken by another, SOFTIRQ-safe lock in the past:
> >> >> > (&local->queue_stop_reason_lock){..-.}-{2:2}
> >> >> >
> >> >> >and interrupts could create inverse lock ordering between them.
> >> >> >
> >> >> >other info that might help us debug this:
> >> >> > Possible interrupt unsafe locking scenario:
> >> >> >
> >> >> >       CPU0                    CPU1
> >> >> >       ----                    ----
> >> >> >  lock(nl_table_lock);
> >> >> >                               local_irq_disable();
> >> >> >                               lock(&local->queue_stop_reason_lock);
> >> >> >                               lock(nl_table_lock);
> >> >> >  <Interrupt>
> >> >> >    lock(&local->queue_stop_reason_lock);
> >> >> >
> >> >> > *** DEADLOCK ***
> >> >> >
> >> >> >Fixes: 1d482e666b8e ("netlink: disable IRQs for netlink_lock_table()")
> >> >>
> >> >> I don't think that this "fixes" tag is correct. The referenced commit
> >> >> is a fix to the same issue on a different codepath, not the one who
> >> >> actually introduced the issue.
> >> >>
> >> >> The code itself looks fine to me.
> >> >
> >> >Note that the 1d482e666b8e had no Fixes: tag, otherwise I would have taken it.
> >>
> >> I'm aware it didn't. But that does not implicate this patch should have
> >> that commit as a "Fixes:" tag. Either have the correct one pointing out
> >> which commit introduced the issue or omit the "Fixes:" tag entirely.
> >> That's my point.
> >
> >My point is that the cited commit should have fixed all points
> >where the nl_table_lock was read locked.
>
> Yeah, it was incomplete. I agree. I don't argue with that.
>
> >
> >When we do locking changes, we have to look at the whole picture,
> >not the precise point where lockdep complained.
> >
> >For instance, this is the reason this patch also changes  __netlink_diag_dump(),
> >even if the report had nothing to do about it yet.
> >
> >So this Fixes: tag is fine, thank you.
>
> Then we have to agree to disagree I guess. It is not fine.
>
> Quoting from Documentation/process/handling-regressions.rst:
>   Add a "Fixes:" tag to specify the commit causing the regression.
>
> Quoting from Documentation/process/submitting-patches.rst:
>   A Fixes: tag indicates that the patch fixes an issue in a previous commit. It
>   is used to make it easy to determine where a bug originated, which can help
>   review a bug fix.

The previous commit definitely had an issue, because it was not complete.

We have many other cases of commits that complete the work started earlier.

I will continue doing so, and I will continue writing changelogs that
displease you.

>
> 1d482e666b8e is not causing any regression (to be known of), definitelly
> not the one this patch is fixing.

>
> Misusing "Fixes" tag like this only adds unnecessary confusion.
> That is my point from the beginning. I don't understand your resistance
> to be honest.

To be honest, I do not understand you either, I suggest we move on,
we obviously are not on the same page.