netdev - Re: Race Condition Observed in ARP Processing.

Message-ID: <CAM_iQpUFCnmu36L0hwrK+xv9gBWKtcq44nOVGNEyR=o9QDx7Pg@mail.gmail.com>
Date:   Thu, 31 Dec 2020 10:53:59 -0800
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     Chinmay Agarwal <chinagar@...eaurora.org>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>,
        sharathv@...eaurora.org
Subject: Re: Race Condition Observed in ARP Processing.

On Tue, Dec 29, 2020 at 8:06 AM Chinmay Agarwal <chinagar@...eaurora.org> wrote:
>
> Hi All,
>
> We found a crash while performing some automated stress tests on a 5.4 kernel based device.
>
> We found that a freed neighbour entry was still part of the gc_list, and this was leading to the crash.
> After adding some debug output and checking the neigh_put/neigh_hold/neigh_destroy call stacks, it looks like a race condition is possible in the code.
[...]
> The crash may have been due to out-of-order ARP replies.
> As the neighbour is marked dead, should we go ahead with updating our ARP table?

I think you are probably right: we should just unlock and return
in __neigh_update() when hitting the if (neigh->dead) branch. Something
like the following:

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 9500d28a43b0..0ce592f585c8 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1250,6 +1250,7 @@ static int __neigh_update(struct neighbour *neigh, const u8 *lladdr,
                goto out;
        if (neigh->dead) {
                NL_SET_ERR_MSG(extack, "Neighbor entry is now dead");
+               new = old;
                goto out;
        }

But given that the old state probably contains NUD_PERMANENT, I guess
you hit the following branch instead:

        if (!(flags & NEIGH_UPDATE_F_ADMIN) &&
            (old & (NUD_NOARP | NUD_PERMANENT)))
                goto out;

So we may have to check ->dead before this. Please double-check.
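The ordering problem can be seen in a small userspace model that only tracks which value of `new` reaches the exit path. `model_effective_new()`, `model_touches_gc_list()`, the `M_*` constants, and the assumption that the exit path tests `(new ^ old) & NUD_PERMANENT` are all hypothetical simplifications, not code from net/core/neighbour.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model values, not the kernel's definitions. */
#define M_NUD_REACHABLE 0x02u
#define M_NUD_NOARP     0x40u
#define M_NUD_PERMANENT 0x80u
#define M_F_ADMIN       0x01u

/*
 * Returns the value of "new" that would reach the exit path of a
 * simplified update routine. dead_check_first selects the reordered
 * variant; otherwise the dead check (with the proposed "new = old")
 * sits after the NOARP/PERMANENT early exit, as in the diff.
 */
static unsigned int model_effective_new(unsigned int old, unsigned int new,
					unsigned int flags, bool dead,
					bool dead_check_first)
{
	assert(!(flags & ~M_F_ADMIN));	/* only the admin flag is modelled */

	if (dead_check_first && dead)
		return old;	/* dead entry: leave the state untouched */

	/* Non-admin updates to NOARP/PERMANENT entries exit early,
	 * still carrying the requested state in "new". */
	if (!(flags & M_F_ADMIN) && (old & (M_NUD_NOARP | M_NUD_PERMANENT)))
		return new;

	if (!dead_check_first && dead)
		return old;	/* patched check in its original position */

	return new;		/* remainder of the update elided */
}

/* Assumed exit-path test: would the gc list be touched? */
static bool model_touches_gc_list(unsigned int old, unsigned int new)
{
	return ((new ^ old) & M_NUD_PERMANENT) != 0;
}
```

With a dead NUD_PERMANENT entry and a non-admin update, the original ordering lets the requested state through, so the modelled gc-list check fires; checking `dead` first returns `old` and the check stays quiet.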

This bug was probably introduced by commit 9c29a2f55ec05cc8b525ee.
Can you make a patch and send it out formally after testing?

Thanks!