linux-kernel - Re: Linux 5.12-rc7

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANn89i+wQoaiFEe1Qi1k96d-ACLmAtJJQ36bs5Z5knYO1v+rOg@mail.gmail.com>
Date:   Mon, 12 Apr 2021 19:38:36 +0200
From:   Eric Dumazet <edumazet@...gle.com>
To:     Guenter Roeck <linux@...ck-us.net>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Netdev <netdev@...r.kernel.org>
Subject: Re: Linux 5.12-rc7

On Mon, Apr 12, 2021 at 7:31 PM Guenter Roeck <linux@...ck-us.net> wrote:
>
> On 4/12/21 9:31 AM, Eric Dumazet wrote:
> > On Mon, Apr 12, 2021 at 6:28 PM Linus Torvalds
> > <torvalds@...ux-foundation.org> wrote:
> >>
> >> On Sun, Apr 11, 2021 at 10:14 PM Guenter Roeck <linux@...ck-us.net> wrote:
> >>>
> >>> Qemu test results:
> >>>         total: 460 pass: 459 fail: 1
> >>> Failed tests:
> >>>         sh:rts7751r2dplus_defconfig:ata:net,virtio-net:rootfs
> >>>
> >>> The failure bisects to commit 0f6925b3e8da ("virtio_net: Do not pull payload in
> >>> skb->head"). It is a spurious problem - the test passes roughly every other
> >>> time. When the failure is seen, udhcpc fails to get an IP address and aborts
> >>> with SIGTERM. So far I have only seen this with the "sh" architecture.
> >>
> >> Hmm. Let's add in some more of the people involved in that commit, and
> >> also netdev.
> >>
> >> Nothing in there looks like it should have any interaction with
> >> architecture, so that "it happens on sh" sounds odd, but maybe it's
> >> some particular interaction with the qemu environment.
> >
> > Yes, maybe.
> >
> > I spent few hours on this, and suspect a buggy memcpy() implementation
> > on SH, but this was not conclusive.
> >
>
> I replaced all memcpy() calls in skbuff.h with calls to
>
> static inline void __my_memcpy(unsigned char *to, const unsigned char *from,
>                                unsigned int len)
> {
>        while (len--)
>                *to++ = *from++;
> }
>
> That made no difference, so unless you have some other memcpy() in mind that
> seems to be unlikely.


Sure, note I also had :

diff --git a/net/core/dev.c b/net/core/dev.c
index af8c1ea040b9364b076e2d72f04dc3de2d7e2f11..4e05a32542dd606aaaaee8038017fea949939c0e
100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5938,7 +5938,7 @@ static void gro_pull_from_frag0(struct sk_buff
*skb, int grow)

        BUG_ON(skb->end - skb->tail < grow);

-       memcpy(skb_tail_pointer(skb), NAPI_GRO_CB(skb)->frag0, grow);
+       memmove(skb_tail_pointer(skb), NAPI_GRO_CB(skb)->frag0, grow);

        skb->data_len -= grow;
        skb->tail += grow;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c421c8f809256f7b13a8b5a1331108449353ee2d..41796dedfa9034f2333cf249a0d81b7250e14d1f
100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2278,7 +2278,7 @@ int skb_copy_bits(const struct sk_buff *skb, int
offset, void *to, int len)
                                              skb_frag_off(f) + offset - start,
                                              copy, p, p_off, p_len, copied) {
                                vaddr = kmap_atomic(p);
-                               memcpy(to + copied, vaddr + p_off, p_len);
+                               memmove(to + copied, vaddr + p_off, p_len);
                                kunmap_atomic(vaddr);
                        }


>
> > By pulling one extra byte, the problem goes away.
> >
> > Strange thing is that the udhcpc process does not go past sendto().
> >
>
> I have been trying to debug that one. Unfortunately gdb doesn't work with sh,
> so I can't use it to debug the problem. I'll spend some more time on this today.

Yes, I think this is the real issue here. This smells like some memory
corruption.

In my traces, packet is correctly received in AF_PACKET queue.

I have checked the skb is well formed.

But the user space seems to never call poll() and recvmsg() on this
af_packet socket.