[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210119134023.082577ca@gollum>
Date: Tue, 19 Jan 2021 13:40:23 +0100
From: Juerg Haefliger <juerg.haefliger@...onical.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Heiner Kallweit <hkallweit1@...il.com>,
Eric Dumazet <eric.dumazet@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
UNGLinuxDriver@...rochip.com,
Woojung Huh <woojung.huh@...rochip.com>
Subject: Re: [Bug 209423] WARN_ON_ONCE() at rtl8169_tso_csum_v2()
On Thu, 8 Oct 2020 20:50:28 +0200
Eric Dumazet <edumazet@...gle.com> wrote:
> On Thu, Oct 8, 2020 at 8:42 PM Heiner Kallweit <hkallweit1@...il.com> wrote:
> >
> > On 08.10.2020 19:15, Eric Dumazet wrote:
> > > On Thu, Oct 8, 2020 at 6:37 PM Heiner Kallweit <hkallweit1@...il.com> wrote:
> > >>
> > >> On 02.10.2020 13:48, Eric Dumazet wrote:
> > >>> On Fri, Oct 2, 2020 at 1:09 PM Heiner Kallweit <hkallweit1@...il.com> wrote:
> > >>>>
> > >>>> On 02.10.2020 10:46, Eric Dumazet wrote:
> > >>>>> On Fri, Oct 2, 2020 at 10:32 AM Eric Dumazet <eric.dumazet@...il.com> wrote:
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On 10/2/20 10:26 AM, Eric Dumazet wrote:
> > >>>>>>> On Thu, Oct 1, 2020 at 10:34 PM Heiner Kallweit <hkallweit1@...il.com> wrote:
> > >>>>>>>>
> > >>>>>>>> I have a problem with the following code in ndo_start_xmit() of
> > >>>>>>>> the r8169 driver. A user reported the WARN being triggered due
> > >>>>>>>> to gso_size > 0 and gso_type = 0. The chip supports TSO(6).
> > >>>>>>>> The driver is widely used, therefore I'd expect much more such
> > >>>>>>>> reports if it should be a common problem. Not sure what's special.
> > >>>>>>>> My primary question: Is it a valid use case that gso_size is
> > >>>>>>>> greater than 0, and no SKB_GSO_ flag is set?
> > >>>>>>>> Any hint would be appreciated.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> Maybe this is not a TCP packet ? But in this case GSO should have taken place.
> > >>>>>>>
> > >>>>>>> You might add a
> > >>>>>>> pr_err_once("gso_type=%x\n", shinfo->gso_type);
> > >>>>>>>
> > >>>>>
> > >>>>>>
> > >>>>>> Ah, sorry I see you already printed gso_type
> > >>>>>>
> > >>>>>> Must then be a bug somewhere :/
> > >>>>>
> > >>>>>
> > >>>>> napi_reuse_skb() does :
> > >>>>>
> > >>>>> skb_shinfo(skb)->gso_type = 0;
> > >>>>>
> > >>>>> It does _not_ clear gso_size.
> > >>>>>
> > >>>>> I wonder if in some cases we could reuse an skb while gso_size is not zero.
> > >>>>>
> > >>>>> Normally, we set it only from dev_gro_receive() when the skb is queued
> > >>>>> into GRO engine (status being GRO_HELD)
> > >>>>>
> > >>>> Thanks Eric. I'm no expert that deep in the network stack and just wonder
> > >>>> why napi_reuse_skb() re-initializes less fields in shinfo than __alloc_skb().
> > >>>> The latter one does a
> > >>>> memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
> > >>>>
> > >>>
> > >>> memset() over the whole thing is more expensive.
> > >>>
> > >>> Here we know the prior state of some fields, while __alloc_skb() just
> > >>> got a piece of memory with random content.
> > >>>
> > >>>> What I can do is letting the affected user test the following.
> > >>>>
> > >>>> diff --git a/net/core/dev.c b/net/core/dev.c
> > >>>> index 62b06523b..8e75399cc 100644
> > >>>> --- a/net/core/dev.c
> > >>>> +++ b/net/core/dev.c
> > >>>> @@ -6088,6 +6088,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
> > >>>>
> > >>>> skb->encapsulation = 0;
> > >>>> skb_shinfo(skb)->gso_type = 0;
> > >>>> + skb_shinfo(skb)->gso_size = 0;
> > >>>> skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
> > >>>> skb_ext_reset(skb);
> > >>>>
> > >>>
> > >>> As I hinted, this should not be needed.
> > >>>
> > >>> For debugging purposes, I would rather do :
> > >>>
> > >>> BUG_ON(skb_shinfo(skb)->gso_size);
> > >>>
> > >>
> > >> We did the following for debugging:
> > >>
> > >> diff --git a/net/core/dev.c b/net/core/dev.c
> > >> index 62b06523b..4c943b774 100644
> > >> --- a/net/core/dev.c
> > >> +++ b/net/core/dev.c
> > >> @@ -3491,6 +3491,9 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
> > >> {
> > >> u16 gso_segs = skb_shinfo(skb)->gso_segs;
> > >>
> > >> + if (!skb_shinfo(skb)->gso_type)
> > >> + skb_warn_bad_offload(skb);
> > >
> > > You also want to get a stack trace here, to give us the call graph.
> > >
> >
> > Here it comes, full story is in https://bugzilla.kernel.org/show_bug.cgi?id=209423
> >
> >
> > [236222.967498] ------------[ cut here ]------------
> > [236222.967508] r8169: caps=(0x00000100000041b2, 0x0000000000000000)
> > [236222.967668] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3184 skb_warn_bad_offload+0x72/0xe0
> > [236222.967691] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag tun nft_nat nft_masq nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd ledtrig_audio kvm_amd snd_hda_codec_hdmi ccp snd_hda_intel snd_intel_dspcfg kvm snd_hda_codec snd_hda_core snd_hwdep irqbypass snd_pcm snd_timer snd hp_wmi sp5100_tco sparse_keymap wmi_bmof fam15h_power k10temp i2c_piix4 soundcore rfkill_gpio rfkill acpi_cpufreq ip_tables xfs amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm
> > [236222.967776] ghash_clmulni_intel ax88179_178a serio_raw usbnet mii r8169 wmi video
> > [236222.967858] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.12-203.fc32.x86_64 #1
> > [236222.967870] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.12 02/04/2020
> > [236222.967895] RIP: 0010:skb_warn_bad_offload+0x72/0xe0
> > [236222.967908] Code: 8d 95 c8 00 00 00 48 8d 88 e8 01 00 00 48 85 c0 48 c7 c0 d8 d7 15 a4 48 0f 44 c8 4c 89 e6 48 c7 c7 90 7b 47 a4 e8 04 85 72 ff <0f> 0b 5b 5d 41 5c c3 80 7d 00 00 49 c7 c4 3b 28 40 a4 74 ac be 25
> > [236222.967926] RSP: 0018:ffffa8f9c0003c80 EFLAGS: 00010282
> > [236222.967938] RAX: 0000000000000034 RBX: ffff8d7090f2cd00 RCX: 0000000000000000
> > [236222.967951] RDX: ffff8d709b427060 RSI: ffff8d709b418d00 RDI: 0000000000000300
> > [236222.967962] RBP: ffff8d709a9fc000 R08: 0000000000000406 R09: 0720072007200720
> > [236222.967974] R10: 0720072007200720 R11: 0729073007300730 R12: ffffffffc012e729
> > [236222.967986] R13: ffffa8f9c0003d3b R14: 0000000000000000 R15: ffff8d70367652ac
> > [236222.968000] FS: 0000000000000000(0000) GS:ffff8d709b400000(0000) knlGS:0000000000000000
> > [236222.968013] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [236222.968023] CR2: 00007f3cf5ebf010 CR3: 0000000113cc6000 CR4: 00000000001406f0
> > [236222.968035] Call Trace:
> > [236222.968047] <IRQ>
> > [236222.968064] netif_skb_features+0x25e/0x2c0
> > [236222.968084] ? ipt_do_table+0x333/0x600 [ip_tables]
> > [236222.968098] validate_xmit_skb+0x1d/0x300
> > [236222.968111] validate_xmit_skb_list+0x48/0x70
> > [236222.968126] sch_direct_xmit+0x129/0x2f0
> > [236222.968140] __dev_queue_xmit+0x710/0x8a0
> > [236222.968184] ? nf_confirm+0xcb/0xf0 [nf_conntrack]
> > [236222.968200] ? nf_hook_slow+0x3f/0xb0
> > [236222.968214] ip_finish_output2+0x2ad/0x560
> > [236222.968229] __netif_receive_skb_core+0x4f0/0xf40
> > [236222.968244] ? packet_rcv+0x44/0x490
> > [236222.968257] __netif_receive_skb_one_core+0x2d/0x70
> > [236222.968277] process_backlog+0x96/0x160
> > [236222.968290] net_rx_action+0x13c/0x3e0
> > [236222.968312] ? usbnet_bh+0x24/0x2b0 [usbnet]
> > [236222.968327] __do_softirq+0xd9/0x2c4
> > [236222.968340] asm_call_on_stack+0x12/0x20
> > [236222.968350] </IRQ>
> > [236222.968362] do_softirq_own_stack+0x39/0x50
> > [236222.968376] irq_exit_rcu+0xc2/0x100
> > [236222.968389] common_interrupt+0x75/0x140
> > [236222.968405] asm_common_interrupt+0x1e/0x40
> > [236222.968427] RIP: 0010:native_safe_halt+0xe/0x10
> > [236222.968438] Code: 02 20 48 8b 00 a8 08 75 c4 e9 7b ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d f6 69 49 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d e6 69 49 00 f4 c3 cc cc 0f 1f 44 00
> > [236222.968456] RSP: 0018:ffffffffa4a03e08 EFLAGS: 00000246
> > [236222.968467] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
> > [236222.968480] RDX: 4ec4ec4ec4ec4ec5 RSI: ffffffffa4b78960 RDI: ffff8d7092f45c00
> > [236222.968492] RBP: ffff8d709a288000 R08: 0000d6d7f20a4084 R09: 0000000000000006
> > [236222.968504] R10: 0000000000000022 R11: 000000000000000f R12: ffff8d709a288064
> > [236222.968515] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
> > [236222.968535] acpi_safe_halt+0x1b/0x30
> > [236222.968549] acpi_idle_enter+0x27e/0x2e0
> > [236222.968566] cpuidle_enter_state+0x81/0x3f0
> > [236222.968589] cpuidle_enter+0x29/0x40
> > [236222.968602] do_idle+0x1d5/0x2a0
> > [236222.968615] cpu_startup_entry+0x19/0x20
> > [236222.968628] start_kernel+0x7f4/0x804
> > [236222.968645] secondary_startup_64+0xb6/0xc0
> > [236222.968659] ---[ end trace 8a4d7f639ad88505 ]---
> >
> >
>
> OK, it would be nice to know what is the input interface
>
> if4 -> look at "ip link | grep 4:"
>
> Then identifying the driver that built such a strange packet (32000
> bytes allocated in skb->head)
>
> ethtool -i ifname
>
>
>
> > >
> > >> +
> > >> if (gso_segs > dev->gso_max_segs)
> > >> return features & ~NETIF_F_GSO_MASK;
> > >>
> > >> Following skb then triggered the skb_warn_bad_offload. Not sure whether this helps
> > >> to find out where in the network stack something goes wrong.
> > >>
> > >>
> > >> [236222.967236] skb len=134 headroom=778 headlen=134 tailroom=31536
> > >> mac=(778,14) net=(792,20) trans=812
> > >> shinfo(txflags=0 nr_frags=0 gso(size=568 type=0 segs=1))
> > >> csum(0x0 ip_summed=1 complete_sw=0 valid=0 level=0)
> > >> hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=4
> > >> [236222.967297] dev name=enp1s0 feat=0x0x00000100000041b2
> > >> [236222.967392] skb linear: 00000000: 00 13 3b a0 01 e8 7c d3 0a 2d 1b 3b 08 00 45 00
> > >> [236222.967404] skb linear: 00000010: 00 78 e2 e6 00 00 7b 06 52 e1 d8 3a d0 ce c0 a8
> > >> [236222.967415] skb linear: 00000020: a0 06 01 bb 8b c6 53 91 be 5e 6e 60 bd e2 80 18
> > >> [236222.967426] skb linear: 00000030: 01 13 5c f6 00 00 01 01 08 0a 3d d6 6a a3 63 ea
> > >> [236222.967437] skb linear: 00000040: 5c d9 17 03 03 00 3f af 00 01 84 45 e2 36 e4 6a
> > >> [236222.967454] skb linear: 00000050: 3d 76 a8 7f d7 12 fa 72 4b d1 d0 74 0d c1 49 77
> > >> [236222.967466] skb linear: 00000060: 8b a4 bb 04 e5 aa 03 61 d3 e6 1f c9 0d 3e 46 c8
> > >> [236222.967477] skb linear: 00000070: cd 1f 7d ce e8 a7 84 84 01 5d 1f b4 ee 4f 27 63
> > >> [236222.967488] skb linear: 00000080: d2 a1 ab 1f 26 1d
> > >>
> > >>
> > >>
> > >>>
> > >>> Nothing in GRO stack will change gso_size, unless the packet is queued
> > >>> by GRO layer (after this, napi_reuse_skb() wont be called)
> > >>>
> > >>> napi_reuse_skb() is only used when a packet has been aggregated to
> > >>> another, and at this point gso_size should be still 0.
> > >>>
> > >>
I seem to have stumbled over the same or a similar issue with a Raspberry Pi
3B+ running 5.11-rc4 and using the on-board lan78xx USB NIC. The Pi is used
as a gateway. If I enable IP forwarding on the Pi and pound on eth0 [1], I
get tons of the below warnings after a couple of seconds:
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] skb len=54 headroom=5194 headlen=54 tailroom=10816
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] mac=(5194,14) net=(5208,20) trans=5228
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] shinfo(txflags=0 nr_frags=0 gso(size=1448 type=0 segs=1))
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] csum(0xe505 ip_summed=0 complete_sw=0 valid=0 level=0)
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.744157] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=2
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.774147] dev name=eth0 feat=0x0x0000010000114b09
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.779355] skb linear: 00000000: e0 28 6d 9e b9 22 b8 27 eb 3e ab fb 08 00 45 00
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.787365] skb linear: 00000010: 00 28 00 00 40 00 3f 06 41 d0 c0 a8 63 84 02 14
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.795266] skb linear: 00000020: d3 bf ed 3e 01 bb d4 0f 88 7e 00 00 00 00 50 04
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.803168] skb linear: 00000030: 00 00 6a 58 00 00
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.808384] ------------[ cut here ]------------
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.813200] lan78xx: caps=(0x0000010000114b09, 0x0000000000000000)
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.819717] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3197 skb_warn_bad_offload+0x84/0x100
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.828190] Modules linked in:
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.831354] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-rc4 #103
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.838009] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.844478] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.850685] pc : skb_warn_bad_offload+0x84/0x100
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.855464] lr : skb_warn_bad_offload+0x84/0x100
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.860242] sp : ffff800010003850
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.863665] x29: ffff800010003850 x28: ffff7a96fb196290
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.869160] x27: ffff7a96c5958300 x26: 0000000000000001
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.874654] x25: ffffa73eee323000 x24: ffff7a96ee84b000
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.880148] x23: ffffa73eee7f4f00 x22: 0000000000000000
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.885642] x21: ffffa73eee0327e0 x20: ffff7a96ee84b000
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.891136] x19: ffff7a96c5958300 x18: 0000000000000010
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.896630] x17: 0000000000000000 x16: 0000000000000000
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.902123] x15: 000000000000ad55 x14: 0000000000000010
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.907617] x13: 00000000ffffffff x12: ffffa73eedd9d950
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.913109] x11: ffffa73eee885de0 x10: ffffa73eee86dda0
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.918603] x9 : ffffa73eecf2f45c x8 : 0000000000017fe8
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.924097] x7 : c0000000ffffefff x6 : 0000000000000003
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.929590] x5 : 0000000000000000 x4 : 0000000000000000
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.935081] x3 : 0000000000000100 x2 : 0000000000001000
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.940575] x1 : 0000000000000000 x0 : 0000000000000000
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.946070] Call trace:
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.948599] skb_warn_bad_offload+0x84/0x100
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.953020] netif_skb_features+0x218/0x2a0
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.957350] validate_xmit_skb.isra.0+0x28/0x2c8
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.962125] validate_xmit_skb_list+0x44/0x98
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.966631] sch_direct_xmit+0xf0/0x3a8
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.970599] __qdisc_run+0x140/0x668
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.974297] __dev_queue_xmit+0x59c/0x980
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.978446] dev_queue_xmit+0x1c/0x28
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.982237] ip_finish_output2+0x30c/0x558
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.986476] __ip_finish_output+0xe4/0x260
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.990715] ip_finish_output+0x3c/0xd8
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.994683] ip_output+0xb4/0x148
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1914.998116] ip_forward_finish+0x7c/0xc0
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.002174] ip_forward+0x42c/0x4f0
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.005783] ip_rcv_finish+0x98/0xb8
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.009481] ip_rcv+0xe0/0xf0
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.012552] __netif_receive_skb_one_core+0x5c/0x88
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.017597] __netif_receive_skb+0x20/0x70
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.021834] process_backlog+0xc0/0x1d0
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.025802] net_rx_action+0x134/0x478
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.029682] __do_softirq+0x130/0x378
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.033472] irq_exit+0xc0/0xe8
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.036725] __handle_domain_irq+0x70/0xc8
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.040963] bcm2836_arm_irqchip_handle_irq+0x6c/0x80
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.046185] el1_irq+0xb4/0x140
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.053377] arch_cpu_idle+0x18/0x28
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.060981] default_idle_call+0x44/0x178
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.069009] do_idle+0x224/0x270
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.076147] cpu_startup_entry+0x30/0x98
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.083916] rest_init+0xc8/0xd8
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.090937] arch_call_rest_init+0x18/0x24
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.098829] start_kernel+0x57c/0x5b8
Jan 19 07:55:22 rpi-3b-plus-rev1d3-abfb kernel: [ 1915.106251] ---[ end trace c3d8dd12ce1805e0 ]---
If I also add the following rule:
$ iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
I get a single warning followed by a TX timeout:
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] skb len=66 headroom=5194 headlen=66 tailroom=10804
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] mac=(5194,14) net=(5208,20) trans=5228
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] shinfo(txflags=0 nr_frags=0 gso(size=1448 type=0 segs=1))
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] csum(0xeedb ip_summed=1 complete_sw=0 valid=0 level=0)
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.516888] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=2
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.546872] dev name=eth0 feat=0x0x0000010000114b09
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.552060] skb linear: 00000000: e0 28 6d 9e b9 22 b8 27 eb 3e ab fb 08 00 45 00
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.560090] skb linear: 00000010: 00 34 90 99 40 00 3f 06 87 40 c0 a8 63 84 22 6b
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.568019] skb linear: 00000020: dd 52 d0 ac 00 50 35 e0 1e 2c 78 02 47 fa 80 10
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.575921] skb linear: 00000030: 01 f6 d6 96 00 00 01 01 08 0a 50 c9 d7 4b cd 2e
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.583918] skb linear: 00000040: 9f fc
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.588105] ------------[ cut here ]------------
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.592920] lan78xx: caps=(0x0000010000114b09, 0x0000000000000000)
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.599429] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3197 skb_warn_bad_offload+0x84/0x100
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.607900] Modules linked in:
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.611064] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-rc4 #103
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.617720] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.624189] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.630396] pc : skb_warn_bad_offload+0x84/0x100
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.635175] lr : skb_warn_bad_offload+0x84/0x100
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.639953] sp : ffff800010003810
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.643374] x29: ffff800010003810 x28: ffff50043b196290
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.648870] x27: ffff500407371600 x26: 0000000000000001
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.654365] x25: ffffa1fa11b23000 x24: ffff50042e96b000
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.659859] x23: ffffa1fa11ff4f00 x22: 0000000000000000
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.665353] x21: ffffa1fa118327e0 x20: ffff50042e96b000
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.670847] x19: ffff500407371600 x18: 0000000000000010
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.676340] x17: 0000000000000000 x16: 0000000000000000
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.681833] x15: 000000000000ad55 x14: 0000000000000010
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.687326] x13: 00000000ffffffff x12: ffffa1fa1159d950
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.692819] x11: ffffa1fa12085de0 x10: ffffa1fa1206dda0
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.698313] x9 : ffffa1fa1072f45c x8 : 0000000000017fe8
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.703806] x7 : c0000000ffffefff x6 : 0000000000000003
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.709300] x5 : 0000000000000000 x4 : 0000000000000000
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.714791] x3 : 0000000000000100 x2 : 0000000000001000
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.720283] x1 : 0000000000000000 x0 : 0000000000000000
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.725778] Call trace:
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.728306] skb_warn_bad_offload+0x84/0x100
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.732728] netif_skb_features+0x218/0x2a0
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.737057] validate_xmit_skb.isra.0+0x28/0x2c8
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.741833] validate_xmit_skb_list+0x44/0x98
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.746339] sch_direct_xmit+0xf0/0x3a8
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.750309] __qdisc_run+0x140/0x668
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.754008] __dev_queue_xmit+0x59c/0x980
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.758156] dev_queue_xmit+0x1c/0x28
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.761945] neigh_resolve_output+0x108/0x230
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.766450] ip_finish_output2+0x180/0x558
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.770690] __ip_finish_output+0xe4/0x260
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.774928] ip_finish_output+0x3c/0xd8
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.778896] ip_output+0xb4/0x148
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.782328] ip_forward_finish+0x7c/0xc0
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.786385] ip_forward+0x42c/0x4f0
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.789995] ip_rcv_finish+0x98/0xb8
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.793694] ip_rcv+0xe0/0xf0
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.796765] __netif_receive_skb_one_core+0x5c/0x88
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.801810] __netif_receive_skb+0x20/0x70
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.806047] process_backlog+0xc0/0x1d0
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.810016] net_rx_action+0x134/0x478
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.813897] __do_softirq+0x130/0x378
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.817686] irq_exit+0xc0/0xe8
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.820940] __handle_domain_irq+0x70/0xc8
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.829099] bcm2836_arm_irqchip_handle_irq+0x6c/0x80
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.838223] el1_irq+0xb4/0x140
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.845371] arch_cpu_idle+0x18/0x28
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.852882] default_idle_call+0x44/0x178
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.860756] do_idle+0x224/0x270
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.867794] cpu_startup_entry+0x30/0x98
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.875516] rest_init+0xc8/0xd8
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.882496] arch_call_rest_init+0x18/0x24
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.890352] start_kernel+0x57c/0x5b8
Jan 19 08:15:47 rpi-3b-plus-rev1d3-abfb kernel: [ 81.897706] ---[ end trace a5789410f231a10b ]---
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.046337] ------------[ cut here ]------------
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.054787] NETDEV WATCHDOG: eth0 (lan78xx): transmit queue 0 timed out
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.065356] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x384/0x390
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.077534] Modules linked in:
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.084361] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G W 5.11.0-rc4 #103
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.096114] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.106246] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.116085] pc : dev_watchdog+0x384/0x390
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.123857] lr : dev_watchdog+0x384/0x390
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.131558] sp : ffff800010013d90
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.138497] x29: ffff800010013d90 x28: 0000000000000140
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.147472] x27: 00000000ffffffff x26: ffffa1fa11b23000
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.156489] x25: 0000000000000002 x24: 0000000000000000
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.165496] x23: 0000000000000001 x22: ffff50042e96b000
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.174494] x21: ffff50042e96b440 x20: ffffa1fa11fe7000
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.183490] x19: 0000000000000000 x18: 0000000000000010
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.192493] x17: 0000000000000000 x16: 0000000000000000
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.201473] x15: 000000000000ad55 x14: 0000000000000010
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.210439] x13: 00000000ffffffff x12: ffffa1fa1159d950
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.219397] x11: ffffa1fa12085de0 x10: ffffa1fa1206dda0
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.228367] x9 : ffffa1fa1072f45c x8 : 0000000000017fe8
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.237362] x7 : c0000000ffffefff x6 : 0000000000000003
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.246353] x5 : 0000000000000000 x4 : 0000000000000000
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.255328] x3 : 0000000000000100 x2 : 0000000000001000
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.264273] x1 : 0000000000000000 x0 : 0000000000000000
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.273192] Call trace:
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.279183] dev_watchdog+0x384/0x390
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.286461] call_timer_fn+0x38/0x188
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.293762] run_timer_softirq+0x494/0x688
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.301489] __do_softirq+0x130/0x378
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.308767] irq_exit+0xc0/0xe8
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.315500] __handle_domain_irq+0x70/0xc8
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.323214] bcm2836_arm_irqchip_handle_irq+0x6c/0x80
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.331940] el1_irq+0xb4/0x140
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.338706] arch_cpu_idle+0x18/0x28
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.345916] default_idle_call+0x44/0x178
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.353577] do_idle+0x224/0x270
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.360433] cpu_startup_entry+0x2c/0x98
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.368000] secondary_start_kernel+0x148/0x180
Jan 19 08:16:15 rpi-3b-plus-rev1d3-abfb kernel: [ 110.376199] ---[ end trace a5789410f231a10c ]---
I did some bisecting and found commit [2] to be problematic. Reverting that
commit plus the two follow-on fixes [3] and [4] prevents the warnings and
timeout. I'm no networking expert so can't determine if [2] is broken or
merely exposes a different underlying issue. I failed to reproduce the problem
using a dedicated Realtek-based USB NIC plugged into the Pi, which points
towards the lan78xx driver/HW being the culprit.
Enabling KASAN didn't trigger any error reports.
Let me know if there's anything else I can try to narrow this down.
...Juerg
[1]
On the Pi, I run:
$ nc -l 1234 | dd status=progress >/dev/null
And on another machine, that is configured to use the Pi as the gateway:
$ nc 192.168.99.115 1234 < /dev/urandom
and a couple of firefox instances that keep opening public URls.
[2]
Author: Eric Dumazet <edumazet@...gle.com>
Date: Tue Nov 27 14:42:03 2018 -0800
tcp: implement coalescing on backlog queue
In case GRO is not as efficient as it should be or disabled,
we might have a user thread trapped in __release_sock() while
softirq handler flood packets up to the point we have to drop.
This patch balances work done from user thread and softirq,
to give more chances to __release_sock() to complete its work
before new packets are added the the backlog.
This also helps if we receive many ACK packets, since GRO
does not aggregate them.
This patch brings ~60% throughput increase on a receiver
without GRO, but the spectacular gain is really on
1000x release_sock() latency reduction I have measured.
Signed-off-by: Eric Dumazet <edumazet@...gle.com>
Cc: Neal Cardwell <ncardwell@...gle.com>
Cc: Yuchung Cheng <ycheng@...gle.com>
Acked-by: Neal Cardwell <ncardwell@...gle.com>
Signed-off-by: David S. Miller <davem@...emloft.net>
[3] 86bccd036713 tcp: fix receive window update in tcp_add_backlog()
[4] ca2fe2956ace tcp: add sanity tests in tcp_add_backlog()
Content of type "application/pgp-signature" skipped
Powered by blists - more mailing lists