netdev - kernel BUG at net/core/skbuff.c:109!

Message-ID: <d7f38867-71d3-e91c-c71c-1dd37a4c3086@elloe.vision>
Date:   Wed, 3 Feb 2021 11:55:25 +0000
From:   "Tj (Elloe Linux)" <ml.linux@...oe.vision>
To:     netdev <netdev@...r.kernel.org>
Cc:     Callum O'Connor <callum.oconnor@...oe.vision>
Subject: kernel BUG at net/core/skbuff.c:109!

On a recent build (5.10.0) we've seen several hard-to-pinpoint complete
lock-ups requiring power-off restarts.

Today we found a small clue in the kernel log. Unfortunately the
complete backtrace wasn't captured (presumably the system froze before
the log could be flushed), but I thought I should share it for
investigation.

kernel BUG at net/core/skbuff.c:109!

kernel: skbuff: skb_under_panic: text:ffffffffc103c622 len:1228 put:48
head:ffffa00202858000 data:ffffa00202857ff2 tail:0x4be end:0x6c0 dev:wlp4s0
kernel: ------------[ cut here ]------------
kernel: kernel BUG at net/core/skbuff.c:109!

Obviously this ought not to happen and we'd like to discover the cause.
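
As a reading aid for the skb_under_panic line above, here is a minimal
sketch that pulls the numbers out of it and works out how far the data
pointer ended up below the buffer head. It assumes the usual layout of
that message (len and put in decimal, head and data as hex pointers);
the function name and the dictionary keys are ours, purely for
illustration, and it says nothing about which caller is at fault.

```python
import re

# The log line from this report, joined back onto one line.
LOG = ("skbuff: skb_under_panic: text:ffffffffc103c622 len:1228 put:48 "
       "head:ffffa00202858000 data:ffffa00202857ff2 tail:0x4be end:0x6c0 "
       "dev:wlp4s0")

def decode_skb_under_panic(line):
    """Hypothetical helper: decode the fields of an skb_under_panic line."""
    fields = dict(re.findall(r"\b(len|put|head|data):(\S+)", line))
    length = int(fields["len"])     # decimal: skb length after the push
    put = int(fields["put"])        # decimal: bytes the caller pushed
    head = int(fields["head"], 16)  # hex pointer: start of the buffer
    data = int(fields["data"], 16)  # hex pointer: data start after the push
    underrun = head - data          # bytes by which data fell below head
    return {
        "underrun": underrun,
        # Headroom that was actually available before the failing push.
        "headroom_before_push": put - underrun,
        "len_before_push": length - put,
    }

result = decode_skb_under_panic(LOG)
print(result)
```

Read this way, the push of 48 bytes overran the start of the buffer by
14 bytes, so only 34 bytes of headroom were available on a 1180-byte
packet when the header was prepended.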

Whilst writing this report it happened again. Checking the logs, we see
three instances of the BUG, none of which captured a stack trace:

Jan 27
Feb 03 #1
Feb 03 #2

The only slight clue may be a k3s service that, unknown to us, was
constantly restarting and had reached a restart counter of 26,636 just
before the Feb 03 #1 BUG. However, we removed k3s immediately afterwards,
and there were no similar clues 20 minutes later for the Feb 03 #2 BUG.

Feb 03 11:11:13 elloe001 k3s[1209978]:
time="2021-02-03T11:11:13.452745479Z" level=fatal msg="starting
kubernetes: preparing server: start cluster and https:
listen tcp 10.1.2.1:6443: bind: cannot assign requested address"
Feb 03 11:11:13 elloe001 systemd[1]: k3s-main.service: Main process
exited, code=exited, status=1/FAILURE
Feb 03 11:11:13 elloe001 systemd[1]: k3s-main.service: Failed with
result 'exit-code'.
Feb 03 11:11:13 elloe001 systemd[1]: Failed to start Lightweight Kubernetes.
Feb 03 11:11:18 elloe001 systemd[1]: k3s-dev.service: Scheduled restart
job, restart counter is at 26636.
Feb 03 11:11:18 elloe001 systemd[1]: k3s-main.service: Scheduled restart
job, restart counter is at 26636.
Feb 03 11:11:18 elloe001 systemd[1]: Stopped Lightweight Kubernetes.
Feb 03 11:11:18 elloe001 systemd[1]: Starting Lightweight Kubernetes...
Feb 03 11:11:18 elloe001 systemd[1]: Stopped Lightweight Kubernetes.
Feb 03 11:11:18 elloe001 systemd[1]: Starting Lightweight Kubernetes...

We don't think this is hardware related as we have several identical
Lenovo E495 laptops and they have never suffered this.

We don't know of any way to reproduce it at will.