Message-ID: <d7f38867-71d3-e91c-c71c-1dd37a4c3086@elloe.vision>
Date:   Wed, 3 Feb 2021 11:55:25 +0000
From:   "Tj (Elloe Linux)" <ml.linux@...oe.vision>
To:     netdev <netdev@...r.kernel.org>
Cc:     Callum O'Connor <callum.oconnor@...oe.vision>
Subject: kernel BUG at net/core/skbuff.c:109!

On a recent build (5.10.0) we've seen several hard-to-pinpoint complete
lock-ups requiring power-off restarts.

Today we found a small clue in the kernel log. Unfortunately the complete
backtrace wasn't captured (presumably the system froze before the log
could be flushed), but I thought I should share it for investigation.

kernel BUG at net/core/skbuff.c:109!

kernel: skbuff: skb_under_panic: text:ffffffffc103c622 len:1228 put:48
head:ffffa00202858000 data:ffffa00202857ff2 tail:0x4be end:0x6c0 dev:wlp4s0
kernel: ------------[ cut here ]------------
kernel: kernel BUG at net/core/skbuff.c:109!
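
For readers decoding the panic line, here is a minimal sketch of the arithmetic it implies. It assumes the usual meaning of the fields: skb_under_panic is raised from skb_push() when skb->data ends up below skb->head, "put" is the size of the attempted push, "len" is the length after the push, and on 64-bit builds "tail" and "end" are offsets from head. The helper name analyse_skb_panic is hypothetical and not part of any kernel tooling.

```python
import re

# The panic line from the log above, rejoined onto one line.
LINE = ("skbuff: skb_under_panic: text:ffffffffc103c622 len:1228 put:48 "
        "head:ffffa00202858000 data:ffffa00202857ff2 tail:0x4be end:0x6c0 "
        "dev:wlp4s0")


def analyse_skb_panic(line):
    """Derive headroom figures from an skb_under_panic line (hypothetical helper)."""
    # Collect every key:value pair; "skbuff: " and "skb_under_panic: " are
    # skipped because a space follows their colon.
    fields = dict(re.findall(r"(\w+):(\S+)", line))

    head = int(fields["head"], 16)
    data = int(fields["data"], 16)
    tail = int(fields["tail"], 16)   # assumed to be an offset from head
    push = int(fields["put"])        # bytes the caller tried to push
    length = int(fields["len"])      # skb->len after the push

    deficit = head - data            # how far data landed below head
    return {
        "dev": fields["dev"],
        "deficit": deficit,
        "headroom_before_push": push - deficit,
        "len_before_push": length - push,
        # For a purely linear skb, tail offset minus data offset equals len.
        "looks_linear": tail + deficit == length,
    }


if __name__ == "__main__":
    for key, value in analyse_skb_panic(LINE).items():
        print(f"{key}: {value}")
```

Under those assumptions, the line describes a 48-byte push into an skb on wlp4s0 that had only 34 bytes of headroom, leaving data 14 bytes below head.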

Obviously this ought not to happen and we'd like to discover the cause.

Whilst writing this report it happened again. Checking the logs, we see
three instances of the BUG, none of which captured a stack trace:

Jan 27
Feb 03 #1
Feb 03 #2

The only slight clue may be a k3s service that, unknown to us, was
constantly restarting and had reached a restart counter of 26,636 just
before the Feb 03 #1 BUG. However, we removed k3s immediately afterwards
and there were no similar clues 20 minutes later for the Feb 03 #2 BUG.

Feb 03 11:11:13 elloe001 k3s[1209978]:
time="2021-02-03T11:11:13.452745479Z" level=fatal msg="starting
kubernetes: preparing server: start cluster and https:
listen tcp 10.1.2.1:6443: bind: cannot assign requested address"
Feb 03 11:11:13 elloe001 systemd[1]: k3s-main.service: Main process
exited, code=exited, status=1/FAILURE
Feb 03 11:11:13 elloe001 systemd[1]: k3s-main.service: Failed with
result 'exit-code'.
Feb 03 11:11:13 elloe001 systemd[1]: Failed to start Lightweight Kubernetes.
Feb 03 11:11:18 elloe001 systemd[1]: k3s-dev.service: Scheduled restart
job, restart counter is at 26636.
Feb 03 11:11:18 elloe001 systemd[1]: k3s-main.service: Scheduled restart
job, restart counter is at 26636.
Feb 03 11:11:18 elloe001 systemd[1]: Stopped Lightweight Kubernetes.
Feb 03 11:11:18 elloe001 systemd[1]: Starting Lightweight Kubernetes...
Feb 03 11:11:18 elloe001 systemd[1]: Stopped Lightweight Kubernetes.
Feb 03 11:11:18 elloe001 systemd[1]: Starting Lightweight Kubernetes...

We don't think this is hardware-related, as we have several identical
Lenovo E495 laptops and they have never suffered from this.

We don't know of any way to reproduce it at will.
