linux-ext4 - 100% cpu kworker ext4-fs kernel 4.14.47

Message-ID: <d4a0d1f6-2b39-77c3-2704-fc4d1d832c47@cb-world.de>
Date:   Sat, 2 Jun 2018 17:37:51 +0200
From:   "Administrator www.cb-world.de" <admin@...world.de>
To:     linux-ext4@...r.kernel.org
Subject: 100% cpu kworker ext4-fs kernel 4.14.47

Hi,

yesterday I noticed a kworker process using 100% CPU. When I shut down my
PC, the shutdown process hung at one filesystem (/srv/server), so I
cut power.
This morning I turned the PC on again; the filesystem was checked without
errors and mounted normally.

At 12:45 a script started backing up a server with rsync to that
filesystem. After that there was again a kworker process at 100%
CPU.

Searching about kworker I found "echo l > /proc/sysrq-trigger" to
identify why the process is running at 100%.
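
The SysRq step can also be scripted. What follows is only a minimal sketch, assuming root privileges and that SysRq is enabled on the machine (kernel.sysrq). The function names `trigger_sysrq` and `recent_kernel_log` and the `path` parameter are hypothetical; `path` exists only so the write can be pointed at a scratch file when testing.

```python
import subprocess


def trigger_sysrq(key: str = "l", path: str = "/proc/sysrq-trigger") -> None:
    """Write a single SysRq command character to the trigger file.

    The 'l' command asks the kernel to log a backtrace of all active CPUs.
    """
    if len(key) != 1:
        raise ValueError("SysRq key must be a single character")
    with open(path, "w") as f:
        f.write(key)


def recent_kernel_log(lines: int = 200) -> str:
    """Return the last `lines` lines of the kernel ring buffer via dmesg."""
    out = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    ).stdout
    return "\n".join(out.splitlines()[-lines:])
```

Run as root, calling `trigger_sysrq("l")` followed by `print(recent_kernel_log())` should show the backtrace of the spinning CPU near the end of the output.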

Backtrace is:
[25127.566574] CPU: 2 PID: 6881 Comm: kworker/u12:2 Not tainted
4.14.47-gentoo #1
[25127.566575] Hardware name: System manufacturer System Product
Name/M4A89GTD-PRO/USB3, BIOS 2301    07/18/2011
[25127.566580] Workqueue: writeback wb_workfn (flush-9:11)
[25127.566582] task: ffff880403a46e40 task.stack: ffffc90004378000
[25127.566586] RIP: 0010:radix_tree_next_chunk+0xc6/0x2d0
[25127.566586] RSP: 0018:ffffc9000437b920 EFLAGS: 00000202
[25127.566588] RAX: 0000000000000000 RBX: 0000000000000010 RCX:
ffff8802126ef901
[25127.566588] RDX: 0000000000000012 RSI: ffffc9000437b968 RDI:
ffff88021fd6a6f8
[25127.566589] RBP: 0000000000000000 R08: ffff880002174921 R09:
ffff8802126ef908
[25127.566590] R10: 0000000000000012 R11: 00000000000001c3 R12:
0000000000000228
[25127.566590] R13: 0000000000000040 R14: 0000000000000000 R15:
0000000000000000
[25127.566591] FS:  0000000000000000(0000) GS:ffff88041fc80000(0000)
knlGS:0000000000000000
[25127.566592] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25127.566593] CR2: 00007fecd6880000 CR3: 000000000220a000 CR4:
00000000000006e0
[25127.566593] Call Trace:
[25127.566599]  find_get_pages_tag+0x128/0x1d0
[25127.566602]  pagevec_lookup_tag+0x18/0x20
[25127.566605]  mpage_prepare_extent_to_map+0xb6/0x2b0
[25127.566607]  ? jbd2_journal_stop+0x18a/0x3d0
[25127.566610]  ? kmem_cache_alloc+0x1ac/0x1c0
[25127.566611]  ? kmem_cache_alloc+0x1ac/0x1c0
[25127.566613]  ext4_writepages+0x426/0xe10
[25127.566616]  ? cpumask_next_and+0x26/0x40
[25127.566617]  ? do_writepages+0x12/0x60
[25127.566618]  do_writepages+0x12/0x60
[25127.566620]  __writeback_single_inode+0x41/0x310
[25127.566621]  writeback_sb_inodes+0x240/0x4a0
[25127.566623]  __writeback_inodes_wb+0x82/0xb0
[25127.566624]  wb_writeback+0x259/0x2e0
[25127.566626]  ? wb_workfn+0x284/0x330
[25127.566627]  wb_workfn+0x284/0x330
[25127.566629]  process_one_work+0x1ae/0x3d0
[25127.566631]  worker_thread+0x42/0x3e0
[25127.566632]  kthread+0xf7/0x130
[25127.566634]  ? trace_event_raw_event_workqueue_execute_start+0x80/0x80
[25127.566635]  ? kthread_create_on_node+0x40/0x40
[25127.566637]  ? SyS_exit+0xe/0x10
[25127.566639]  ret_from_fork+0x22/0x40
[25127.566641] Code: 83 e0 3f 85 db 49 8d 4c c1 28 4c 8b 01 0f 84 1b 01
00 00 4b 0f a3 04 21 0f 93 c1 84 c9 74 7c 85 ed 75 48 85 db 0f 84 0e 01
00 00 <48> 83 c0 01 48 83 f8 40 0f 84 24 01 00 00 4f 8b 3c 21 89 c1 49
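
The "Workqueue: writeback wb_workfn (flush-9:11)" line in the backtrace names the backing device by major:minor number. Block major 9 is normally the md (software RAID) driver on Linux, so 9:11 would likely be /dev/md11; that is an inference and is not confirmed in this message. A small sketch for pulling the numbers out of such a line (the helper names `flush_device` and `sysfs_path` are hypothetical):

```python
import re


def flush_device(log_line: str):
    """Return (major, minor) from a '(flush-MAJ:MIN)' workqueue line, or None."""
    m = re.search(r"\(flush-(\d+):(\d+)\)", log_line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def sysfs_path(major: int, minor: int) -> str:
    """Build the sysfs entry for a block device given its major:minor pair."""
    return f"/sys/dev/block/{major}:{minor}"


# Line taken from the backtrace above.
line = "[25127.566580] Workqueue: writeback wb_workfn (flush-9:11)"
major, minor = flush_device(line)
print(major, minor, sysfs_path(major, minor))
```

On the affected machine, resolving the printed sysfs path (for example with `readlink -f`) should show which block device the stuck writeback worker is flushing.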

So it seems to be ext4. I then tried to stop all processes using that
filesystem (only rsync), but kill -9 did not stop them.
My first thought was that the disks might be broken, but smartctl shows no
errors logged for either device (/srv/server is on an mdadm RAID1 with all
devices up). The same disks also hold another mdadm RAID1
with another filesystem (/sic), which ran without any problems under
high load today (Bacula backup).
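
The "all devices up" check on the RAID1 arrays can be read from /proc/mdstat, where a status map such as [UU] means every member is present and [U_] means one is missing. A minimal sketch of that check follows; the function name `degraded_arrays` is hypothetical, and the sample text uses made-up array and partition names rather than anything from this system.

```python
import re


def degraded_arrays(mdstat_text: str):
    """Return names of md arrays whose status map (e.g. [U_]) has a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # An array section starts with a line like "md11 : active raid1 ...".
        header = re.match(r"^(md\S+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status map appears on the following "blocks" line.
        status = re.search(r"\[([U_]+)\]", line)
        if status and current is not None and "_" in status.group(1):
            degraded.append(current)
    return degraded


# Hypothetical sample in /proc/mdstat format (names and sizes are invented).
SAMPLE = """Personalities : [raid1]
md11 : active raid1 sdb3[1] sda3[0]
      104856192 blocks super 1.2 [2/2] [UU]

md12 : active raid1 sda4[0]
      209712128 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))
```

On a real system the same function could be fed the contents of /proc/mdstat; an empty list would match the "all devices up" observation.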

I also can't find any hint in any log file. As fsck runs without
errors on that filesystem and searching the internet turned up
nothing, my only hope is to get some hints on this list about what could
cause this problem and how I can solve it.

Thanks for any help

Carsten