lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bug-217965-13602-0pfvbRXyV2@https.bugzilla.kernel.org/>
Date: Mon, 18 Mar 2024 04:45:26 +0000
From: bugzilla-daemon@...nel.org
To: linux-ext4@...r.kernel.org
Subject: [Bug 217965] ext4(?) regression since 6.5.0 on sata hdd

https://bugzilla.kernel.org/show_bug.cgi?id=217965

--- Comment #69 from Eyal Lebedinsky (bugzilla@...l.emu.id.au) ---
Seeing this last comment (#68), I wondered if the fix is included in my latest
kernel. Or do I have a different problem?

After just updating my fedora 38 to kernel 6.7.9 I did NOT do the remount and
repeated my original problematic test.
    $ uname -s
    Linux e7.eyal.emu.id.au 6.7.9-100.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Wed
Mar  6 19:31:16 UTC 2024 x86_64 GNU/Linux
    $ mount|grep data1
    /dev/md127 on /data1 type ext4 (rw,noatime,stripe=640)

I rsync a directory (262GB, 8,911,139 files) from an SSD to a raid6 (7x12TB =>
60TB fs). Both are ext4.
    $ sudo time rsync -aHSK --stats --progress
/data/no-backup/old-backups/tapes /data1/no-backup/really-not/old-backups/

What I see is that once meminfo shows the limit (4GB) was reached, the kworker
kicks in at 100% CPU.
At that point iostat shows activity on the array dropping, from about 160MB/s
to very little (with an occasional burst of a few MB/s).

```
10:12:33 2024-03-18
10:12:33 Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s   
kB_read    kB_wrtn    kB_dscd
[trimmed]
10:17:13 md127             0.00         0.00         0.00         0.00         
0          0          0
10:17:23 md127            34.40       137.60         0.00         0.00      
1376          0          0  <<< rsync starts
10:17:33 md127          2346.30       747.20      8638.00         0.00      
7472      86380          0
10:17:43 md127          7067.10       431.60    133644.40         0.00      
4316    1336444          0
10:17:53 md127          1692.80       578.80      7015.20         0.00      
5788      70152          0
10:18:03 md127          2439.20       169.60     32071.20         0.00      
1696     320712          0
10:18:13 md127           274.00         4.00      2242.00         0.00        
40      22420          0
10:18:23 md127          3172.70        17.60     56828.00         0.00       
176     568280          0
10:18:33 md127           416.20         0.80      1664.80         0.00         
8      16648          0
10:18:43 md127            18.70         0.40        76.40         0.00         
4        764          0
10:18:53 md127             6.50         0.00        30.80         0.00         
0        308          0
10:19:03 md127             4.80         0.00        40.00         0.00         
0        400          0
10:19:13 md127             5.70         0.00        63.60         0.00         
0        636          0
10:19:23 md127             2.60         0.00        54.80         0.00         
0        548          0
10:19:33 md127             7.40         0.00       243.20         0.00         
0       2432          0
10:19:43 md127             5.20         0.00        75.60         0.00         
0        756          0
10:19:53 md127             3.80         0.00        20.40         0.00         
0        204          0
10:20:03 md127             2.00         0.00        13.20         0.00         
0        132          0
10:20:13 md127             3.90         0.00        29.20         0.00         
0        292          0
10:20:23 md127             3.80         0.00        19.60         0.00         
0        196          0

At the same time meminfo shows:
2024-03-18 10:17:04 Dirty:     11220 kB  Buffers:    829988 kB  MemFree:   
670576 kB
2024-03-18 10:17:14 Dirty:     10784 kB  Buffers:    830016 kB  MemFree:   
631500 kB
2024-03-18 10:17:24 Dirty:    750616 kB  Buffers:    875592 kB  MemFree:   
654236 kB
2024-03-18 10:17:34 Dirty:   2757048 kB  Buffers:    972948 kB  MemFree:   
600636 kB
2024-03-18 10:17:44 Dirty:   2855196 kB  Buffers:   1046736 kB  MemFree:   
551940 kB
2024-03-18 10:17:54 Dirty:   4104524 kB  Buffers:   1127200 kB  MemFree:   
538136 kB
2024-03-18 10:18:04 Dirty:   4390504 kB  Buffers:   1155588 kB  MemFree:   
600828 kB
2024-03-18 10:18:14 Dirty:   4518280 kB  Buffers:   1161916 kB  MemFree:   
580176 kB
2024-03-18 10:18:24 Dirty:   4356952 kB  Buffers:   1185872 kB  MemFree:   
543072 kB
2024-03-18 10:18:34 Dirty:   4559504 kB  Buffers:   1196396 kB  MemFree:   
518872 kB
2024-03-18 10:18:44 Dirty:   4567212 kB  Buffers:   1197060 kB  MemFree:   
606572 kB
2024-03-18 10:18:54 Dirty:   4567592 kB  Buffers:   1197084 kB  MemFree:   
611440 kB
... and stays there until ...

I then killed the copy (14,296MB copied). The writing to the array remained
very low, the kernel thread stayed at 100%
and meminfo drained very slowly. Access to the array is now slow with some
hiccups.

2024-03-18 10:35:24 Dirty:   4484720 kB  Buffers:   4984308 kB  MemFree:   
820532 kB   <<< rsync killed
2024-03-18 10:35:34 Dirty:   4484436 kB  Buffers:   4984348 kB  MemFree:   
851288 kB
2024-03-18 10:35:44 Dirty:   4483992 kB  Buffers:   4984368 kB  MemFree:   
817516 kB
2024-03-18 10:35:54 Dirty:   4483780 kB  Buffers:   4984400 kB  MemFree:   
803156 kB
2024-03-18 10:36:04 Dirty:   4483704 kB  Buffers:   4984460 kB  MemFree:   
809956 kB
2024-03-18 10:36:14 Dirty:   4479416 kB  Buffers:   4984496 kB  MemFree:   
832980 kB
2024-03-18 10:36:24 Dirty:   4474312 kB  Buffers:   4984528 kB  MemFree:   
881464 kB
2024-03-18 10:36:34 Dirty:   4474260 kB  Buffers:   4984568 kB  MemFree:   
840444 kB
2024-03-18 10:36:44 Dirty:   4474132 kB  Buffers:   4984600 kB  MemFree:   
843524 kB
2024-03-18 10:36:54 Dirty:   4474292 kB  Buffers:   4984640 kB  MemFree:   
841004 kB
2024-03-18 10:37:04 Dirty:   4474052 kB  Buffers:   4984680 kB  MemFree:   
834148 kB
2024-03-18 10:37:14 Dirty:   4473688 kB  Buffers:   4984712 kB  MemFree:   
853200 kB
2024-03-18 10:37:24 Dirty:   4473448 kB  Buffers:   4984752 kB  MemFree:   
782540 kB
2024-03-18 10:37:34 Dirty:   4473288 kB  Buffers:   4984776 kB  MemFree:   
786100 kB
2024-03-18 10:37:44 Dirty:   3871768 kB  Buffers:   4984972 kB  MemFree:   
846020 kB
2024-03-18 10:37:54 Dirty:   3871612 kB  Buffers:   4985020 kB  MemFree:   
826664 kB
2024-03-18 10:38:04 Dirty:   3871736 kB  Buffers:   4985052 kB  MemFree:   
826084 kB
2024-03-18 10:38:14 Dirty:   3871184 kB  Buffers:   4985100 kB  MemFree:   
876572 kB
2024-03-18 10:38:24 Dirty:   3870936 kB  Buffers:   4985140 kB  MemFree:   
918944 kB
2024-03-18 10:38:34 Dirty:   3648080 kB  Buffers:   4985256 kB  MemFree:   
901336 kB
2024-03-18 10:38:44 Dirty:   3556612 kB  Buffers:   4985316 kB  MemFree:   
902532 kB
2024-03-18 10:38:54 Dirty:   3551636 kB  Buffers:   4985364 kB  MemFree:   
837816 kB
2024-03-18 10:39:04 Dirty:   3551968 kB  Buffers:   4985468 kB  MemFree:   
823392 kB
2024-03-18 10:39:14 Dirty:   2835648 kB  Buffers:   4985656 kB  MemFree:   
629428 kB
...
2024-03-18 11:05:25 Dirty:   2737096 kB  Buffers:   4993860 kB  MemFree:   
599424 kB   <<< 30m later
2024-03-18 11:35:25 Dirty:   2573748 kB  Buffers:   5001184 kB  MemFree:   
612288 kB   <<< again
2024-03-18 12:05:26 Dirty:   2432572 kB  Buffers:   5007704 kB  MemFree:   
663928 kB   <<< again
2024-03-18 12:35:27 Dirty:   2145348 kB  Buffers:   3707492 kB  MemFree:   
588464 kB   <<< again
2024-03-18 13:05:27 Dirty:   2017848 kB  Buffers:   3718936 kB  MemFree:   
585500 kB   <<< again
2024-03-18 13:35:28 Dirty:   1822436 kB  Buffers:   3746824 kB  MemFree:   
565560 kB   <<< again
2024-03-18 14:05:29 Dirty:   1595088 kB  Buffers:   3799124 kB  MemFree:   
544504 kB   <<< again
2024-03-18 14:35:29 Dirty:   1498416 kB  Buffers:   3816868 kB  MemFree:  
3883524 kB   <<< again
2024-03-18 15:05:30 Dirty:   1387140 kB  Buffers:   3835824 kB  MemFree:  
3266060 kB   <<< again
...
2024-03-18 15:32:51 Dirty:   1284940 kB  Buffers:   3850936 kB  MemFree:  
3088904 kB   <<< finally
2024-03-18 15:33:01 Dirty:    933268 kB  Buffers:   3851144 kB  MemFree:  
3098840 kB
2024-03-18 15:33:11 Dirty:     51956 kB  Buffers:   3851248 kB  MemFree:  
3095456 kB
2024-03-18 15:33:21 Dirty:     51968 kB  Buffers:   3851284 kB  MemFree:  
3059212 kB
2024-03-18 15:33:31 Dirty:     52032 kB  Buffers:   3851308 kB  MemFree:  
3085352 kB
2024-03-18 15:33:41 Dirty:       172 kB  Buffers:   3851336 kB  MemFree:  
3090912 kB
2024-03-18 15:33:51 Dirty:        64 kB  Buffers:   3851368 kB  MemFree:  
3030584 kB
```

So over 5 hours to copy this small part (14GB of 262GB) of the data.

Is this expected? Is this a fundamental "feature" of ext4? Or of raid6?
When I did do "sudo mount -o remount,stripe=0 /data1" the copy progressed
nicely with good
writing speed.

I have logs of the progress of this test at 10s intervals.

Regards

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ