Message-ID: <87jzgg2fqg.fsf@mail.lhotse>
Date: Sat, 17 Aug 2024 09:46:31 +1000
From: Michael Ellerman <mpe@...erman.id.au>
To: Niklas Cassel <cassel@...nel.org>
Cc: Kolbjørn Barmen <linux-ppc@...la.no>,
linuxppc-dev@...ts.ozlabs.org,
linux-kernel@...r.kernel.org, linux-ide@...r.kernel.org, Jonáš Vidra
<vidra@...l.mff.cuni.cz>, Christoph Hellwig <hch@....de>,
linux@...ck-us.net
Subject: Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in
drivers/ata/pata_macio.c

Niklas Cassel <cassel@...nel.org> writes:
> On Wed, Aug 14, 2024 at 10:20:55PM +1000, Michael Ellerman wrote:
>> Niklas Cassel <cassel@...nel.org> writes:
>> > On Tue, Aug 13, 2024 at 10:32:36PM +1000, Michael Ellerman wrote:
>> >> Niklas Cassel <cassel@...nel.org> writes:
>> >> > On Tue, Aug 13, 2024 at 07:49:34AM +0200, Jonáš Vidra wrote:
>> ...
>> >> >> ------------[ cut here ]------------
>> >> >> kernel BUG at drivers/ata/pata_macio.c:544!
>> >> >
>> >> > https://github.com/torvalds/linux/blob/v6.11-rc3/drivers/ata/pata_macio.c#L544
>> >> >
>> >> > It seems that the
>> >> > while (sg_len) loop does not play nice with the new .max_segment_size.
>> >>
>> >> Right, but only for 4KB kernels for some reason. Is there some limit
>> >> elsewhere that prevents the bug tripping on 64KB kernels, or is it just
>> >> luck that no one has hit it?
>> >
>> > Have you tried running fio (flexible I/O tester), with reads with very
>> > large block sizes?
>> >
>> > I would be surprised if it isn't possible to trigger the same bug with
>> > 64K page size.
>> >
>> > max segment size = 64K
>> > MAX_DCMDS = 256
>> > 256 * 64K = 16 MiB
>> > What happens if you run fio with a 16 MiB blocksize?
>> >
>> > Something like:
>> > $ sudo fio --name=test --filename=/dev/sdX --direct=1 --runtime=60 --ioengine=io_uring --rw=read --iodepth=4 --bs=16M
>>
>> Nothing interesting happens, fio succeeds.
>>
>> The largest request that comes into pata_macio_qc_prep() is 1280KB,
>> which results in 40 DMA list entries.
>>
>> I tried with a larger block size but it doesn't change anything. I guess
>> there's some limit somewhere else in the stack?
>>
>> That was testing on qemu, but I don't think it should matter?
>>
>> I guess there's no way to run the fio test against a file, ie. without a
>> raw partition? My real G5 doesn't have any spare disks/partitions in it.
>
>
> You can definitely run fio against a file.
>
> e.g.
> $ dd if=/dev/random of=/tmp/my_file bs=1M count=1024
>
> $ sudo fio --name=test --filename=/tmp/my_file --direct=1 --runtime=60 --ioengine=io_uring --rw=read --iodepth=4 --bs=16M
>
>
> Perhaps try with 32M block size, so that it is larger than
> max segment size = 64K
> MAX_DCMDS = 256
> 256 * 64K = 16 MiB
>
> Perhaps also try with and without --direct.
> It could be interesting to use the page cache; if you do --rw=readwrite,
> that might possibly result in larger bios.

Changing the fio settings didn't help.

I did some tracing and noticed the block layer was always splitting the
bio in __bio_split_to_limits() based on get_max_io_size().

That eventually led me to max_sectors_kb in sysfs, which defaults (on my
system at least) to 1280 (KB) - exactly the size I see in pata_macio.
Increasing max_sectors_kb with:
# echo 16384 > /sys/devices/pci0000:f0/0000:f0:0c.0/0.80000000:mac-io/0.00020000:ata-3/ata1/host0/target0:0:0/0:0:0:0/block/sda/queue/max_sectors_kb
allows me to trip the bug (I turned the BUG_ON into a WARN to keep the system alive):
[ 1804.988552] ------------[ cut here ]------------
[ 1804.988963] DMA table overflow!
[ 1804.989781] WARNING: CPU: 0 PID: 299 at drivers/ata/pata_macio.c:546 pata_macio_qc_prep+0x27c/0x2a4
[ 1804.991157] Modules linked in:
[ 1804.991945] CPU: 0 PID: 299 Comm: iou-wrk-298 Not tainted 6.10.4-dirty #242
[ 1804.992688] Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac
[ 1804.993512] NIP: c0000000008bcfb4 LR: c0000000008bcfb0 CTR: 0000000000000000
[ 1804.994244] REGS: c0000000052d6fb0 TRAP: 0700 Not tainted (6.10.4-dirty)
[ 1804.994998] MSR: 800000000202b032 <SF,VEC,EE,FP,ME,IR,DR,RI> CR: 44484240 XER: 00000000
[ 1804.996178] IRQMASK: 1
[ 1804.996178] GPR00: c0000000008bcfb0 c0000000052d7250 c000000000f50b00 0000000000000013
[ 1804.996178] GPR04: 0000000100000282 c0000000014806c0 fffffffffffec230 000000003ed10000
[ 1804.996178] GPR08: 0000000000000027 c00000003fe02410 0000000000000001 0000000044484240
[ 1804.996178] GPR12: c0000000014806a8 c0000000017b0000 c0000000006c9488 c000000005026b40
[ 1804.996178] GPR16: 0000000000000000 0000000002000000 c000000000cecaa8 c000000000e44ac8
[ 1804.996178] GPR20: 0000000000800000 0000000000000080 000000000000ff00 c000000000d12730
[ 1804.996178] GPR24: c000000000e20788 c00000000330eae8 0000000000000000 0000000000000020
[ 1804.996178] GPR28: c0000000036c8130 0000000000000100 0000000000000000 c000000003fb1000
[ 1805.003085] NIP [c0000000008bcfb4] pata_macio_qc_prep+0x27c/0x2a4
[ 1805.003715] LR [c0000000008bcfb0] pata_macio_qc_prep+0x278/0x2a4
[ 1805.004564] Call Trace:
[ 1805.004963] [c0000000052d7250] [c0000000008bcfb0] pata_macio_qc_prep+0x278/0x2a4 (unreliable)
[ 1805.005974] [c0000000052d7310] [c00000000089840c] ata_qc_issue+0x170/0x390
[ 1805.006719] [c0000000052d7390] [c0000000008a5160] __ata_scsi_queuecmd+0x220/0x7d4
[ 1805.007472] [c0000000052d7410] [c0000000008a5778] ata_scsi_queuecmd+0x64/0xe8
[ 1805.008194] [c0000000052d7450] [c00000000085b450] scsi_queue_rq+0x408/0xd74
[ 1805.008904] [c0000000052d7500] [c00000000067bfc8] blk_mq_dispatch_rq_list+0x160/0x914
[ 1805.009696] [c0000000052d75b0] [c000000000683d50] __blk_mq_sched_dispatch_requests+0x5fc/0x77c
[ 1805.010551] [c0000000052d7680] [c000000000683f68] blk_mq_sched_dispatch_requests+0x44/0x90
[ 1805.011371] [c0000000052d76b0] [c000000000677328] blk_mq_run_hw_queue+0x220/0x240
[ 1805.012138] [c0000000052d76f0] [c00000000067b084] blk_mq_flush_plug_list.part.0+0x214/0x75c
[ 1805.012975] [c0000000052d77a0] [c00000000067b664] blk_add_rq_to_plug+0x98/0x1f0
[ 1805.013717] [c0000000052d77e0] [c00000000067cd4c] blk_mq_submit_bio+0x5b0/0x888
[ 1805.014457] [c0000000052d7890] [c000000000667bf0] __submit_bio+0xa4/0x2e4
[ 1805.015149] [c0000000052d7910] [c0000000006680bc] submit_bio_noacct_nocheck+0x28c/0x404
[ 1805.015952] [c0000000052d7980] [c00000000065bf68] blkdev_direct_IO+0x63c/0x824
[ 1805.016688] [c0000000052d7aa0] [c00000000065c614] blkdev_read_iter+0x10c/0x1c8
[ 1805.017423] [c0000000052d7af0] [c0000000006b2cdc] __io_read+0xe0/0x5a0
[ 1805.018091] [c0000000052d7b50] [c0000000006b3a70] io_read+0x30/0x74
[ 1805.018733] [c0000000052d7b80] [c0000000006a9040] io_issue_sqe+0x8c/0x768
[ 1805.019419] [c0000000052d7c00] [c0000000006a9850] io_wq_submit_work+0x118/0x518
[ 1805.020153] [c0000000052d7c60] [c0000000006c8ebc] io_worker_handle_work+0x23c/0x800
[ 1805.020923] [c0000000052d7d00] [c0000000006c95f8] io_wq_worker+0x178/0x51c
[ 1805.021621] [c0000000052d7e50] [c00000000000bd94] ret_from_kernel_user_thread+0x14/0x1c

Same behaviour on a kernel with PAGE_SIZE = 4KB.

I don't know why max_sectors_kb starts out at a different value on my
system, but either way the bug is lurking there, even if some
configurations don't trip it by default.
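
To convince myself of the numbers I threw together a quick user-space
model of the command-count arithmetic. It's only a sketch: it assumes
each segment gets split at MAX_DBDMA_SEG (0xff00) the way the driver's
while (sg_len) loop does, and it ignores how the block layer actually
builds and merges segments, so the segment size is just a parameter.

#include <stdio.h>

#define MAX_DCMDS      256     /* slots in the driver's DMA table */
#define MAX_DBDMA_SEG  0xff00  /* assumed max bytes per DBDMA command */

/* How many DBDMA commands a request of req_bytes needs when it arrives
 * as segments of at most seg_bytes, each split at MAX_DBDMA_SEG. */
static unsigned int dcmds_needed(unsigned long req_bytes, unsigned long seg_bytes)
{
        unsigned int cmds = 0;

        while (req_bytes) {
                unsigned long sg_len = req_bytes < seg_bytes ? req_bytes : seg_bytes;

                req_bytes -= sg_len;

                /* mirrors the driver's while (sg_len) loop: one command per chunk */
                while (sg_len) {
                        unsigned long len = sg_len < MAX_DBDMA_SEG ? sg_len : MAX_DBDMA_SEG;

                        sg_len -= len;
                        cmds++;
                }
        }

        return cmds;
}

int main(void)
{
        unsigned long seg = 64 * 1024;                  /* "max segment size = 64K"  */
        unsigned long sizes[] = { 1280UL * 1024,        /* default max_sectors_kb    */
                                  16UL * 1024 * 1024 }; /* 16 MiB after raising it   */

        for (unsigned int i = 0; i < 2; i++) {
                unsigned int cmds = dcmds_needed(sizes[i], seg);

                printf("%6lu KB -> %u commands (%s MAX_DCMDS = %d)\n",
                       sizes[i] / 1024, cmds,
                       cmds > MAX_DCMDS ? "overflows" : "fits in", MAX_DCMDS);
        }

        return 0;
}

With a 64K segment size that gives 40 commands for a 1280 KB request and
512 for a 16 MiB one, which lines up with what I saw above.
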
I'll clean up and send my patch from earlier in the thread.
cheers