linux-kernel - mmc0: Got data interrupt 0x04000000 even though no data operation was in progress.

Message-ID: <87jzgur32p.fsf@ni.com>
Date: Mon, 05 Aug 2024 16:33:57 -0500
From: Gratian Crisan <gratian.crisan@...com>
To: Adrian Hunter <adrian.hunter@...el.com>
Cc: Ulf Hansson <ulf.hansson@...aro.org>, Hans de Goede
 <hdegoede@...hat.com>, linux-mmc@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: mmc0: Got data interrupt 0x04000000 even though no data operation
 was in progress.

Hi all,

We are getting the following splat on the latest kernel,
6.11.0-rc2-00002-gc813111d19e6, as well as on older kernels:

[    4.792991] mmc0: new ultra high speed DDR50 SDHC card at address 0001
[    4.793550]   with environment:
[    4.793786]     HOME=/
[    4.793985]     TERM=linux
[    4.794201]     BOOT_IMAGE=/runmode/bzImage
[    4.794485]     sys_reset=false
[    4.795791] mmcblk0: mmc0:0001 0016G 15.2 GiB
[    5.333153] mmc0: Got data interrupt 0x04000000 even though no data operation was in progress.
[    5.333676] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[    5.334069] mmc0: sdhci: Sys addr:  0x12454200 | Version:  0x0000b502
[    5.334464] mmc0: sdhci: Blk size:  0x00007040 | Blk cnt:  0x00000001
[    5.334860] mmc0: sdhci: Argument:  0x00010000 | Trn mode: 0x00000010
[    5.335253] mmc0: sdhci: Present:   0x01ff0000 | Host ctl: 0x00000016
[    5.335648] mmc0: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
[    5.336040] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000107
[    5.336432] mmc0: sdhci: Timeout:   0x0000000a | Int stat: 0x00000000
[    5.336824] mmc0: sdhci: Int enab:  0x03ff008b | Sig enab: 0x03ff008b
[    5.337214] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[    5.337605] mmc0: sdhci: Caps:      0x076864b2 | Caps_1:   0x00000004
[    5.337997] mmc0: sdhci: Cmd:       0x00000d1a | Max curr: 0x00000000
[    5.338389] mmc0: sdhci: Resp[0]:   0x00400900 | Resp[1]:  0x00000000
[    5.338780] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
[    5.339170] mmc0: sdhci: Host ctl2: 0x0000000c
[    5.339468] mmc0: sdhci: ADMA Err:  0x00000003 | ADMA Ptr: 0x12454200
[    5.339859] mmc0: sdhci: ============================================
[    5.340293] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[    5.344663] Buffer I/O error on dev mmcblk0, logical block 0, async page read
[    5.346127]  mmcblk0: p1 p2
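
For readers decoding the dump by hand, here is a minimal user-space sketch of
how the interrupt mask and the "Cmd" register can be read. The helper names
(int_bit_name, cmd_index) are hypothetical, and the bit values are assumed to
match the definitions in drivers/mmc/host/sdhci.h, so verify them against your
tree:

```c
#include <assert.h>
#include <stdint.h>

/*
 * High error bits of the SDHCI interrupt status register.
 * Assumption: values mirror drivers/mmc/host/sdhci.h.
 */
#define INT_AUTO_CMD_ERR	0x01000000u
#define INT_ADMA_ERROR		0x02000000u
#define INT_TUNING_ERR		0x04000000u

/* Hypothetical helper: name a single high error bit from a splat. */
const char *int_bit_name(uint32_t bit)
{
	switch (bit) {
	case INT_AUTO_CMD_ERR:
		return "AUTO_CMD_ERR";
	case INT_ADMA_ERROR:
		return "ADMA_ERROR";
	case INT_TUNING_ERR:
		return "TUNING_ERR";
	default:
		return "unknown";
	}
}

/* Hypothetical helper: non-zero if the mask carries the tuning error bit. */
int is_tuning_err(uint32_t intmask)
{
	return (intmask & INT_TUNING_ERR) != 0;
}

/*
 * Hypothetical helper: extract the command index from the "Cmd" value.
 * The register is 16 bits wide and the index sits in bits 13:8.
 */
unsigned int cmd_index(uint32_t cmd_reg)
{
	assert((cmd_reg & ~0xffffu) == 0);
	return (cmd_reg >> 8) & 0x3f;
}
```

Applied to the dump above, is_tuning_err(0x04000000) is non-zero and
cmd_index(0x0d1a) yields 13, i.e. the last command issued was CMD13.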

This is on an Intel Bay Trail based system: NI cRIO-9053 using an Atom E3805.

The issue appears related to the one fixed by commit b3855668d98c ("mmc: sdhci:
Add support for "Tuning Error" interrupts") and discussed here[1].

After adding some debug prints, it appears that in our case we get a tuning
error interrupt during an MMC_SEND_STATUS (CMD13) sdhci command, which has no
'host->data' associated with it (leading to the splat):

[    4.893298] mmc0: new ultra high speed DDR50 SDHC card at address 0001
[    4.896489] mmcblk0: mmc0:0001 0016G 15.2 GiB
[    4.906048] mmc0: tuning err irq, sdhci cmd: 18, host->cmd: 0000000003b39249, host->data: 00000000c0b4ad8a
[    4.963027] mmc0: tuning err irq, sdhci cmd: 18, host->cmd: 0000000003b39249, host->data: 00000000c0b4ad8a
[    5.384960] mmc0: tuning err irq, sdhci cmd: 17, host->cmd: 0000000003b39249, host->data: 00000000c0b4ad8a
[    5.442877] mmc0: tuning err irq, sdhci cmd: 13, host->cmd: 00000000e1669bad, host->data: 0000000000000000
[    5.443463] mmc0: Got data interrupt 0x04000000 even though no data operation was in progress.

I am new to this area of the kernel, so I would appreciate any suggestions on
the direction to take here:

  - Should the tuning error interrupts be handled in common code in sdhci_irq()
    (or at least before the !host->data check in sdhci_data_irq())?

  - Is this more of an issue with tuning not happening when it is expected, or
    taking too long, since at first we do get the error during data transfer
    commands? Suggestions on what I should debug/trace next are appreciated.
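
To make the first question concrete, below is a small user-space model of the
two orderings. It is only a sketch of the behaviour described in this message,
not the actual sdhci code: struct model_host, both functions, and the counters
are hypothetical, and the bit value is assumed to match SDHCI_INT_TUNING_ERR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_INT_TUNING_ERR	0x04000000u	/* assumed SDHCI_INT_TUNING_ERR */

/* Hypothetical stand-in for the relevant parts of struct sdhci_host. */
struct model_host {
	const void *data;		/* models host->data; NULL for CMD13 */
	unsigned int tuning_errs;	/* tuning errors that were handled */
	unsigned int splats;		/* "Got data interrupt ..." messages */
};

/* Ordering as described above: the no-data check runs first. */
void data_irq_check_first(struct model_host *h, uint32_t intmask)
{
	assert(h != NULL);
	if (!h->data) {
		h->splats++;		/* tuning error is never looked at */
		return;
	}
	if (intmask & MODEL_INT_TUNING_ERR)
		h->tuning_errs++;
}

/* Proposed ordering: consume the tuning error before the no-data check. */
void data_irq_tuning_first(struct model_host *h, uint32_t intmask)
{
	assert(h != NULL);
	if (intmask & MODEL_INT_TUNING_ERR) {
		h->tuning_errs++;
		intmask &= ~MODEL_INT_TUNING_ERR;
		if (!intmask)
			return;		/* nothing else left to handle */
	}
	if (!h->data)
		h->splats++;
}
```

With data set to NULL and intmask 0x04000000, the first variant records a
splat and no tuning error, while the second records the tuning error and no
splat. Whether the real handling belongs in sdhci_irq() or in
sdhci_data_irq() is exactly the open question above.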

Thanks,
    Gratian

[1] https://lore.kernel.org/r/20240410191639.526324-3-hdegoede@redhat.com