Message-ID: <43632d9d-722c-b14f-336a-eac402ef9362@arm.com>
Date: Wed, 6 Sep 2023 20:52:40 +0100
From: Robin Murphy <robin.murphy@....com>
To: Antonio Terceiro <antonio.terceiro@...aro.org>
Cc: Takashi Iwai <tiwai@...e.de>, Jaroslav Kysela <perex@...ex.cz>,
Takashi Iwai <tiwai@...e.com>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>, alsa-devel@...a-project.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org
Subject: Re: snd-cmipci oops during probe on arm64 (current mainline,
pre-6.6-rc1)

On 2023-09-06 19:36, Antonio Terceiro wrote:
> On Wed, Sep 06, 2023 at 01:49:16PM +0100, Robin Murphy wrote:
>> On 2023-09-06 07:10, Takashi Iwai wrote:
>>> On Wed, 06 Sep 2023 00:01:01 +0200,
>>> Antonio Terceiro wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm using an arm64 workstation, and wanted to add a sound card to it. I bought
>>>> one that is pretty popular where I live, and it is supported by the
>>>> snd-cmipci driver.
>>>>
>>>> It's this one:
>>>>
>>>> 0005:02:00.0 Multimedia audio controller: C-Media Electronics Inc CMI8738/CMI8768 PCI Audio (rev 10)
>>>>
>>>> After building a mainline kernel (post-v6.5, pre-rc1) on Debian testing arm64
>>>> with localmodconfig + CONFIG_SND_CMIPCI=m, it crashes with "Unable to handle
>>>> kernel paging request at virtual address fffffbfffe80000c", and the system
>>>> never finishes booting. The login manager never shows up and the serial console
>>>> never gets to a login prompt. I observed the same issue on a 6.3 Debian kernel,
>>>> after rebuilding with CONFIG_SND_CMIPCI=m.
>>>>
>>>> If I stop the module from being automatically loaded by adding
>>>> `blacklist snd-cmipci` to /etc/modprobe.d/snd-cmipci.conf (or if I
>>>> remove the card from the PCIe slot), I get the system to boot. But trying
>>>> to load the module manually causes the same crash (I only tested this
>>>> with the card installed):
>>>>
>>>> [ +4,501093] snd_cmipci 0005:02:00.0: stream 512 already in tree
>>>> [ +0,000155] Unable to handle kernel paging request at virtual address fffffbfffe80000c
>>>> [ +0,007927] Mem abort info:
>>>> [ +0,002793] ESR = 0x0000000096000006
>>>> [ +0,003743] EC = 0x25: DABT (current EL), IL = 32 bits
>>>> [ +0,005307] SET = 0, FnV = 0
>>>> [ +0,003049] EA = 0, S1PTW = 0
>>>> [ +0,003134] FSC = 0x06: level 2 translation fault
>>>> [ +0,004872] Data abort info:
>>>> [ +0,002873] ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
>>>> [ +0,005479] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>>>> [ +0,005047] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>>>> [ +0,000003] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000080519fe9000
>>>> [ +0,000004] [fffffbfffe80000c] pgd=000008051a979003, p4d=000008051a979003, pud=000008051a97a003, pmd=0000000000000000
>>>> [ +0,000009] Internal error: Oops: 0000000096000006 [#1] SMP
>>>> [ +0,028142] Modules linked in: snd_cmipci(+) snd_mpu401_uart snd_opl3_lib xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype nft_compat br_netfilter nft_masq nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc nf_tables nfnetlink uvcvideo videobuf2_vmalloc videobuf2_memops uvc videobuf2_v4l2 videodev videobuf2_common snd_seq_dummy snd_hrtimer snd_seq qrtr rfkill overlay ftdi_sio usbserial snd_usb_audio snd_usbmidi_lib snd_pcm aes_ce_blk aes_ce_cipher snd_hwdep polyval_ce snd_rawmidi polyval_generic snd_seq_device joydev snd_timer ghash_ce hid_generic gf128mul snd usbhid sha2_ce ipmi_ssif soundcore hid mc sha256_arm64 ipmi_devintf arm_spe_pmu ipmi_msghandler sha1_ce sbsa_gwdt binfmt_misc nls_ascii nls_cp437 vfat fat xgene_hwmon cppc_cpufreq arm_cmn arm_dsu_pmu evdev nfsd auth_rpcgss nfs_acl lockd grace dm_mod fuse loop efi_pstore dax sunrpc configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs efivarfs raid10 raid456 async_raid6_recov async_memcpy
>>>> [ +0,000142] async_pq async_xor async_tx libcrc32c crc32c_generic xor xor_neon raid6_pq raid1 raid0 multipath linear md_mod nvme nvme_core ast t10_pi drm_shmem_helper xhci_pci drm_kms_helper xhci_hcd crc64_rocksoft crc64 drm crc_t10dif usbcore crct10dif_generic igb crct10dif_ce crct10dif_common usb_common i2c_algo_bit i2c_designware_platform i2c_designware_core
>>>> [ +0,121670] CPU: 0 PID: 442 Comm: kworker/0:4 Not tainted 6.5.0+ #2
>>>> [ +0,006259] Hardware name: ADLINK AVA Developer Platform/AVA Developer Platform, BIOS TianoCore 2.04.100.07 (SYS: 2.06.20220308) 09/08/2022
>>>> [ +0,012506] Workqueue: events work_for_cpu_fn
>>>> [ +0,004353] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>> [ +0,006953] pc : logic_inl+0xa0/0xd8
>>>> [ +0,003570] lr : snd_cmipci_probe+0x7a4/0x1140 [snd_cmipci]
>>>> [ +0,005578] sp : ffff80008287bc70
>>>> [ +0,003303] x29: ffff80008287bc70 x28: ffff08008af9d6a0 x27: 0000000000000000
>>>> [ +0,007128] x26: ffffc4818263c228 x25: 0000000000000000 x24: 0000000000000001
>>>> [ +0,007127] x23: ffff07ff81a9e000 x22: ffff07ff81a9e0c0 x21: ffff08008af9d080
>>>> [ +0,007127] x20: ffffc4818263c000 x19: 0000000000000000 x18: ffffffffffffffff
>>>> [ +0,007127] x17: 0000000000000000 x16: ffffc4819ac3cd38 x15: ffff80008287ba80
>>>> [ +0,007127] x14: 0000000000000001 x13: ffff80008287bbc4 x12: 0000000000000000
>>>> [ +0,007126] x11: ffff07ff834616d0 x10: ffffffffffffffc0 x9 : ffffc4819a61dd18
>>>> [ +0,007127] x8 : 0000000000000228 x7 : 0000000000000001 x6 : 00000000000000ff
>>>> [ +0,007127] x5 : ffffc4819adb7998 x4 : 0000000000000000 x3 : 00000000000000ff
>>>> [ +0,007127] x2 : 0000000000ffbffe x1 : 000000000000000c x0 : fffffbfffe80000c
>>>> [ +0,007126] Call trace:
>>>> [ +0,002436] logic_inl+0xa0/0xd8
>>>> [ +0,003221] local_pci_probe+0x48/0xb8
>>>> [ +0,003744] work_for_cpu_fn+0x24/0x40
>>>> [ +0,003741] process_one_work+0x170/0x3a8
>>>> [ +0,004002] worker_thread+0x23c/0x460
>>>> [ +0,003742] kthread+0xe8/0xf8
>>>> [ +0,003047] ret_from_fork+0x10/0x20
>>>> [ +0,003569] Code: d2bfd000 f2df7fe0 f2ffffe0 8b000020 (b9400000)
>>>> [ +0,006083] ---[ end trace 0000000000000000 ]---
>>>>
>>>> Because this sound card chipset seems to be popular (pretty much all PCI cards
>>>> I can find to buy locally use that), I'm thinking this might be specific to
>>>> arm64, otherwise someone would have seen this before.
>>>
>>> There is only one change in this driver code itself since 6.5 (commit
>>> b6ba0aa46138), and judging from the stack trace, it's unrelated to
>>> your problem. It's more likely a regression in the lower level code,
>>> e.g. PCI layer or arch/arm64 stuff.
>>>
>>> Could you try git bisect?
>>
>> Hmm, but has this combination of card and machine *ever* actually worked?
>
> That could be it. In trying to find a starting point for the bisection,
> I tried 6.1.0, 5.15.130, and 5.10.19, and they all fail in exactly the
> same way. I didn't go further back.
>
>> It's blowing up trying to access PCI I/O space, which has apparently ended
>> up in the indirect access mechanism without that being configured correctly.
>> That is definitely an issue down somewhere between the PCI layer and the
>> system firmware. Does the system even have an I/O space window? Some arm64
>> machines don't. I guess we might not have got as far as probing a driver if
>> the I/O BAR couldn't be assigned at all, but either way something's not gone
>> right.
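
For reference, the numbers in the quoted oops can be checked by hand. The
sketch below is only an illustration: it splits the ESR value using the
architectural field positions and subtracts x1 from x0, on the assumption
(suggested by the register dump, not confirmed in this thread) that x1
holds the port offset passed to logic_inl(). The helper name decode_esr
is made up for the example.

```python
def decode_esr(esr):
    """Split an arm64 ESR_ELx value into the fields the oops prints."""
    iss = esr & 0x1FFFFFF              # ISS: bits [24:0]
    return {
        "ec": (esr >> 26) & 0x3F,      # exception class: bits [31:26]
        "il": (esr >> 25) & 0x1,       # instruction length bit
        "iss": iss,
        "wnr": (iss >> 6) & 0x1,       # data aborts: 1 = write, 0 = read
        "fsc": iss & 0x3F,             # fault status code: ISS bits [5:0]
    }

ESR = 0x96000006                       # "ESR = 0x0000000096000006"
FAULT_ADDR = 0xFFFFFBFFFE80000C        # x0 in the register dump
PORT_OFFSET = 0xC                      # x1; assumed to be the I/O port

fields = decode_esr(ESR)
base = FAULT_ADDR - PORT_OFFSET        # what the port was added to

print({name: hex(value) for name, value in fields.items()})
print(f"base = {base:#x}")
```

The fields come out as EC 0x25, WnR 0 and FSC 0x06, matching the "DABT
(current EL)" and "level 2 translation fault" lines: a read from a fixed
base plus a small offset, landing in a region with nothing mapped.
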
>
> I'm pretty sure I saw reports of people using PCI GPUs on this machine,
> but I would need to confirm.

GPUs and any other PCIe devices will be fine, since they will use memory
BARs - I/O space is pretty much deprecated in PCIe, and as mentioned
some systems don't even support it at all. I found a datasheet for
CMI8738, and they seem to be right at the other end of the scale as
legacy PCI chips with *only* an I/O BAR (and so I guess your card
includes a PCIe-PCI bridge as well), so are definitely going to be
hitting paths that are less well-exercised on arm64 in general.

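
One way to confirm which kind of BARs a device actually has is to read
its sysfs "resource" file (for this card that would presumably be
/sys/bus/pci/devices/0005:02:00.0/resource), where each line is
"start end flags" in hex. A minimal sketch, assuming that format and the
IORESOURCE_IO / IORESOURCE_MEM flag bits from include/linux/ioport.h;
the helper name and the sample values are hypothetical:

```python
IORESOURCE_IO = 0x00000100    # flag bits from include/linux/ioport.h
IORESOURCE_MEM = 0x00000200

def classify_resources(text):
    """Return (index, kind, start, end) for each populated resource line."""
    found = []
    for index, line in enumerate(text.splitlines()):
        parts = line.split()
        if len(parts) != 3:
            continue                   # not a "start end flags" line
        start, end, flags = (int(p, 16) for p in parts)
        if flags == 0:
            continue                   # unused resource slot
        if flags & IORESOURCE_IO:
            kind = "io"
        elif flags & IORESOURCE_MEM:
            kind = "mem"
        else:
            kind = "other"
        found.append((index, kind, start, end))
    return found

# Hypothetical sample; real values come from the sysfs file itself.
SAMPLE = """\
0x0000000000001000 0x00000000000010ff 0x0000000000040101
0x0000000000000000 0x0000000000000000 0x0000000000000000
"""

if __name__ == "__main__":
    for entry in classify_resources(SAMPLE):
        print(entry)
```

A device whose only populated entry is of kind "io" depends entirely on
the host bridge providing a working I/O window.
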
> What info would I need to gather from the machine in order to figure
> this out?

The first thing I'd try is rebuilding the kernel with
CONFIG_INDIRECT_PIO disabled and see what difference that makes. I'm not
too familiar with that area of the code, so the finer details of how to
debug broken I/O space beyond that would be more of a linux-pci question.

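
Before rebuilding, it may be worth checking that the option really ended
up disabled in the generated .config, since Kconfig can turn a symbol
back on if another option selects it. A small sketch, assuming the usual
.config syntax ("CONFIG_FOO=y" or "# CONFIG_FOO is not set"); the helper
name and the sample text are made up for illustration:

```python
def config_state(config_text, symbol):
    """Return 'y'/'m'/a value, 'n' if explicitly unset, None if absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {symbol} is not set":
            return "n"
        if line.startswith(symbol + "="):
            return line.split("=", 1)[1]
    return None

# Hypothetical excerpt; in practice, read the .config from the build tree.
SAMPLE_CONFIG = """\
CONFIG_SND_CMIPCI=m
# CONFIG_INDIRECT_PIO is not set
"""

if __name__ == "__main__":
    for name in ("CONFIG_SND_CMIPCI", "CONFIG_INDIRECT_PIO"):
        print(name, config_state(SAMPLE_CONFIG, name))
```
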
Thanks,
Robin.