Message-ID: <43632d9d-722c-b14f-336a-eac402ef9362@arm.com>
Date:   Wed, 6 Sep 2023 20:52:40 +0100
From:   Robin Murphy <robin.murphy@....com>
To:     Antonio Terceiro <antonio.terceiro@...aro.org>
Cc:     Takashi Iwai <tiwai@...e.de>, Jaroslav Kysela <perex@...ex.cz>,
        Takashi Iwai <tiwai@...e.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>, alsa-devel@...a-project.org,
        linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org
Subject: Re: snd-cmipci oops during probe on arm64 (current mainline,
 pre-6.6-rc1)

On 2023-09-06 19:36, Antonio Terceiro wrote:
> On Wed, Sep 06, 2023 at 01:49:16PM +0100, Robin Murphy wrote:
>> On 2023-09-06 07:10, Takashi Iwai wrote:
>>> On Wed, 06 Sep 2023 00:01:01 +0200,
>>> Antonio Terceiro wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm using an arm64 workstation, and wanted to add a sound card to it. I bought
>>>> one that is pretty popular around where I live, and it is supported by the
>>>> snd-cmipci driver.
>>>>
>>>> It's this one:
>>>>
>>>> 0005:02:00.0 Multimedia audio controller: C-Media Electronics Inc CMI8738/CMI8768 PCI Audio (rev 10)
>>>>
>>>> After building a mainline kernel (post-v6.5, pre-rc1) on Debian testing arm64
>>>> with localmodconfig + CONFIG_SND_CMIPCI=m, it crashes with "Unable to handle
>>>> kernel paging request at virtual address fffffbfffe80000c", and the system
>>>> never finishes booting. The login manager never shows up and the serial console
>>>> never gets to a login prompt. I observed the same issue on a 6.3 Debian kernel,
>>>> after rebuilding with CONFIG_SND_CMIPCI=m.
>>>>
>>>> If I stop the module from being automatically loaded by adding
>>>> `blacklist snd-cmipci` to /etc/modprobe.d/snd-cmipci.conf (or if I
>>>> remove the card from the PCIe slot), I get the system to boot. But trying
>>>> to load the module manually causes the same crash (I only tested this
>>>> with the card installed):
>>>>
>>>> [  +4,501093] snd_cmipci 0005:02:00.0: stream 512 already in tree
>>>> [  +0,000155] Unable to handle kernel paging request at virtual address fffffbfffe80000c
>>>> [  +0,007927] Mem abort info:
>>>> [  +0,002793]   ESR = 0x0000000096000006
>>>> [  +0,003743]   EC = 0x25: DABT (current EL), IL = 32 bits
>>>> [  +0,005307]   SET = 0, FnV = 0
>>>> [  +0,003049]   EA = 0, S1PTW = 0
>>>> [  +0,003134]   FSC = 0x06: level 2 translation fault
>>>> [  +0,004872] Data abort info:
>>>> [  +0,002873]   ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
>>>> [  +0,005479]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>>>> [  +0,005047]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>>>> [  +0,000003] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000080519fe9000
>>>> [  +0,000004] [fffffbfffe80000c] pgd=000008051a979003, p4d=000008051a979003, pud=000008051a97a003, pmd=0000000000000000
>>>> [  +0,000009] Internal error: Oops: 0000000096000006 [#1] SMP
>>>> [  +0,028142] Modules linked in: snd_cmipci(+) snd_mpu401_uart snd_opl3_lib xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype nft_compat br_netfilter nft_masq nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc nf_tables nfnetlink uvcvideo videobuf2_vmalloc videobuf2_memops uvc videobuf2_v4l2 videodev videobuf2_common snd_seq_dummy snd_hrtimer snd_seq qrtr rfkill overlay ftdi_sio usbserial snd_usb_audio snd_usbmidi_lib snd_pcm aes_ce_blk aes_ce_cipher snd_hwdep polyval_ce snd_rawmidi polyval_generic snd_seq_device joydev snd_timer ghash_ce hid_generic gf128mul snd usbhid sha2_ce ipmi_ssif soundcore hid mc sha256_arm64 ipmi_devintf arm_spe_pmu ipmi_msghandler sha1_ce sbsa_gwdt binfmt_misc nls_ascii nls_cp437 vfat fat xgene_hwmon cppc_cpufreq arm_cmn arm_dsu_pmu evdev nfsd auth_rpcgss nfs_acl lockd grace dm_mod fuse loop efi_pstore dax sunrpc configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs efivarfs raid10 raid456 async_raid6_recov async_memcpy
>>>> [  +0,000142]  async_pq async_xor async_tx libcrc32c crc32c_generic xor xor_neon raid6_pq raid1 raid0 multipath linear md_mod nvme nvme_core ast t10_pi drm_shmem_helper xhci_pci drm_kms_helper xhci_hcd crc64_rocksoft crc64 drm crc_t10dif usbcore crct10dif_generic igb crct10dif_ce crct10dif_common usb_common i2c_algo_bit i2c_designware_platform i2c_designware_core
>>>> [  +0,121670] CPU: 0 PID: 442 Comm: kworker/0:4 Not tainted 6.5.0+ #2
>>>> [  +0,006259] Hardware name: ADLINK AVA Developer Platform/AVA Developer Platform, BIOS TianoCore 2.04.100.07 (SYS: 2.06.20220308) 09/08/2022
>>>> [  +0,012506] Workqueue: events work_for_cpu_fn
>>>> [  +0,004353] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>> [  +0,006953] pc : logic_inl+0xa0/0xd8
>>>> [  +0,003570] lr : snd_cmipci_probe+0x7a4/0x1140 [snd_cmipci]
>>>> [  +0,005578] sp : ffff80008287bc70
>>>> [  +0,003303] x29: ffff80008287bc70 x28: ffff08008af9d6a0 x27: 0000000000000000
>>>> [  +0,007128] x26: ffffc4818263c228 x25: 0000000000000000 x24: 0000000000000001
>>>> [  +0,007127] x23: ffff07ff81a9e000 x22: ffff07ff81a9e0c0 x21: ffff08008af9d080
>>>> [  +0,007127] x20: ffffc4818263c000 x19: 0000000000000000 x18: ffffffffffffffff
>>>> [  +0,007127] x17: 0000000000000000 x16: ffffc4819ac3cd38 x15: ffff80008287ba80
>>>> [  +0,007127] x14: 0000000000000001 x13: ffff80008287bbc4 x12: 0000000000000000
>>>> [  +0,007126] x11: ffff07ff834616d0 x10: ffffffffffffffc0 x9 : ffffc4819a61dd18
>>>> [  +0,007127] x8 : 0000000000000228 x7 : 0000000000000001 x6 : 00000000000000ff
>>>> [  +0,007127] x5 : ffffc4819adb7998 x4 : 0000000000000000 x3 : 00000000000000ff
>>>> [  +0,007127] x2 : 0000000000ffbffe x1 : 000000000000000c x0 : fffffbfffe80000c
>>>> [  +0,007126] Call trace:
>>>> [  +0,002436]  logic_inl+0xa0/0xd8
>>>> [  +0,003221]  local_pci_probe+0x48/0xb8
>>>> [  +0,003744]  work_for_cpu_fn+0x24/0x40
>>>> [  +0,003741]  process_one_work+0x170/0x3a8
>>>> [  +0,004002]  worker_thread+0x23c/0x460
>>>> [  +0,003742]  kthread+0xe8/0xf8
>>>> [  +0,003047]  ret_from_fork+0x10/0x20
>>>> [  +0,003569] Code: d2bfd000 f2df7fe0 f2ffffe0 8b000020 (b9400000)
>>>> [  +0,006083] ---[ end trace 0000000000000000 ]---
>>>>
>>>> Because this sound card chipset seems to be popular (pretty much all PCI cards
>>>> I can find to buy locally use that), I'm thinking this might be specific to
>>>> arm64, otherwise someone would have seen this before.
>>>
>>> There is only one change in this driver code itself since 6.5 (commit
>>> b6ba0aa46138), and judging from the stack trace, it's unrelated to
>>> your problem. It's more likely a regression in the lower level code,
>>> e.g. PCI layer or arch/arm64 stuff.
>>>
>>> Could you try git bisect?
>>
>> Hmm, but has this combination of card and machine *ever* actually worked?
> 
> That could be it. In trying to find a starting point for the bisection,
> I tried 6.1.0, 5.15.130, and 5.10.19, and they all fail in exactly the
> same way. I didn't go further back.
> 
>> It's blowing up trying to access PCI I/O space, which has apparently ended
>> up in the indirect access mechanism without that being configured correctly.
>> That is definitely an issue down somewhere between the PCI layer and the
>> system firmware. Does the system even have an I/O space window? Some arm64
>> machines don't. I guess we might not have got as far as probing a driver if
>> the I/O BAR couldn't be assigned at all, but either way something's not gone
>> right.
> 
> I'm pretty sure I saw reports of people using PCI GPUs on this machine,
> but I would need to confirm.

GPUs and any other PCIe devices will be fine, since they will use memory 
BARs - I/O space is pretty much deprecated in PCIe, and as mentioned 
some systems don't even support it at all. I found a datasheet for 
CMI8738, and they seem to be right at the other end of the scale as 
legacy PCI chips with *only* an I/O BAR (and so I guess your card 
includes a PCIe-PCI bridge as well), so are definitely going to be 
hitting paths that are less well-exercised on arm64 in general.
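
To make the memory-BAR vs. I/O-BAR distinction concrete, here is a minimal
sketch of how the low bits of a raw 32-bit BAR register value encode the
space type, following the standard PCI BAR layout. The function name and the
sample values are hypothetical and for illustration only; they are not taken
from this card or from any kernel code.

```python
def describe_bar(raw: int) -> dict:
    """Classify a raw 32-bit PCI BAR value by its low bits (illustrative only)."""
    if raw & 0x1:
        # Bit 0 set: I/O space BAR. Bits 31:2 hold the port base address.
        return {"space": "io", "base": raw & ~0x3 & 0xFFFFFFFF}
    # Bit 0 clear: memory space BAR.
    mem_type = (raw >> 1) & 0x3  # 0b00 = 32-bit, 0b10 = 64-bit
    return {
        "space": "mem",
        "is_64bit": mem_type == 0x2,
        "prefetchable": bool(raw & 0x8),  # bit 3
        "base": raw & ~0xF & 0xFFFFFFFF,  # bits 31:4 (low half for 64-bit BARs)
    }


# Hypothetical sample values, not read from real hardware.
print(describe_bar(0x0000E001))
print(describe_bar(0xF000000C))
```

A device that exposes only the first kind of BAR can only be driven through
port I/O accessors such as inl()/outl(), which is why it depends on the host
bridge providing a working I/O space window.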

> What info would I need to gather from the machine in order to figure
> this out?

The first thing I'd try is rebuilding the kernel with 
CONFIG_INDIRECT_PIO disabled and see what difference that makes. I'm not 
too familiar with that area of the code, so the finer details of how to 
debug broken I/O space beyond that would be more of a linux-pci question.
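
When rebuilding, it is worth confirming that the option really ended up
disabled in the final .config, since Kconfig can turn a symbol back on if
another enabled option selects it. A rough sketch of such a check, assuming
the usual .config line formats ("CONFIG_FOO=y" and "# CONFIG_FOO is not
set"); the helper name and the sample text are hypothetical:

```python
import re


def kconfig_state(config_text: str, symbol: str) -> str:
    """Return the symbol's value ('y', 'm', ...), 'n' if not set, or 'absent'."""
    name = "CONFIG_" + symbol
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {name} is not set":
            return "n"
        match = re.fullmatch(rf"{re.escape(name)}=(.*)", line)
        if match:
            return match.group(1)
    return "absent"


# Hypothetical .config excerpt, for illustration only.
sample = """\
CONFIG_SND_CMIPCI=m
# CONFIG_INDIRECT_PIO is not set
"""
print(kconfig_state(sample, "INDIRECT_PIO"))
print(kconfig_state(sample, "SND_CMIPCI"))
```

In practice the text would come from the .config in the build tree, e.g.
open(".config").read().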

Thanks,
Robin.
