Message-ID: <CAEjTV=--CvwHPhwiD0ctTDnupiO9-5ssi6up_uCXUeYnqr264Q@mail.gmail.com>
Date:   Wed, 4 Jan 2023 00:12:55 -0700
From:   Jesse <pianohacker@...il.com>
To:     netdev@...r.kernel.org
Subject: Fwd: Bad page after suspend with Innodisk EGPL-T101 [1d6a:14c0]

(Resending to list due to HTML in previous message)

After resume, I sometimes see the following error and the device hangs:

[36257.935269] BUG: Bad page state in process kworker/u64:33  pfn:10e400
[36257.935269] page:00000000597be4f0 refcount:0 mapcount:0
mapping:00000000eeb38d16 index:0x0 pfn:0x10e400
[36257.935270] aops:anon_aops.1 ino:63a9
[36257.935271] flags: 0x17ffffc0000800(arch_1|node=0|zone=2|lastcpupid=0x1fffff)
[36257.935271] raw: 0017ffffc0000800 0000000000000000 dead000000000122
ffff970d81f08178
[36257.935272] raw: 0000000000000000 0000000000000003 00000000ffffffff
0000000000000000
[36257.935272] page dumped because: non-NULL mapping
[36257.935272] Modules linked in: i2c_dev xt_conntrack nft_chain_nat
xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables
br_netfilter bridge stp llc wireguard libchacha20poly1305
chacha_x86_64 poly1305_x86_64 curve25519_x86_64 libcurve25519_generic
libchacha ip6_udp_tunnel udp_tunnel ctr ccm snd_seq_dummy snd_hrtimer
snd_seq nfnetlink tun rfcomm cmac algif_hash algif_skcipher af_alg
qrtr overlay bnep binfmt_misc nls_ascii nls_cp437 vfat fat ext4
squashfs mbcache jbd2 loop btusb intel_rapl_msr intel_rapl_common
iwlmvm btrtl btbcm btintel btmtk snd_hda_codec_realtek edac_mce_amd
bluetooth mac80211 snd_hda_codec_generic uvcvideo snd_hda_codec_hdmi
videobuf2_vmalloc snd_hda_intel kvm_amd videobuf2_memops snd_usb_audio
snd_intel_dspcfg videobuf2_v4l2 eeepc_wmi snd_intel_sdw_acpi
jitterentropy_rng libarc4 asus_wmi videobuf2_common asus_ec_sensors
snd_hda_codec drbg snd_usbmidi_lib platform_profile kvm iwlwifi
[36257.935286]  videodev ansi_cprng battery snd_rawmidi snd_hda_core
irqbypass sparse_keymap snd_seq_device ecdh_generic rapl ledtrig_audio
wmi_bmof pcspkr mc snd_hwdep zenpower(OE) ecc cfg80211 crc16 joydev
snd_pcm razermouse(OE) snd_timer cdc_acm snd ccp sp5100_tco soundcore
rfkill rng_core watchdog acpi_cpufreq evdev nfsd auth_rpcgss nfs_acl
lockd lm92 grace nct6775 nct6775_core hwmon_vid sunrpc msr drivetemp
parport_pc ppdev lp parport fuse efi_pstore configfs efivarfs
ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq
zstd_compress libcrc32c crc32c_generic dm_crypt dm_mod
hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid amdgpu
gpu_sched drm_buddy video drm_display_helper cec crc32_pclmul
crc32c_intel rc_core ghash_clmulni_intel ahci drm_ttm_helper
sha512_ssse3 ttm libahci sha512_generic xhci_pci drm_kms_helper nvme
libata xhci_hcd nvme_core atlantic aesni_intel drm t10_pi crypto_simd
igb scsi_mod usbcore crc64_rocksoft_generic cryptd macsec dca
crc64_rocksoft
[36257.935303]  i2c_piix4 crc_t10dif ptp crct10dif_generic
i2c_algo_bit crct10dif_pclmul scsi_common usb_common crc64
crct10dif_common pps_core wmi button
[36257.935305] CPU: 8 PID: 610626 Comm: kworker/u64:33 Tainted: G    B
     OE      6.1.0-0-amd64 #1  Debian 6.1.1-1~exp2
[36257.935306] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 4408 10/28/2022
[36257.935306] Workqueue: events_unbound async_run_entry_fn
[36257.935307] Call Trace:
[36257.935307]  <TASK>
[36257.935307]  dump_stack_lvl+0x44/0x5c
[36257.935308]  bad_page.cold+0x63/0x8f
[36257.935309]  __free_pages_ok+0x139/0x4f0
[36257.935310]  ? force_dma_unencrypted+0x27/0xa0
[36257.935311]  aq_ring_alloc+0xa4/0xb0 [atlantic]
[36257.935315]  aq_vec_ring_alloc+0xea/0x1a0 [atlantic]
[36257.935320]  aq_nic_init+0x114/0x1d0 [atlantic]
[36257.935324]  atl_resume_common+0x40/0xd0 [atlantic]
[36257.935328]  ? pci_legacy_resume+0x80/0x80
[36257.935329]  dpm_run_callback+0x4a/0x150
[36257.935330]  device_resume+0x88/0x190
[36257.935331]  async_resume+0x19/0x30
[36257.935331]  async_run_entry_fn+0x30/0x130
[36257.935332]  process_one_work+0x1c7/0x380
[36257.935333]  worker_thread+0x4d/0x380
[36257.935335]  ? rescuer_thread+0x3a0/0x3a0
[36257.935336]  kthread+0xe9/0x110
[36257.935336]  ? kthread_complete_and_exit+0x20/0x20
[36257.935337]  ret_from_fork+0x22/0x30
[36257.935339]  </TASK>
[36257.935445] atlantic 0000:01:00.0: PM: dpm_run_callback():
pci_pm_resume+0x0/0xe0 returns -12
[36257.935447] atlantic 0000:01:00.0: PM: failed to resume async: error -12
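
The -12 in the last two log lines is a negated errno value. On Linux, errno 12 is ENOMEM, which would be consistent with an allocation failing somewhere under aq_ring_alloc(), though the log alone does not confirm that. A minimal sketch for decoding such return codes with Python's standard library (the helper name decode_kernel_error is hypothetical, not part of any kernel tooling):

```python
import errno
import os


def decode_kernel_error(code: int) -> str:
    """Return the symbolic errno name for a (usually negative) kernel return code."""
    # Kernel functions return -errno, so take the absolute value before lookup.
    return errno.errorcode.get(abs(code), "UNKNOWN")


if __name__ == "__main__":
    code = -12  # value reported by dpm_run_callback() in the log above
    # os.strerror() gives the platform's human-readable description.
    print(decode_kernel_error(code), "-", os.strerror(abs(code)))
```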

This error occurs inconsistently: sometimes after a single sleep/wake
cycle, sometimes after several. I have tried every kernel flag I could
find in the more reputable Stack Exchange posts, including
pci=nommconf.
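
Because the failure is intermittent, it can help to measure how many suspend/resume cycles it takes to appear. The following is a rough sketch under stated assumptions, not a tested reproducer for this hardware: it assumes util-linux's rtcwake and dmesg are installed, that it runs as root, and that the "BUG: Bad page state" line is still in the kernel ring buffer when it is read. All function names are hypothetical.

```python
import subprocess
from typing import Optional

MARKER = "BUG: Bad page state"


def count_bad_page_reports(log_text: str) -> int:
    """Count kernel log lines that contain the bad-page marker."""
    return sum(1 for line in log_text.splitlines() if MARKER in line)


def read_kernel_log() -> str:
    """Return the current kernel ring buffer (usually needs root)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


def suspend_once(seconds: int) -> None:
    """Suspend to RAM and wake via the RTC alarm after `seconds` (needs root)."""
    subprocess.run(["rtcwake", "-m", "mem", "-s", str(seconds)], check=True)


def cycles_until_failure(max_cycles: int = 50, seconds: int = 30) -> Optional[int]:
    """Return the cycle number at which a new bad-page report appeared, or None."""
    for cycle in range(1, max_cycles + 1):
        before = count_bad_page_reports(read_kernel_log())
        suspend_once(seconds)
        # A new marker line after resume means this cycle triggered the bug.
        if count_bad_page_reports(read_kernel_log()) > before:
            return cycle
    return None
```

Calling cycles_until_failure() from a root shell suspends the machine repeatedly, so it should only be run deliberately; the log-scanning helper can be used on saved dmesg output without suspending anything.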

Note that this is with iommu=pt. Without that flag, IOMMU errors
appear first, followed by a crash with a similar traceback.
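
Since the behaviour depends on boot flags such as iommu=pt and pci=nommconf, recording which of them are active alongside each log makes reports easier to compare. A small sketch, assuming the text comes from /proc/cmdline on the affected machine; the function names are hypothetical and quoted values containing spaces are not handled:

```python
from typing import Dict, Iterable


def parse_cmdline(cmdline: str) -> Dict[str, str]:
    """Split a kernel command line into a key -> value mapping."""
    params: Dict[str, str] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Bare flags such as "quiet" are stored with an empty value.
        # If a key is repeated, the last occurrence wins in this sketch.
        params[key] = value if sep else ""
    return params


def relevant_flags(cmdline: str, keys: Iterable[str] = ("iommu", "pci")) -> Dict[str, str]:
    """Return only the parameters of interest that are actually present."""
    params = parse_cmdline(cmdline)
    return {key: params[key] for key in keys if key in params}
```

On a running system the input could be obtained with open("/proc/cmdline").read(), and the result pasted next to the dmesg excerpt.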

This is on kernel 6.1.1 (not the latest, but I don't see relevant
changes in Git since). Apologies if this is the wrong place to report
bugs.

-- 
Jesse Weaver
