linux-kernel - Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives


Message-ID: <yq18sm6n9hp.fsf@oracle.com>
Date:   Thu, 16 Jan 2020 20:43:14 -0500
From:   "Martin K. Petersen" <martin.petersen@...cle.com>
To:     Guenter Roeck <linux@...ck-us.net>
Cc:     "Martin K. Petersen" <martin.petersen@...cle.com>,
        linux-hwmon@...r.kernel.org, Jean Delvare <jdelvare@...e.com>,
        Linus Walleij <linus.walleij@...aro.org>,
        Bart Van Assche <bvanassche@....org>,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-scsi@...r.kernel.org, linux-ide@...r.kernel.org,
        Chris Healy <cphealy@...il.com>
Subject: Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives


Guenter,

> Can you by any chance provide a full traceback ?

My test machines are tied up with something else right now. This is from
a few days ago (pristine hwmon-next, I believe):

[ 1055.611912] ------------[ cut here ]------------
[ 1055.611922] WARNING: CPU: 3 PID: 3233 at drivers/base/dd.c:519 really_probe+0x436/0x4f0
[ 1055.611925] Modules linked in: sd_mod sg ahci libahci libata drivetemp scsi_mod crc32c_intel igb i2c_algo_bit i2c_core dca hwmon ipv6 nf_defrag_ipv6 crc_ccitt
[ 1055.611955] CPU: 3 PID: 3233 Comm: kworker/u17:1 Tainted: G        W         5.5.0-rc1+ #21
[ 1055.611965] Workqueue: events_unbound async_run_entry_fn
[ 1055.611973] RIP: 0010:really_probe+0x436/0x4f0
[ 1055.611979] Code: c7 30 69 f8 82 e8 ba 94 e5 ff e9 60 ff ff ff 48 8d 7b 38 e8 cc d9 b4 ff 48 8b 43 38 48 85 c0 0f 85 41 fd ff ff e9 4f fd ff ff <0f> 0b e9 66 fc ff ff 48 8d 7d 50 e8 aa d9 b4 ff 4c 8b 6d 50 4d 85
[ 1055.611983] RSP: 0018:ffff8881edb77c98 EFLAGS: 00010287
[ 1055.611989] RAX: ffff8881e1f8fb80 RBX: ffffffffa033a000 RCX: ffffffff8182e583
[ 1055.611993] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff8881dec506a8
[ 1055.611997] RBP: ffff8881dec50238 R08: 0000000000000001 R09: fffffbfff09629ed
[ 1055.612000] R10: fffffbfff09629ec R11: 0000000000000003 R12: 0000000000000000
[ 1055.612004] R13: ffff8881dec506a8 R14: ffffffff8182eca0 R15: 000000000000000b
[ 1055.612009] FS:  0000000000000000(0000) GS:ffff8881f8900000(0000) knlGS:0000000000000000
[ 1055.612013] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1055.612017] CR2: 00007f957884a000 CR3: 00000001df5ec003 CR4: 00000000000606e0
[ 1055.612020] Call Trace:
[ 1055.612038]  ? driver_probe_device+0x170/0x170
[ 1055.612045]  driver_probe_device+0x82/0x170
[ 1055.612058]  ? driver_probe_device+0x170/0x170
[ 1055.612064]  __driver_attach_async_helper+0xa3/0xe0
[ 1055.612076]  async_run_entry_fn+0x68/0x2a0
[ 1055.612094]  process_one_work+0x4df/0x990
[ 1055.612121]  ? pwq_dec_nr_in_flight+0x110/0x110
[ 1055.612127]  ? do_raw_spin_lock+0x113/0x1d0
[ 1055.612161]  worker_thread+0x78/0x5c0
[ 1055.612190]  ? process_one_work+0x990/0x990
[ 1055.612195]  kthread+0x1be/0x1e0
[ 1055.612202]  ? kthread_create_worker_on_cpu+0xd0/0xd0
[ 1055.612215]  ret_from_fork+0x3a/0x50
[ 1055.612251] irq event stamp: 3512
[ 1055.612259] hardirqs last  enabled at (3511): [<ffffffff81d2b874>] _raw_spin_unlock_irq+0x24/0x30
[ 1055.612265] hardirqs last disabled at (3512): [<ffffffff810029c9>] trace_hardirqs_off_thunk+0x1a/0x1c
[ 1055.612272] softirqs last  enabled at (3500): [<ffffffff820003a5>] __do_softirq+0x3a5/0x5a8
[ 1055.612281] softirqs last disabled at (3489): [<ffffffff810cec7b>] irq_exit+0xfb/0x100
[ 1055.612284] ---[ end trace f0a8dd9a37bea031 ]---

> Either case, I would like to track down how the warning happens, so any
> information you can provide that lets me reproduce the problem would be
> very helpful.

The three systems that exhibit the problem are stock (2010/2012/2014
vintage) x86_64 servers with onboard AHCI and a variety of 4-6 SATA
drives each.

For the qemu test I didn't have AHCI configured, but I had my SCSI temp
patch on top of yours and ran "modprobe drivetemp; modprobe scsi_debug"
to trigger the warnings.
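
When trying to reproduce this, it may help to check the kernel log for the warning automatically rather than by eye. The following is a minimal sketch, assuming the WARNING header line has the same format as in the trace above; the names find_warnings and WARNING_RE are hypothetical helpers, not part of any kernel tooling.

```python
import re

# Matches the header line of a kernel WARNING splat, for example:
# "[ 1055.611922] WARNING: CPU: 3 PID: 3233 at drivers/base/dd.c:519 really_probe+0x436/0x4f0"
# Assumption: the format is the one shown in the trace above.
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<func>[^\s+]+)\+"
)


def find_warnings(log_text):
    """Return one dict per WARNING header line found in log_text."""
    hits = []
    for m in WARNING_RE.finditer(log_text):
        hits.append({
            "cpu": int(m.group("cpu")),
            "pid": int(m.group("pid")),
            "file": m.group("file"),
            "line": int(m.group("line")),
            "func": m.group("func"),
        })
    return hits


# Header line copied from the trace above.
sample = (
    "[ 1055.611922] WARNING: CPU: 3 PID: 3233 at "
    "drivers/base/dd.c:519 really_probe+0x436/0x4f0"
)
print(find_warnings(sample))
```

In practice the input would be dmesg output captured after running the two modprobe commands; an empty result would mean no WARNING header was logged.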

-- 
Martin K. Petersen	Oracle Linux Engineering