lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM1AHpSKFk9ZosQf=k-Rm2=EFqco7y4Lpfb7m07r=j_uJd4T0A@mail.gmail.com>
Date:   Sat, 22 Feb 2020 18:55:53 +0100
From:   Martin Volf <martin.volf.42@...il.com>
To:     Guenter Roeck <linux@...ck-us.net>
Cc:     Mika Westerberg <mika.westerberg@...ux.intel.com>,
        Jean Delvare <jdelvare@...e.com>,
        Wolfram Sang <wsa@...-dreams.de>, linux-i2c@...r.kernel.org,
        linux-hwmon@...r.kernel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [regression] nct6775 does not load in 5.4 and 5.5, bisected to b84398d6d7f90080

Hello,

On Sat, Feb 22, 2020 at 4:41 PM Guenter Roeck <linux@...ck-us.net> wrote:
> On 2/22/20 3:13 AM, Martin Volf wrote:
> > hardware monitoring sensors NCT6796D on my Asus PRIME Z390M-PLUS
> > motherboard with Intel i7-9700 CPU don't work with 5.4 and newer linux
> > kernels, the driver nct6775 does not load.
> >
> > It is working OK in version 5.3. I have used almost all released stable
> > versions from 5.3.8 to 5.3.16; I didn't try older kernels.
...
> My wild guess would be that the i801 driver is a bit aggressive with
> reserving memory spaces, but I don't immediately see what it does
> differently in that regard after the offending patch. Does it work
> if you unload the i2c_i801 driver first ?

Yes, after unloading i2c_i801, the nct6775 works. There is definitely
some sort of race between these two drivers. Mostly i2c_i801 wins, but it
happened twice that nct6775 got automatically loaded just before i2c_i801
and the sensors worked.

> You could also try to compare the output of /proc/ioports with
> the old and the new kernel, and see if the IO address space used
> by nct6775 in v5.3 is assigned to the i801 driver (or some other
> driver, such as the watchdog driver) in v5.4.

This is diff of /proc/ioports in 5.3.18 with loaded nct6775 and in
5.4.21 without:

--- ioports-5.3.18
+++ ioports-5.4.21
@@ -2,6 +2,7 @@
   0000-001f : dma1
   0020-0021 : pic1
   002e-0031 : iTCO_wdt
+    002e-0031 : iTCO_wdt
   0040-0043 : timer0
   0050-0053 : timer1
   0060-0060 : keyboard
@@ -14,11 +15,10 @@
   00f0-00ff : fpu
     00f0-00f0 : PNP0C04:00
   0290-029f : pnp 00:01
-    0295-0296 : nct6775
-      0295-0296 : nct6775
   03c0-03df : vga+
   03f8-03ff : serial
   0400-041f : iTCO_wdt
+    0400-041f : iTCO_wdt
   0680-069f : pnp 00:03
 0cf8-0cff : PCI conf1
 0d00-ffff : PCI Bus 0000:00

> If you are into hacking the kernel, you could also add some
> debug messages into the nct6775 driver to find out where exactly
> it fails. If that helps, maybe we can then add those messages into
> into the driver source to help others if this is observed again.

I have added some pr_info calls, the diff is at the and of this massage.

"bad" dmesg (i.e. i2c_i801 loaded before modprobe nct6775)

[ 1631.975392] nct6775: ### sensors_nct6775_init:
platform_driver_register() -> 0x0
[ 1631.975396] nct6775: ### nct6775_find: superio_enter(0x2e) -> 0xfffffff0
[ 1631.975417] nct6775: ### nct6775_find: superio_enter(0x4e) -> 0x0
[ 1631.975455] nct6775: ### nct6775_find: (val & SIO_ID_MASK) == 0xffff

"good" dmesg (rmmod i2c_i801; modprobe nct6775)

[ 1730.751188] nct6775: ### sensors_nct6775_init:
platform_driver_register() -> 0x0
[ 1730.751213] nct6775: ### nct6775_find: superio_enter(0x2e) -> 0x0
[ 1730.751251] nct6775: ### nct6775_find: (val & SIO_ID_MASK) == 0xd42b
[ 1730.751359] nct6775: Found NCT6798D or compatible chip at 0x2e:0x290
[ 1730.751367] ACPI Warning: SystemIO range
0x0000000000000295-0x0000000000000296 conflicts with OpRegion
0x0000000000000290-0x0000000000000299 (\AMW0.SHWM)
(20190816/utaddress-204)
[ 1730.751379] ACPI: This conflict may cause random problems and
system instability
[ 1730.751381] ACPI: If an ACPI driver is available for this device,
you should use it instead of the native driver
[ 1730.751431] nct6775: ### nct6775_probe: platform_get_resource() -> 0xfdac7b00
[ 1730.751434] nct6775: ### nct6775_probe: devm_request_region() -> 0
[ 1730.751554] nct6775 nct6775.656: Invalid temperature source 28 at
index 0, source register 0x100, temp register 0x73
[ 1730.751588] nct6775 nct6775.656: Invalid temperature source 28 at
index 1, source register 0x200, temp register 0x75
[ 1730.751686] nct6775 nct6775.656: Invalid temperature source 28 at
index 4, source register 0x900, temp register 0x7b
[ 1730.751865] nct6775: ### nct6775_probe: superio_enter(0x2e) -> 0x0
[ 1730.753685] nct6775: ### nct6775_find: superio_enter(0x4e) -> 0x0
[ 1730.753722] nct6775: ### nct6775_find: (val & SIO_ID_MASK) == 0xffff

So 0x2e is the resource the two drivers are fighting for.

I have created /etc/modprobe.d/nct6775-before-i2c_i801.conf with
install i2c_i801 /sbin/modprobe nct6775; /sbin/modprobe
--ignore-install i2c_i801

and it is working. I'm OK with this workaround, but I can do more
experiments if you instruct me what to try.

Thanks,

Martin

--8<--
--- nct6775.c.orig
+++ nct6775.c
@@ -3806,10 +3806,13 @@ static int nct6775_probe(struct platform
        int num_attr_groups = 0;

        res = platform_get_resource(pdev, IORESOURCE_IO, 0);
+pr_info("### nct6775_probe: platform_get_resource() -> 0x%x\n", res);
        if (!devm_request_region(&pdev->dev, res->start, IOREGION_LENGTH,
                                 DRVNAME))
                return -EBUSY;

+pr_info("### nct6775_probe: devm_request_region() -> 0");
+
        data = devm_kzalloc(&pdev->dev, sizeof(struct nct6775_data),
                            GFP_KERNEL);
        if (!data)
@@ -4318,6 +4321,7 @@ static int nct6775_probe(struct platform

                break;
        default:
+pr_info("### nct6775_probe: data->kind == 0x%x\n", data->kind);
                return -ENODEV;
        }
        data->have_in = BIT(data->in_num) - 1;
@@ -4503,6 +4507,7 @@ static int nct6775_probe(struct platform
        nct6775_init_device(data);

        err = superio_enter(sio_data->sioreg);
+pr_info("### nct6775_probe: superio_enter(0x%x) -> 0x%x\n",
sio_data->sioreg, err);
        if (err)
                return err;

@@ -4729,6 +4734,7 @@ static int __init nct6775_find(int sioad
        int addr;

        err = superio_enter(sioaddr);
+pr_info("### nct6775_find: superio_enter(0x%x) -> 0x%x\n", sioaddr, err);
        if (err)
                return err;

@@ -4737,6 +4743,7 @@ static int __init nct6775_find(int sioad
        if (force_id && val != 0xffff)
                val = force_id;

+pr_info("### nct6775_find: (val & SIO_ID_MASK) == 0x%04x\n", val);
        switch (val & SIO_ID_MASK) {
        case SIO_NCT6106_ID:
                sio_data->kind = nct6106;
@@ -4831,6 +4838,7 @@ static int __init sensors_nct6775_init(v
        int sioaddr[2] = { 0x2e, 0x4e };

        err = platform_driver_register(&nct6775_driver);
+pr_info("### sensors_nct6775_init: platform_driver_register() -> 0x%x\n", err);
        if (err)
                return err;

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ