Message-ID: <YZT3xvN9x9KDqoN7@smile.fi.intel.com>
Date: Wed, 17 Nov 2021 14:38:30 +0200
From: Andy Shevchenko <andriy.shevchenko@...ux.intel.com>
To: Hans de Goede <hdegoede@...hat.com>, sakari.ailus@...ux.intel.com
Cc: Daniel Scally <djrscally@...il.com>,
kernel test robot <oliver.sang@...el.com>, lkp@...ts.01.org,
lkp@...el.com, linux-kernel@...r.kernel.org,
gregkh@...uxfoundation.org, rafael@...nel.org
Subject: Re: [device property] 995fe757ec: BUG:kernel_NULL_pointer_dereference,address
Just realized we are discussing this w/o Sakari involved.
On Wed, Nov 17, 2021 at 12:54:51PM +0100, Hans de Goede wrote:
> On 11/17/21 01:10, Daniel Scally wrote:
> > On 16/11/2021 16:59, Andy Shevchenko wrote:
> >> On Tue, Nov 16, 2021 at 03:55:00PM +0100, Hans de Goede wrote:
> >>> On 11/16/21 08:41, kernel test robot wrote:
> >>>> FYI, we noticed the following commit (built with gcc-9):
> >>>>
> >>>> commit: 995fe757ecaeac44e023458af64d27655f9dbf73 ("[PATCH] device property: Check fwnode->secondary when finding properties")
> >>>> url: https://github.com/0day-ci/linux/commits/Daniel-Scally/device-property-Check-fwnode-secondary-when-finding-properties/20211114-044259
> >>>> base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/driver-core.git b5013d084e03e82ceeab4db8ae8ceeaebe76b0eb
> >>>> patch link: https://lore.kernel.org/lkml/20211113204141.520924-1-djrscally@gmail.com
> >>>>
> >>>> in testcase: boot
> >>>>
> >>>> on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G
> >>>>
> >>>> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> >>>>
> >>>>
> >>>> +---------------------------------------------+------------+------------+
> >>>> | | b5013d084e | 995fe757ec |
> >>>> +---------------------------------------------+------------+------------+
> >>>> | boot_successes | 23 | 0 |
> >>>> | boot_failures | 0 | 22 |
> >>>> | BUG:kernel_NULL_pointer_dereference,address | 0 | 22 |
> >>>> | Oops:#[##] | 0 | 22 |
> >>>> | EIP:fwnode_property_get_reference_args | 0 | 22 |
> >>>> | Kernel_panic-not_syncing:Fatal_exception | 0 | 22 |
> >>>> +---------------------------------------------+------------+------------+
> >>>>
> >>>>
> >>>> If you fix the issue, kindly add following tag
> >>>> Reported-by: kernel test robot <oliver.sang@...el.com>
> >>> Ok, so this patch likely needs a v2 which changes the if to this:
> >>>
> >>> if (ret == -EINVAL && !IS_ERR_OR_NULL(fwnode) &&
> >>> !IS_ERR_OR_NULL(fwnode->secondary))
> >>> ret = fwnode_call_int_op(fwnode->secondary, get_reference_args,
> >>> prop, nargs_prop, nargs, index, args);
> >>>
> >>>
> >>> So that we check fwnode before dereferencing it, note this also changes the
> >>> (ret < 0) check to (ret == -EINVAL), this makes the secondary node handling
> >>> identical to fwnode_property_read_int_array() and
> >>> fwnode_property_read_string_array()
> >>>
> >>> Danny, can you send a v2 with this change please?
> >> Hmm... So, you are suggesting that we need to check it only for EINVAL and
> >> ENOENT in this case the one that brings us to the NULL pointer dereference.
> >> But I don't understand what's the difference here.
> >
> >
> > Sticking point; the ACPI version of .get_reference_args() returns
> > -ENOENT (converted from -EINVAL [1]) if the property you ask for doesn't
> > exist against that fwnode, which unless I'm missing something means this
> > won't work in our use case. This confused me for a while because we
> > definitely call fwnode_property_read_int_array() in sensor driver probes
> > through v4l2_fwnode_endpoint_alloc_parse(), but it turns out the ACPI
> > version of _that_ operation has no matching conversion of the error
> > code, so when that fails to find the property it sends back -EINVAL and
> > so the form that exists in fwnode_property_read_int_array() currently
> > works fine.
> >
> >
> > We could align them all to if (ret < 0 && !IS_ERR_OR_NULL(fwnode) &&
> > !IS_ERR_OR_NULL(fwnode->secondary)). This is probably my preferred
> > option, because I can't really see why we'd only want to do the
> > secondary check on -EINVAL anyway - but maybe I miss something here.
> > Alternatively we can take Hans' suggestion so they all match the existing
> > code, but this means we have to handle that conversion first - I
> > couldn't see from a cursory look that any of the direct callers check
> > the value of the return beyond "is it 0?", but of course it could be
> > done somewhere in calls to the fwnode->ops->get_reference_args()
> > callback instead.
> >
> >
> > Thoughts?
>
> I missed that just checking for -EINVAL will not work for the ipu3 case
> (I did not test). In that case I think using "ret < 0" as the check instead
> is probably fine for this patch.
>
> As for modifying the existing 2 code paths, IMHO it does make sense
> to try and preserve the error code (and not try the secondary fwnode)
> when the error is an error other than the one indicating the property
> is not there.
>
> So keeping those as -EINVAL is probably best, and maybe for
> fwnode_find_reference, instead of (ret < 0), use:
> (ret == -EINVAL || ret == -ENOENT) ?
Last time Sakari did a great job of error code alignments between DT, ACPI,
and SW nodes. Not sure why the above slipped through the fingers.
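
For reference, a minimal user-space sketch of the check being discussed. It is
not the kernel code: every mock_* name is hypothetical, struct mock_fwnode only
stands in for struct fwnode_handle, and the NULL-node return value of -EINVAL
is an assumption chosen to match ESI = ffffffea (-22) in the register dump
below. It shows the two points from the thread: test fwnode before touching
fwnode->secondary, and fall back only on "property not found" style errors.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAX_ERRNO 4095

/* Same idea as the kernel's IS_ERR_OR_NULL(), reimplemented for user space. */
static int mock_is_err_or_null(const void *ptr)
{
	return !ptr || (uintptr_t)ptr >= (uintptr_t)-MOCK_MAX_ERRNO;
}

/* Hypothetical stand-in for struct fwnode_handle. */
struct mock_fwnode {
	struct mock_fwnode *secondary;
	int lookup_result;	/* what this node's lookup would return */
};

/* Hypothetical stand-in for the get_reference_args call on one node. */
static int mock_get_reference_args(const struct mock_fwnode *fwnode)
{
	if (mock_is_err_or_null(fwnode))
		return -EINVAL;	/* assumption: no node means -EINVAL */
	return fwnode->lookup_result;
}

/* Try the primary node, then the secondary one if the property is absent. */
int mock_find_reference(const struct mock_fwnode *fwnode)
{
	int ret = mock_get_reference_args(fwnode);

	/* fwnode is checked before fwnode->secondary is dereferenced. */
	if ((ret == -EINVAL || ret == -ENOENT) &&
	    !mock_is_err_or_null(fwnode) &&
	    !mock_is_err_or_null(fwnode->secondary))
		ret = mock_get_reference_args(fwnode->secondary);

	return ret;
}

/* Small usage example covering the cases raised in the thread. */
void mock_selftest(void)
{
	struct mock_fwnode sec = { NULL, 0 };
	struct mock_fwnode prim = { &sec, -ENOENT };

	assert(mock_find_reference(NULL) == -EINVAL);	/* no NULL dereference */
	assert(mock_find_reference(&prim) == 0);	/* ACPI-style -ENOENT falls back */
	prim.lookup_result = -EIO;
	assert(mock_find_reference(&prim) == -EIO);	/* other errors are preserved */
}
```

With a plain (ret == -EINVAL) test, the second case would not reach the
secondary node, which is the ipu3 concern Dan raised; with (ret < 0), the third
case would lose the original error code.
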
> >>>> [ 17.327851][ T7] BUG: kernel NULL pointer dereference, address: 00000000
> >>>> [ 17.329758][ T7] #PF: supervisor read access in kernel mode
> >>>> [ 17.331371][ T7] #PF: error_code(0x0000) - not-present page
> >>>> [ 17.332992][ T7] *pde = 00000000
> >>>> [ 17.334107][ T7] Oops: 0000 [#1] PREEMPT
> >>>> [ 17.335310][ T7] CPU: 0 PID: 7 Comm: kworker/u2:0 Tainted: G S 5.15.0-11191-g995fe757ecae #1
> >>>> [ 17.338036][ T7] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> >>>> [ 17.340544][ T7] Workqueue: events_unbound deferred_probe_work_func
> >>>> [ 17.342291][ T7] EIP: fwnode_property_get_reference_args (drivers/base/property.c:486 (discriminator 1))
> >>>> [ 17.344051][ T7] Code: 8b 45 0c 50 8b 45 08 50 89 d8 89 55 f4 ff d6 83 c4 0c 89 c6 85 c0 78 55 8d 65 f8 89 f0 5b 5e 5d c3 8d 74 26 00 be fa ff ff ff <8b> 03 85 c0 74 e8 3d 00 f0 ff ff 77 e1 8b 58 04 85 db 74 37 8b 5b
> >>>> All code
> >>>> ========
> >>>> 0: 8b 45 0c mov 0xc(%rbp),%eax
> >>>> 3: 50 push %rax
> >>>> 4: 8b 45 08 mov 0x8(%rbp),%eax
> >>>> 7: 50 push %rax
> >>>> 8: 89 d8 mov %ebx,%eax
> >>>> a: 89 55 f4 mov %edx,-0xc(%rbp)
> >>>> d: ff d6 callq *%rsi
> >>>> f: 83 c4 0c add $0xc,%esp
> >>>> 12: 89 c6 mov %eax,%esi
> >>>> 14: 85 c0 test %eax,%eax
> >>>> 16: 78 55 js 0x6d
> >>>> 18: 8d 65 f8 lea -0x8(%rbp),%esp
> >>>> 1b: 89 f0 mov %esi,%eax
> >>>> 1d: 5b pop %rbx
> >>>> 1e: 5e pop %rsi
> >>>> 1f: 5d pop %rbp
> >>>> 20: c3 retq
> >>>> 21: 8d 74 26 00 lea 0x0(%rsi,%riz,1),%esi
> >>>> 25: be fa ff ff ff mov $0xfffffffa,%esi
> >>>> 2a:* 8b 03 mov (%rbx),%eax <-- trapping instruction
> >>>> 2c: 85 c0 test %eax,%eax
> >>>> 2e: 74 e8 je 0x18
> >>>> 30: 3d 00 f0 ff ff cmp $0xfffff000,%eax
> >>>> 35: 77 e1 ja 0x18
> >>>> 37: 8b 58 04 mov 0x4(%rax),%ebx
> >>>> 3a: 85 db test %ebx,%ebx
> >>>> 3c: 74 37 je 0x75
> >>>> 3e: 8b .byte 0x8b
> >>>> 3f: 5b pop %rbx
> >>>>
> >>>> Code starting with the faulting instruction
> >>>> ===========================================
> >>>> 0: 8b 03 mov (%rbx),%eax
> >>>> 2: 85 c0 test %eax,%eax
> >>>> 4: 74 e8 je 0xffffffffffffffee
> >>>> 6: 3d 00 f0 ff ff cmp $0xfffff000,%eax
> >>>> b: 77 e1 ja 0xffffffffffffffee
> >>>> d: 8b 58 04 mov 0x4(%rax),%ebx
> >>>> 10: 85 db test %ebx,%ebx
> >>>> 12: 74 37 je 0x4b
> >>>> 14: 8b .byte 0x8b
> >>>> 15: 5b pop %rbx
> >>>> [ 17.350847][ T7] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: c37cd6d8
> >>>> [ 17.352783][ T7] ESI: ffffffea EDI: f5b5a400 EBP: c4cffd24 ESP: c4cffd14
> >>>> [ 17.354673][ T7] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010246
> >>>> [ 17.362075][ T7] CR0: 80050033 CR2: 00000000 CR3: 04206000 CR4: 00000690
> >>>> [ 17.363993][ T7] Call Trace:
> >>>> [ 17.365018][ T7] fwnode_find_reference (drivers/base/property.c:514)
> >>>> [ 17.366430][ T7] ? __this_cpu_preempt_check (lib/smp_processor_id.c:67)
> >>>> [ 17.367825][ T7] ? lockdep_init_map_type (kernel/locking/lockdep.c:4813)
> >>>> [ 17.369325][ T7] ? phylink_run_resolve+0x20/0x20
> >>>> [ 17.370897][ T7] ? init_timer_key (kernel/time/timer.c:818)
> >>>> [ 17.372228][ T7] fwnode_get_phy_node (drivers/net/phy/phy_device.c:2986)
> >>>> [ 17.373574][ T7] phylink_fwnode_phy_connect (drivers/net/phy/phylink.c:1180 drivers/net/phy/phylink.c:1166)
> >>>> [ 17.375014][ T7] phylink_of_phy_connect (drivers/net/phy/phylink.c:1152)
> >>>> [ 17.376373][ T7] dsa_slave_create (net/dsa/slave.c:1889 net/dsa/slave.c:2036)
> >>>> [ 17.377765][ T7] dsa_tree_setup_switches (net/dsa/dsa2.c:477 net/dsa/dsa2.c:977)
> >>>> [ 17.379282][ T7] dsa_register_switch (net/dsa/dsa2.c:1065 net/dsa/dsa2.c:1565 net/dsa/dsa2.c:1579)
> >>>> [ 17.380762][ T7] dsa_loop_drv_probe (drivers/net/dsa/dsa_loop.c:333)
> >>>> [ 17.382137][ T7] mdio_probe (drivers/net/phy/mdio_device.c:157)
--
With Best Regards,
Andy Shevchenko