lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 28 Sep 2021 07:41:00 +0000
From:   "Wang, Zhi A" <zhi.a.wang@...el.com>
To:     Zhenyu Wang <zhenyuw@...ux.intel.com>,
        Luis Chamberlain <mcgrof@...nel.org>
CC:     Christoph Hellwig <hch@....de>, Jason Gunthorpe <jgg@...dia.com>,
        "dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
        Greg KH <gregkh@...uxfoundation.org>,
        "intel-gfx@...ts.freedesktop.org" <intel-gfx@...ts.freedesktop.org>,
        Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Jani Nikula <jani.nikula@...ux.intel.com>,
        Gerd Hoffmann <kraxel@...hat.com>,
        "Vivi, Rodrigo" <rodrigo.vivi@...el.com>,
        "intel-gvt-dev@...ts.freedesktop.org" 
        <intel-gvt-dev@...ts.freedesktop.org>,
        "Nikula, Jani" <jani.nikula@...el.com>
Subject: Re: refactor the i915 GVT support

Hey guys:

After some investigation, I found the root cause this problem ("i915" 
module loading will be stuck with Christoph's refactor patches), which 
can be reproduced by building both i915 and kvmgt as kernel module and 
the loading i915.

The root cause is: in Linux kernel loading, before a kernel module 
loading is finished, its symbols can not be reached by other module when 
resolving the symbols (even they can be found in /proc/kallsyms). 
Because the status of the kernel module is MODULE_STATE_COMING and 
resolve_symbol() from another kernel module will check this and return a 
-EBUSY.

In this case, before i915 loading is finished, the requested module 
"kvmgt" cannot reach the symbols in module i915. Thus it kept waiting 
and left message like below in the dmesg:

[  644.152021] kvmgt: gave up waiting for init of module i915.
[  644.152039] kvmgt: Unknown symbol i915_gem_object_set_to_cpu_domain 
(err -16)
[  674.871409] kvmgt: gave up waiting for init of module i915.
[  674.871427] kvmgt: Unknown symbol intel_ring_begin (err -16)
[  705.590586] kvmgt: gave up waiting for init of module i915.
[  705.590604] kvmgt: Unknown symbol i915_vma_move_to_active (err -16)
[  736.310230] kvmgt: gave up waiting for init of module i915.
[  736.310248] kvmgt: Unknown symbol shmem_unpin_map (err -16)
...

The error message is from execution path below:

kernel/module.c:

[i915 module loading] -> 
request_module("kvmgt")->[modprobe]->init_module("kvmgt")->load_module()->simplify_symbols()->resolve_symbol_wait():

static const struct kernel_symbol *
resolve_symbol_wait(struct module *mod,
             const struct load_info *info,
             const char *name)
{
     const struct kernel_symbol *ksym;
     char owner[MODULE_NAME_LEN];

     if (wait_event_interruptible_timeout(module_wq,
             !IS_ERR(ksym = resolve_symbol(mod, info, name, owner))
             || PTR_ERR(ksym) != -EBUSY,
                          30 * HZ) <= 0) {
         pr_warn("%s: gave up waiting for init of module %s.\n",
             mod->name, owner);

}

code: 
https://github.com/intel/gvt-linux/blob/bd950a66c7919d7121d2530f30984351534a96dc/kernel/module.c#L1452

In resolve_symbol_wait(), it calls resolve_symbol() to resolve the 
symbols in "i915". In resolve_symbol() -> ref_module() -> 
strong_try_module_get(), it will check the status of the module which 
owns the symbol.

static inline int strong_try_module_get(struct module *mod)
{
     BUG_ON(mod && mod->state == MODULE_STATE_UNFORMED);
     if (mod && mod->state == MODULE_STATE_COMING)
         return -EBUSY;
     if (try_module_get(mod))
         return 0;
     else
         return -ENOENT;
}

code:https://github.com/intel/gvt-linux/blob/bd950a66c7919d7121d2530f30984351534a96dc/kernel/module.c#L318

But unfortunately, this execution path begins in i915 module loading, at 
this time, the status of kernel module "i915" is MODULE_STATE_COMING 
until loading of "kvmgt" is finished. Thus a -EBUSY is always returned 
when kernel is trying to resolve symbols for "kvmgt".

This patch below might need re-work:

Author: Christoph Hellwig <hch@....de>
Date:   Wed Jul 21 17:53:38 2021 +0200

     drm/i915/gvt: move the gvt code into kvmgt.ko

     Instead of having an option to build the gvt code into the main i915
     module, just move it into the kvmgt.ko module.  This only requires
     a new struct with three entries that the main i915 module needs to
     request before enabling VGPU passthrough operations.

     This also conveniently streamlines the GVT initialization and avoids
     the need for the global device pointer.

     Signed-off-by: Christoph Hellwig <hch@....de>
     Signed-off-by: Zhenyu Wang <zhenyuw@...ux.intel.com>
     Link: 
http://patchwork.freedesktop.org/patch/msgid/20210721155355.173183-5-hch@lst.de
     Acked-by: Zhenyu Wang <zhenyuw@...ux.intel.com>

On 8/26/21 6:12 AM, Zhenyu Wang wrote:
> On 2021.08.20 12:56:34 -0700, Luis Chamberlain wrote:
>> On Fri, Aug 20, 2021 at 04:17:24PM +0200, Christoph Hellwig wrote:
>>> On Thu, Aug 19, 2021 at 04:29:29PM +0800, Zhenyu Wang wrote:
>>>> I'm working on below patch to resolve this. But I met a weird issue in
>>>> case when building i915 as module and also kvmgt module, it caused
>>>> busy wait on request_module("kvmgt") when boot, it doesn't happen if
>>>> building i915 into kernel. I'm not sure what could be the reason?
>>> Luis, do you know if there is a problem with a request_module from
>>> a driver ->probe routine that is probably called by a module_init
>>> function itself?
>> Generally no, but you can easily foot yourself in the feet by creating
>> cross dependencies and not dealing with them properly. I'd make sure
>> to keep module initialization as simple as possible, and run whatever
>> takes more time asynchronously, then use a state machine to allow
>> you to verify where you are in the initialization phase or query it
>> or wait for a completion with a timeout.
>>
>> It seems the code in question is getting some spring cleaning, and its
>> unclear where the code is I can inspect. If there's a tree somewhere I
>> can take a peak I'd be happy to review possible oddities that may stick
>> out.
> I tried to put current patches under test here: https://github.com/intel/gvt-linux/tree/gvt-staging
> The issue can be produced with CONFIG_DRM_I915=m and CONFIG_DRM_I915_GVT_KVMGT=m.
>
>> My goto model for these sorts of problems is to abstract the issue
>> *outside* of the driver in question and implement new selftests to
>> try to reproduce. This serves two purposes, 1) helps with testing
>> 2) may allow you to see the problem more clearly.
>>
> I'll see if can abstract that.
>
> Thanks, Luis.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ