Date:   Tue, 30 Nov 2021 20:49:06 -0800
From:   John Stultz <john.stultz@...aro.org>
To:     Bjorn Andersson <bjorn.andersson@...aro.org>
Cc:     Tadeusz Struk <tadeusz.struk@...aro.org>,
        Stanimir Varbanov <stanimir.varbanov@...aro.org>,
        Andy Gross <agross@...nel.org>,
        Mauro Carvalho Chehab <mchehab@...nel.org>,
        Lee Jones <lee.jones@...aro.org>,
        Amit Pundir <amit.pundir@...aro.org>,
        linux-media@...r.kernel.org, linux-arm-msm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] media: venus: Synchronize probe() between venus_core
 and enc/dec

On Tue, Nov 23, 2021 at 7:29 PM Bjorn Andersson
<bjorn.andersson@...aro.org> wrote:
>
> On Fri 29 Oct 14:48 PDT 2021, Tadeusz Struk wrote:
>
> > Venus video encode/decode hardware driver consists of three modules.
> > The parent module venus-core, and two sub modules venus-enc and venus-dec.
> > The venus-core module allocates a common structure that is used by the
> > enc/dec modules, loads the firmware, and performs some common hardware
> > initialization. Since the three modules are loaded one after the other,
> > and their probe functions can run in parallel, it is possible that
> > the venc_probe and vdec_probe functions finish before the core
> > venus_probe function, which can then fail when, for example, it
> > fails to load the firmware. In this case the subsequent call to venc_open
> > causes an Oops as it tries to dereference already uninitialized structures
> > through dev->parent and the system crashes in __pm_runtime_resume() as in
> > the trace below:
> >
> > [   26.064835][  T485] Internal error: Oops: 96000006 [#1] PREEMPT SMP
> > [   26.270914][  T485] Hardware name: Thundercomm Dragonboard 845c (DT)
> > [   26.285019][  T485] pc : __pm_runtime_resume+0x34/0x178
> > [   26.286374][  T213] lt9611 10-003b: hdmi cable connected
> > [   26.290285][  T485] lr : venc_open+0xc0/0x278 [venus_enc]
> > [   26.290326][  T485] Call trace:
> > [   26.290328][  T485]  __pm_runtime_resume+0x34/0x178
> > [   26.290330][  T485]  venc_open+0xc0/0x278 [venus_enc]
> > [   26.290335][  T485]  v4l2_open+0x184/0x294
> > [   26.290340][  T485]  chrdev_open+0x468/0x5c8
> > [   26.290344][  T485]  do_dentry_open+0x260/0x54c
> > [   26.290349][  T485]  path_openat+0xbe8/0xd5c
> > [   26.290352][  T485]  do_filp_open+0xb8/0x168
> > [   26.290354][  T485]  do_sys_openat2+0xa4/0x1e8
> > [   26.290357][  T485]  __arm64_compat_sys_openat+0x70/0x9c
> > [   26.290359][  T485]  invoke_syscall+0x60/0x170
> > [   26.290363][  T485]  el0_svc_common+0xb8/0xf8
> > [   26.290365][  T485]  do_el0_svc_compat+0x20/0x30
> > [   26.290367][  T485]  el0_svc_compat+0x24/0x84
> > [   26.290372][  T485]  el0t_32_sync_handler+0x7c/0xbc
> > [   26.290374][  T485]  el0t_32_sync+0x1b8/0x1bc
> > [   26.290381][  T485] ---[ end trace 04ca7c088b4c1a9c ]---
> > [   26.290383][  T485] Kernel panic - not syncing: Oops: Fatal exception
> >
> > This can be fixed by synchronizing the three probe functions and
> > only allowing the venc_probe() and vdec_probe() to pass when venus_probe()
> > returns success.
> >
> > Changes in v2:
> > - Change locking from mutex_lock to mutex_trylock
> >   in venc_probe and vdec_probe to avoid potential deadlock.
> >
>
> Rather than trying to synchronize away the side effects of
> of_platform_populate() I think we should stop using it.
>
> I had the very same problem in the qcom_wcnss remoteproc driver, and
> in the change below I got rid of it by manually initializing a struct
> device for the child node. In the event that the child probe defers, I
> would just probe defer the parent as well.
>
> 1fcef985c8bd ("remoteproc: qcom: wcnss: Fix race with iris probe")
>
> The change might look a little bit messy, but the end result is much
> cleaner than relying on various locks etc.
>
>
> But in the qcom_wcnss case I have a child _device_ because I need
> something to do e.g. regulator_get() on. I fail to see why venc and vdec
> are devices in the first place.
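
To make the failure mode concrete, here is a toy Python model of the
ordering problem and of gating the child probes on the core probe. It is
only a sketch, not kernel code: the Core class and the child_* helpers
are hypothetical names invented for illustration, and returning
-EPROBE_DEFER from the child is an assumption about how a gated probe
might behave, not a description of what Tadeusz's patch does.

```python
# Toy model of the venus probe ordering problem. Not kernel code.
# All names here are hypothetical and exist only for illustration.

EPROBE_DEFER = 517  # value of EPROBE_DEFER in the kernel's errno.h


class Core:
    """Stands in for venus-core and the structure shared with enc/dec."""

    def __init__(self, firmware_ok):
        self.firmware_ok = firmware_ok
        self.probed = False  # set only after the core probe succeeds
        self.state = None    # shared data the children dereference

    def probe(self):
        if not self.firmware_ok:
            # Firmware load failed: shared state is never made valid.
            self.state = None
            return -1
        self.state = {"pm": "ready"}
        self.probed = True
        return 0


def child_probe_unsynchronized(core):
    # Succeeds without looking at the core, so it can "finish" before
    # (or regardless of) a failing core probe.
    return 0


def child_probe_synchronized(core):
    # Only passes once the core probe has returned success.
    return 0 if core.probed else -EPROBE_DEFER


def child_open(core):
    # Dereferences the shared state, like venc_open going through
    # dev->parent. Raises TypeError if the core never initialized it.
    return core.state["pm"]
```

In the unsynchronized case the child probe succeeds even though the core
probe fails, so a later open trips over the missing state. In the gated
case the child never registers, so there is nothing to open.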

I definitely agree with Bjorn that all this asynchronous component
probing feels overly complicated, and a rework is probably the better
solution.

Though my only question is:  is someone planning to do this rework?

In the meantime, Tadeusz' patch does resolve a *very* frequent boot
crash seen when the venus driver is enabled.
So Stanimir, should we consider merging this as a stopgap until the
larger probe rework is done?

thanks
-john
