lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b6618306-8de5-9b08-f2ee-9939df62aedc@nvidia.com>
Date:   Wed, 6 Dec 2017 17:18:02 +0000
From:   Jon Hunter <jonathanh@...dia.com>
To:     Guillaume Tucker <guillaume.tucker@...labora.com>,
        Ben Skeggs <bskeggs@...hat.com>
CC:     David Airlie <airlied@...ux.ie>, <dri-devel@...ts.freedesktop.org>,
        <linux-tegra@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH 1/2] drm/nouveau/bar/gf100: fix hang when calling ->fini()
 before ->init()


On 06/12/17 09:22, Guillaume Tucker wrote:
> On 05/12/17 18:32, Ben Skeggs wrote:
>> On Wed, Dec 6, 2017 at 12:30 AM, Jon Hunter <jonathanh@...dia.com> wrote:
>>
>>>
>>> On 04/12/17 18:37, Guillaume Tucker wrote:
>>>> If the firmware fails to load then ->fini() will be called before the
>>>> device has been initialised, causing the kernel to hang while trying
>>>> to write to a register.  Add a test in ->fini() to avoid this issue.
>>>>
>>>> This fixes a kernel hang on tegra124.
>>>>
>>>> Fixes: b17de35a2ebbe ("drm/nouveau/bar: implement bar1 teardown")
>>>> Signed-off-by: Guillaume Tucker <guillaume.tucker@...labora.com>
>>>> CC: Ben Skeggs <bskeggs@...hat.com>
>>>> ---
>>>>  drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c | 7 +++++--
>>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>> b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>> index a3ba7f50198b..95e2aba64aad 100644
>>>> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>> @@ -43,9 +43,12 @@ gf100_bar_bar1_wait(struct nvkm_bar *base)
>>>>  }
>>>>
>>>>  void
>>>> -gf100_bar_bar1_fini(struct nvkm_bar *bar)
>>>> +gf100_bar_bar1_fini(struct nvkm_bar *base)
>>>>  {
>>>> -     nvkm_mask(bar->subdev.device, 0x001704, 0x80000000, 0x00000000);
>>>> +     struct nvkm_device *device = base->subdev.device;
>>>> +
>>>> +     if (base->subdev.oneinit)
>>>> +             nvkm_mask(device, 0x001704, 0x80000000, 0x00000000);
>>>>  }
>>>>
>>>>  void
>>>
>>> I have tested this and it works for me. Thanks for fixing this! Would be
>>> good to get Ben's ACK, but you can have my ...
>>>
>> I'd love to get a good explanation as to why it hangs without this
>> change,
>> as, on the surface, it's not immediately obvious as to why it's hanging.
> 
> To be fair I'm not entirely sure either why this causes a hang, I
> haven't read the TRM...  The iomem has been mapped at this point,
> so accessing the register should work.  One clue is when you look
> at _bar1_init(), the 0x1704 register is initialised with
> some (device instance?) memory address.  So it's possible that
> the hardware does something special when you set this to 0 as in
> _bar1_fini(), which may fail in particular if it was previously
> not initialised with a valid address.
> 
> This is merely guesswork, would be interested to find out the
> real explanation though.

OK, well that's no good. It's a good pointer, but we need to make sure
we understand the root of this hang. I will see if I have sometime to
dig into this further, maybe next week.

Cheers
Jon

-- 
nvpublic

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ