lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Tue, 17 Nov 2020 09:45:22 -0800
From:   Jacob Keller <jacob.e.keller@...el.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     netdev@...r.kernel.org, Jiri Pirko <jiri@...dia.com>,
        Michael Chan <michael.chan@...adcom.com>,
        Shannon Nelson <snelson@...sando.io>,
        Saeed Mahameed <saeedm@...dia.com>,
        Boris Pismenny <borisp@...dia.com>,
        Bin Luo <luobin9@...wei.com>
Subject: Re: devlink userspace process appears stuck (was: Re: [net-next]
 devlink: move request_firmware out of driver)



On 11/14/2020 8:10 PM, Jakub Kicinski wrote:
> On Fri, 13 Nov 2020 14:51:36 -0800 Jacob Keller wrote:
>> On 11/13/2020 2:32 PM, Jacob Keller wrote:
>>>
>>>
>>> On 11/13/2020 1:34 PM, Jacob Keller wrote:  
>>>> Well, at least with ice, the experience is pretty bad. I tried out with
>>>> a garbage file name on one of my test systems. This was on a slightly
>>>> older kernel without this patch applied, and the device had a pending
>>>> update that had not yet been finalized with a reset:
>>>>
>>>> $ devlink dev flash pci/0000:af:00.0 file garbage_file_does_not_exist
>>>> Canceling previous pending update
>>>>
>>>>
>>>> The update looks like it got stuck, but actually it failed. Somehow the
>>>> extack error over the socket didn't get handled by the application very
>>>> well. Something buggy in the forked process probably.
>>>>
>>>> I do get this in the dmesg though:
>>>>
>>>> Nov 13 13:12:57 jekeller-stp-glorfindel kernel: ice 0000:af:00.0: Direct
>>>> firmware load for garbage_file_does_not_exist failed with error -2
>>>>  
>>>
>>> I think I figured out what is going on here, but I'm not sure what the
>>> best solution is.
>>>
>>> in userspace devlink.c:3410, the condition for exiting the while loop
>>> that monitors the flash update process is:
>>>
>>> (!ctx.flash_done || (ctx.not_first && !ctx.received_end))
>>>   
>>
>> FWIW changing this to
>>
>> (!ctx.flash_done && !ctx.received_end)
>>
>> works for this problem, but I suspect that the original condition
>> intended to try and catch the case where flash update has exited but we
>> haven't yet processed all of the status messages?
> 
> Yeah... I've only looked at this for 5 minutes, but it seems that ice
> should not send notifications outside of begin / end (in fact it could
> be nice to add an appropriate WARN_ON() in notify())...
> 

What about just moving begin/end outside of drivers entirely? These two
functions don't take any arguments from the drivers... Doing the calls
in the core stack would make it impossible for a driver to do the wrong
thing, and would make it easier to handle the error cleanup routine.

Will send a patch to that effect soon for further discussion.

Thanks,
Jake

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ