Date:   Tue, 8 Feb 2022 13:04:43 -0800
From:   Abhinav Kumar <quic_abhinavk@...cinc.com>
To:     Johannes Berg <johannes@...solutions.net>,
        <linux-kernel@...r.kernel.org>
CC:     <gregkh@...uxfoundation.org>, <rafael@...nel.org>,
        <robdclark@...il.com>, <dri-devel@...ts.freedesktop.org>,
        <linux-arm-msm@...r.kernel.org>, <freedreno@...ts.freedesktop.org>,
        <seanpaul@...omium.org>, <swboyd@...omium.org>,
        <nganji@...eaurora.org>, <aravindh@...eaurora.org>,
        <khsieh@...eaurora.org>, <daniel@...ll.ch>,
        <dmitry.baryshkov@...aro.org>
Subject: Re: [PATCH] devcoredump: increase the device delete timeout to 10
 mins

Hi Johannes

Thanks for the response.

On 2/8/2022 12:35 PM, Johannes Berg wrote:
> On Tue, 2022-02-08 at 11:44 -0800, Abhinav Kumar wrote:
>> There are cases where depending on the size of the devcoredump and the speed
>> at which the usermode reads the dump, it can take longer than the current 5 mins
>> timeout.
>>
>> This can lead to incomplete dumps as the device is deleted once the timeout expires.
>>
>> One example is below where it took 6 mins for the devcoredump to be completely read.
>>
>> 04:22:24.668 23916 23994 I HWDeviceDRM::DumpDebugData: Opening /sys/class/devcoredump/devcd6/data
>> 04:28:35.377 23916 23994 W HWDeviceDRM::DumpDebugData: Freeing devcoredump node
>>
>> Increase the timeout to 10 mins to accommodate system delays and large coredump
>> sizes.
>>
> 
> No real objection, I guess, but can the data actually disappear *while*
> the sysfs file is open?!
> 
> Or did it take 5 minutes to open the file?
> 
> If the former, maybe we should fix that too (or instead)?
> 
> johannes

It opened the file right away but could not finish reading.

The device gets deleted, so the corresponding /data will disappear too
(as the data node is under devcd*/data):

static void devcd_del(struct work_struct *wk)
{
	struct devcd_entry *devcd;

	devcd = container_of(wk, struct devcd_entry, del_wk.work);

	device_del(&devcd->devcd_dev);
	put_device(&devcd->devcd_dev);
}

Are you suggesting we implement logic like this:

a) If usermode has started reading the data but has not finished yet
(we can detect the former with something like devcd->data_read_ongoing
= 1, and we know it has finished when it acks, at which point we can
clear the flag), then the del_wk timeout handler delays the delete by
another TIMEOUT to give usermode time to finish reading the data?

b) If usermode acks, we clear the flag and delete the device as usual.

But there is a corner case here:

c) If usermode starts the read but then crashes for some reason, the
timer will time out and try to delete the device, but it will detect
that usermode is still reading and will keep the device. How do we
detect this case?
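
To make (a) through (c) concrete, here is a rough userspace model of the
logic. It is only a sketch under stated assumptions and not kernel code:
data_read_ongoing is the flag named above, while struct devcd_model,
model_read_start(), model_ack() and model_timeout() are hypothetical
names made up for illustration. The real driver would do this from the
del_wk handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one devcoredump entry. */
struct devcd_model {
	bool data_read_ongoing;	/* set once usermode starts reading */
	bool deleted;		/* models device_del() having run */
	int extensions;		/* how many times the timer was re-armed */
};

/* Usermode has started reading the data node. */
static void model_read_start(struct devcd_model *d)
{
	if (!d->deleted)
		d->data_read_ongoing = true;
}

/* Case (b): usermode acks, so clear the flag and delete as usual. */
static void model_ack(struct devcd_model *d)
{
	d->data_read_ongoing = false;
	d->deleted = true;
}

/*
 * Case (a): timeout handler. Returns true if the delete timer has to be
 * re-armed for another TIMEOUT because a read is still in progress.
 */
static bool model_timeout(struct devcd_model *d)
{
	assert(d != NULL);

	if (d->deleted)
		return false;

	if (d->data_read_ongoing) {
		d->extensions++;
		return true;	/* keep the device, try again later */
	}

	d->deleted = true;
	return false;
}
```

With this model, a reader that starts and never acks (case c) makes
model_timeout() return true on every expiry, so the entry is never
deleted. That is exactly the corner case in question.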

That's why I thought the easier way right now might be to just increase
the timeout.
