Message-ID: <48CE6269.4050006@bull.net>
Date:	Mon, 15 Sep 2008 15:26:01 +0200
From:	Benjamin Thery <benjamin.thery@...l.net>
To:	David Miller <davem@...emloft.net>
Cc:	dada1@...mosbay.com, netdev@...r.kernel.org
Subject: Re: [PATCH 1/1] net: fix scheduling of dst_gc_task by __dst_free

David Miller wrote:
> From: Eric Dumazet <dada1@...mosbay.com>
> Date: Fri, 12 Sep 2008 16:46:52 +0200
> 
>> Benjamin Thery wrote:
>>> The dst garbage collector dst_gc_task() may not be scheduled as we
>>> expect it to be in __dst_free().
>>> Indeed, when the dst_gc_timer was replaced by the delayed_work
>>> dst_gc_work, the mod_timer() call used to schedule the garbage
>>> collector at an earlier date was replaced by a schedule_delayed_work()
>>> (see commit 86bba269d08f0c545ae76c90b56727f65d62d57f).
>>> But the behaviour of mod_timer() and schedule_delayed_work() differs
>>> in the way they handle the delay. mod_timer() stops the timer and
>>> re-arms it with the newly given delay, whereas schedule_delayed_work()
>>> only checks whether the work is already queued in the workqueue (and
>>> queues it, with the delay, if it is not). It does NOT take the new
>>> delay into account, even if the new delay expires earlier.
>>> schedule_delayed_work() returns 0 if it didn't queue the work,
>>> but we don't check the return code in __dst_free().
>>> If I understand the code in __dst_free() correctly, we want dst_gc_task
>>> to be queued after DST_GC_INC jiffies if we pass the test (and not at
>>> some undetermined time in the future), so I think we should add a call
>>> to cancel_delayed_work() before schedule_delayed_work(). Patch below.
>>>
>> Well, you are right that the time is undetermined (though bounded by
>> roughly 120 seconds), so your patch makes sense.
>>
>> Acked-by: Eric Dumazet <dada1@...mosbay.com>
> 
> I'll add this to net-next-2.6 for now.  Benjamin, do you know of any
> real cases where users are being tripped up by our not using the
> shorter scheduling of the workqueue?

I found this while tracking a problem that sometimes occurs at
network namespace exit.

When a network namespace exits, the routes need to be freed as fast as
possible to complete the unregistration of the net devices present in
the namespace (i.e. the loopback device).

Sometimes, the route garbage collection gets delayed (because of the
issue described here), so the refcount on the device has not been
decremented as expected when we reach netdev_wait_allrefs(). We then
get the infamous "unregister_netdevice: waiting for lo to become free."

This fix in __dst_free() fixes part of the problem.
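
The difference between the two calls can be sketched with a small
userspace model. This is Python, not kernel code: the class names, the
explicit "now" argument and the tick values LONG_DELAY and SHORT_DELAY
are assumptions made for illustration, and the model ignores
concurrency and work that is already running. It only shows that a
second schedule_delayed_work() keeps the old expiry, while
mod_timer(), or cancel_delayed_work() followed by
schedule_delayed_work(), moves it to the shorter delay.

```python
# Simplified userspace model of the semantics discussed in this thread.
# All names and values here are illustrative, not taken from the kernel.

LONG_DELAY = 120_000   # hypothetical "far in the future" delay, in ticks
SHORT_DELAY = 100      # hypothetical short delay, in ticks


class Timer:
    """Models mod_timer(): the timer is always re-armed to the new expiry."""

    def __init__(self):
        self.expires = None  # None means "not pending"

    def mod_timer(self, expires):
        was_pending = self.expires is not None
        self.expires = expires          # old expiry is discarded
        return was_pending


class DelayedWork:
    """Models a delayed work item: a pending item is never re-armed."""

    def __init__(self):
        self.expires = None  # None means "not queued"

    def schedule_delayed_work(self, now, delay):
        if self.expires is not None:
            return False                # already queued: new delay ignored
        self.expires = now + delay
        return True

    def cancel_delayed_work(self):
        was_pending = self.expires is not None
        self.expires = None
        return was_pending


def demo(now=0):
    # Without the fix: the second call is silently ignored.
    work = DelayedWork()
    work.schedule_delayed_work(now, LONG_DELAY)
    queued = work.schedule_delayed_work(now, SHORT_DELAY)
    unfixed_expiry = work.expires

    # With the fix: cancel first, then schedule with the short delay.
    fixed = DelayedWork()
    fixed.schedule_delayed_work(now, LONG_DELAY)
    fixed.cancel_delayed_work()
    fixed.schedule_delayed_work(now, SHORT_DELAY)

    return queued, unfixed_expiry, fixed.expires


if __name__ == "__main__":
    print(demo())
```

In this model, demo() reports that the second schedule call was not
queued and left the long expiry in place, whereas the cancel-then-
schedule sequence ends up with the short one.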

Benjamin

> 
>> Then we should ask why we reset the timer back to its minimum value
>> every time we call __dst_free(). On machines with many dormant TCP
>> sessions, dst_garbage.list can contain a huge number of non-freeable
>> entries :(
>>
>> Maybe we should count the entries and change the timer only if really needed.
> 
> Yet another area of black magic in our routing cache :)
> 
>>> (Sorry, I think I've been a bit verbose in explaining this simple issue :)
> 
> No, do not apologize, I wish every commit message were this verbose.
> 
> 


-- 
B e n j a m i n   T h e r y  - BULL/DT/Open Software R&D

    http://www.bull.com