netdev - Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <493E7080.5060308@trash.net>
Date:	Tue, 09 Dec 2008 14:20:00 +0100
From:	Patrick McHardy <kaber@...sh.net>
To:	Jarek Poplawski <jarkao2@...il.com>
CC:	David Miller <davem@...emloft.net>, Martin Devera <devik@....cz>,
	netdev@...r.kernel.org
Subject: Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in	htb_dequeue()

Jarek Poplawski wrote:
> On Tue, Dec 09, 2008 at 01:25:15PM +0100, Patrick McHardy wrote:
>> Jarek Poplawski wrote:
>>> The algorithm is we want to "continue on the next jiffie". We know
>>> we've lost here a lot of time (~2 jiffies), and this will be added
>>> later. Since these jiffies are not precise enough wrt. psched ticks
>>> or ktime, and we will add around 2000 (for HZ 1000) psched ticks
>>> anyway this +1 here simply doesn't matter and can mean "a bit after
>>> q->now".
>> This might as well return q->now, no?
> 
> Yes, but IMHO it looks worse, considering the problem here (we want to
> avoid scheduling in the past).

I guess its a matter of taste.

>> The elapsed time is added
>> on top later anyways. But anyways, I think both the approach and
>> the patch are wrong.
>>
>> 	/* charge used jiffies */
>> 	start_at = jiffies - start_at;
>> 	if (start_at > 0)
>> 		next_event += start_at * PSCHED_TICKS_PER_SEC / HZ;
>>
>> What relationship does the duration it ran for has to the time it
>> should run at again?
> 
> The scheduling times won't be in the past mostly and hrtimers won't
> trigger too soon, but approximately around we really need and can
> afford a new try without stopping everything else.

Sure. But it also won't be in the past if we simply add .. lets say
the current uptime in ms. My point was that there's absolutely no
relationship between those two times and combining them just to
get a value thats not in the past is wrong. Especially considering
*why* we want a value in the future and what we'll get from that
calculation.

>> The focus on jiffies is wrong IMO, the reason why we get high
>> load is because the CPU can't keep up, delaying things even
>> longer is not going to help get the work done. The only reason to
>> look at jiffies is because its a cheap indication that it has
>> ran for too long and we should give other tasks a change to run
>> as well, but it should continue immediately after it did that.
>> So all it should do is make sure that the watchdog is scheduled
>> with a very small positive delay.
> 
> This needs additional psched_get_time(), and as I've written before
> there is no apparent advantage in problematic cases, but this would
> add more overhead for common cases.

htb_do_events() exceeding two jiffies is fortunately not a common
case. You (incorrectly) made the calculation somewhat of a common
case by also adding to the delay if the inner classes simply throttled
and already returned the exact delay they want.

Much better (again considering what we want to achieve here) would
be to not use the hrtimer watchdog at all. We want to give lower
priority tasks a chance to run, so ideally we'd use a low priority
task for wakeup.

>> As for the implementation: the increase in delay (the snippet
>> above) is also done in the case that no packets were available
>> for other reasons (throttling), in which case we might needlessly
>> delay for an extra jiffie if jiffies wrapped while it tried to
>> dequeue.
> 
> But in another similar cases there could be no change in jiffies, but
> almost a jiffie used for counting, so wrong schedule time as well.

Its not "wrong". We don't want to delay. Its a courtesy to the
remaining system.

> Approximatly this all should be fine, and it still can be tuned later.
> IMHO, this all should not affect "common" cases, which are expected to
> use less then jiffie here.

Jiffies might wrap even if it only took only a few nanoseconds.
And its not fine, in the case of throttled classes there's no
reason to add extra delay *at all*.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html