netdev - Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20081209144534.GC15353@ff.dom.local>
Date:	Tue, 9 Dec 2008 14:45:34 +0000
From:	Jarek Poplawski <jarkao2@...il.com>
To:	Patrick McHardy <kaber@...sh.net>
Cc:	David Miller <davem@...emloft.net>, Martin Devera <devik@....cz>,
	netdev@...r.kernel.org
Subject: Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in
	htb_dequeue()

To David Miller: David don't apply yet - this patch needs change.

Patrick, read below:

On Tue, Dec 09, 2008 at 02:20:00PM +0100, Patrick McHardy wrote:
> Jarek Poplawski wrote:
>> On Tue, Dec 09, 2008 at 01:25:15PM +0100, Patrick McHardy wrote:
>>> Jarek Poplawski wrote:
>>>> The algorithm is we want to "continue on the next jiffie". We know
>>>> we've lost here a lot of time (~2 jiffies), and this will be added
>>>> later. Since these jiffies are not precise enough wrt. psched ticks
>>>> or ktime, and we will add around 2000 (for HZ 1000) psched ticks
>>>> anyway this +1 here simply doesn't matter and can mean "a bit after
>>>> q->now".
>>> This might as well return q->now, no?
>>
>> Yes, but IMHO it looks worse, considering the problem here (we want to
>> avoid scheduling in the past).
>
> I guess its a matter of taste.

Exactly. (And could be changed.)

>
>>> The elapsed time is added
>>> on top later anyways. But anyways, I think both the approach and
>>> the patch are wrong.
>>>
>>> 	/* charge used jiffies */
>>> 	start_at = jiffies - start_at;
>>> 	if (start_at > 0)
>>> 		next_event += start_at * PSCHED_TICKS_PER_SEC / HZ;
>>>
>>> What relationship does the duration it ran for has to the time it
>>> should run at again?
>>
>> The scheduling times won't be in the past mostly and hrtimers won't
>> trigger too soon, but approximately around we really need and can
>> afford a new try without stopping everything else.
>
> Sure. But it also won't be in the past if we simply add .. lets say
> the current uptime in ms. My point was that there's absolutely no
> relationship between those two times and combining them just to
> get a value thats not in the past is wrong. Especially considering
> *why* we want a value in the future and what we'll get from that
> calculation.
>
>>> The focus on jiffies is wrong IMO, the reason why we get high
>>> load is because the CPU can't keep up, delaying things even
>>> longer is not going to help get the work done. The only reason to
>>> look at jiffies is because its a cheap indication that it has
>>> ran for too long and we should give other tasks a change to run
>>> as well, but it should continue immediately after it did that.
>>> So all it should do is make sure that the watchdog is scheduled
>>> with a very small positive delay.
>>
>> This needs additional psched_get_time(), and as I've written before
>> there is no apparent advantage in problematic cases, but this would
>> add more overhead for common cases.
>
> htb_do_events() exceeding two jiffies is fortunately not a common
> case. You (incorrectly) made the calculation somewhat of a common
> case by also adding to the delay if the inner classes simply throttled
> and already returned the exact delay they want.

I see! You are right and this needs fixing.

> Much better (again considering what we want to achieve here) would
> be to not use the hrtimer watchdog at all. We want to give lower
> priority tasks a chance to run, so ideally we'd use a low priority
> task for wakeup.

I'm not sure I get ot right: for precise scheduling hrtimers look
useful. Do you really mean "at all"?
 
>
>>> As for the implementation: the increase in delay (the snippet
>>> above) is also done in the case that no packets were available
>>> for other reasons (throttling), in which case we might needlessly
>>> delay for an extra jiffie if jiffies wrapped while it tried to
>>> dequeue.
>>
>> But in another similar cases there could be no change in jiffies, but
>> almost a jiffie used for counting, so wrong schedule time as well.
>
> Its not "wrong". We don't want to delay. Its a courtesy to the
> remaining system.

In this case it's probably self-courtesy too: this ksoftirqd takes
most of the time and it's useless.

>
>> Approximatly this all should be fine, and it still can be tuned later.
>> IMHO, this all should not affect "common" cases, which are expected to
>> use less then jiffie here.
>
> Jiffies might wrap even if it only took only a few nanoseconds.
> And its not fine, in the case of throttled classes there's no
> reason to add extra delay *at all*.

Yes, you are right with this. I can try too fix this tomorrow, unless
you prefer to send your version of this patch.

Thanks,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html