netdev - Re: NMI lockup, 2.6.26 release

Message-ID: <20080813074326.GB5367@ff.dom.local>
Date:	Wed, 13 Aug 2008 07:43:26 +0000
From:	Jarek Poplawski <jarkao2@...il.com>
To:	Denys Fedoryshchenko <denys@...p.net.lb>
Cc:	netdev@...r.kernel.org
Subject: Re: NMI lockup, 2.6.26 release

On Wed, Aug 13, 2008 at 10:28:11AM +0300, Denys Fedoryshchenko wrote:
> Just as a proposal, maybe we can catch the situation when "things go wrong"
> and panic? Then we could forward some info to the hrtimers guys,
> if it is an hrtimers bug...

Yes, that would be best, but I don't know how much I can "use" you
and your clients for debugging this. So, of course, if it's possible,
you could simply edit this patch and try larger divisors such as
(100 * HZ) or (1000 * HZ) in place of (10 * HZ), or even something like:

+	if (q->next_watchdog < q->now || next_event <=
+	     q->next_watchdog - 10) {
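
For illustration only, here is a minimal user-space sketch of the
rearm condition discussed above, so the effect of different slack
values can be tried outside the kernel. Everything named wd_* is
hypothetical and not part of the patch; psched_time_t is assumed to be
an unsigned 64-bit tick counter, and the ticks-per-second and HZ values
are passed in as plain parameters rather than taken from kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stand-in for the kernel type, modeled as unsigned 64-bit ticks. */
typedef uint64_t psched_time_t;

/* Hypothetical model of the two htb_sched fields the patch relies on. */
struct wd_model {
	psched_time_t now;		/* cached dequeue time */
	psched_time_t next_watchdog;	/* expiry the watchdog was last armed for */
	unsigned int rearm_count;	/* how often the watchdog was (re)armed */
};

/*
 * Slack in ticks for a resolution of 1/divisor of a jiffy, following the
 * PSCHED_TICKS_PER_SEC / (10 * HZ) expression in the patch. Integer
 * division: a large divisor can round the slack down to 0.
 */
static psched_time_t wd_slack(psched_time_t ticks_per_sec, unsigned int hz,
			      unsigned int divisor)
{
	return ticks_per_sec / ((psched_time_t)divisor * hz);
}

/*
 * Same shape as the condition in the patch: rearm if the armed expiry is
 * already in the past, or if the new event is earlier than the armed one
 * by at least `slack` ticks. The first test short-circuits, so with
 * next_watchdog == 0 (after reset) the subtraction is never evaluated.
 */
static bool wd_should_rearm(const struct wd_model *q, psched_time_t next_event,
			    psched_time_t slack)
{
	return q->next_watchdog < q->now ||
	       next_event <= q->next_watchdog - slack;
}

/* Models the guarded qdisc_watchdog_schedule() call; only counts rearms. */
static void wd_request(struct wd_model *q, psched_time_t next_event,
		       psched_time_t slack)
{
	if (wd_should_rearm(q, next_event, slack)) {
		q->next_watchdog = next_event;
		q->rearm_count++;
	}
}
```

With a slack of 100 ticks and the watchdog armed for tick 1500, a
request for tick 1450 is skipped, while a request for tick 1400 rearms
the timer. Larger divisors shrink the slack, so more requests get
through to the timer.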

Alas, the hrtimers guys didn't seem very interested, so the main
concern should be handling this optimally in net, at least.

Jarek P.

> 
> On Tuesday 12 August 2008, Jarek Poplawski wrote:
> > On Tue, Aug 12, 2008 at 02:31:40PM +0300, Denys Fedoryshchenko wrote:
> > ...
> >
> > > With second patch it works fine, 9 days uptime now
> >
> > Great! I didn't expect it would be so easy with this strange problem.
> > So it looks like hrtimers could probably break after some
> > overscheduling. The only problem now is to find a reasonable
> > limit which is both safe and doesn't harm resolution too much for
> > others.
> >
> > IMHO this second patch with 1 jiffy watchdog resolution looks
> > reasonable and should be acceptable, but it would be nice to check if
> > we can go lower. Here is "the same" patch; the only change is the
> > resolution (1/10 of a jiffy). If there are any problems with testing
> > this, please let me know. (It should be applied after reverting
> > patch #2.)
> >
> > Thanks,
> > Jarek P.
> >
> > (testing patch #3)
> > ---
> >
> >  net/sched/sch_htb.c |    8 +++++++-
> >  1 files changed, 7 insertions(+), 1 deletions(-)
> >
> > diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
> > index 30c999c..ff9e965 100644
> > --- a/net/sched/sch_htb.c
> > +++ b/net/sched/sch_htb.c
> > @@ -162,6 +162,7 @@ struct htb_sched {
> >
> >  	int rate2quantum;	/* quant = rate / rate2quantum */
> >  	psched_time_t now;	/* cached dequeue time */
> > +	psched_time_t next_watchdog;
> >  	struct qdisc_watchdog watchdog;
> >
> >  	/* non shaped skbs; let them go directly thru */
> > @@ -920,7 +921,11 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
> >  		}
> >  	}
> >  	sch->qstats.overlimits++;
> > -	qdisc_watchdog_schedule(&q->watchdog, next_event);
> > +	if (q->next_watchdog < q->now || next_event <=
> > +	     q->next_watchdog - PSCHED_TICKS_PER_SEC / (10 * HZ)) {
> > +		qdisc_watchdog_schedule(&q->watchdog, next_event);
> > +		q->next_watchdog = next_event;
> > +	}
> >  fin:
> >  	return skb;
> >  }
> > @@ -973,6 +978,7 @@ static void htb_reset(struct Qdisc *sch)
> >  		}
> >  	}
> >  	qdisc_watchdog_cancel(&q->watchdog);
> > +	q->next_watchdog = 0;
> >  	__skb_queue_purge(&q->direct_queue);
> >  	sch->q.qlen = 0;
> >  	memset(q->row, 0, sizeof(q->row));
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@...r.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 