Message-ID: <20080813074326.GB5367@ff.dom.local>
Date: Wed, 13 Aug 2008 07:43:26 +0000
From: Jarek Poplawski <jarkao2@...il.com>
To: Denys Fedoryshchenko <denys@...p.net.lb>
Cc: netdev@...r.kernel.org
Subject: Re: NMI lockup, 2.6.26 release
On Wed, Aug 13, 2008 at 10:28:11AM +0300, Denys Fedoryshchenko wrote:
> Just as a proposal: maybe we can catch the situation when "things go wrong"
> and panic? Then we could forward some info to the hrtimers guys,
> if it is an hrtimers bug...
Yes, that would be best, but I don't know how much I can "use" you
and your clients for debugging this. So, of course, if it's possible,
you could simply edit this patch and try it with increased divisors like
(100 * HZ) or (1000 * HZ), or even something like:
+ if (q->next_watchdog < q->now || next_event <=
+ q->next_watchdog - 10) {
Alas, the hrtimers guys didn't seem very interested, so the main
concern should be to handle this optimally in net, at least.
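To make the condition easier to reason about outside the kernel, here is a
minimal userspace sketch of the same check. It is only an illustration under
stated assumptions: struct wd_state, wd_should_rearm(), wd_schedule() and the
"slack" parameter are hypothetical names, the time type is a plain 64-bit
integer, and no real timer is armed. "slack" stands for the subtracted term
(PSCHED_TICKS_PER_SEC / (10 * HZ) in patch #3, or the constant 10 above).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's time type (assumption). */
typedef uint64_t psched_time_t;

/* Hypothetical model of the two htb_sched fields the patch uses. */
struct wd_state {
	psched_time_t now;           /* cached dequeue time */
	psched_time_t next_watchdog; /* expiry already scheduled, 0 if none */
	unsigned int rearm_count;    /* how often we would have re-armed */
};

/*
 * Re-arm only if the scheduled expiry is already in the past (or was
 * never set), or if the new event is earlier than the scheduled one by
 * at least "slack" ticks. Written with an addition instead of the
 * patch's subtraction to avoid unsigned underflow in this sketch.
 */
static bool wd_should_rearm(const struct wd_state *s,
			    psched_time_t next_event, psched_time_t slack)
{
	if (s->next_watchdog < s->now)
		return true;
	return next_event + slack <= s->next_watchdog;
}

/* Stand-in for the guarded qdisc_watchdog_schedule() call. */
static bool wd_schedule(struct wd_state *s,
			psched_time_t next_event, psched_time_t slack)
{
	if (!wd_should_rearm(s, next_event, slack))
		return false;
	s->next_watchdog = next_event; /* remember what we scheduled */
	s->rearm_count++;
	return true;
}
```

In this sketch a larger slack means fewer re-arms and coarser resolution, and
a smaller slack the opposite. Note that a later next_event never re-arms while
the earlier expiry is still pending, so the watchdog may simply fire early and
the next dequeue schedules again.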
Jarek P.
>
> On Tuesday 12 August 2008, Jarek Poplawski wrote:
> > On Tue, Aug 12, 2008 at 02:31:40PM +0300, Denys Fedoryshchenko wrote:
> > ...
> >
> > > With second patch it works fine, 9 days uptime now
> >
> > Great! I didn't expect it would be so easy with this strange problem.
> > So, it looks like hrtimers could probably break after some
> > overscheduling. The only problem now is to find some reasonable
> > limit which is both safe and doesn't harm resolution too much for
> > others.
> >
> > IMHO this second patch with 1 jiffy watchdog resolution looks
> > reasonable and should be acceptable, but it would be nice to check if
> > we can go lower. Here is "the same" patch, the only change being the
> > resolution (1/10 of a jiffy). If there are any problems with testing
> > this, please let me know. (It should be applied after reverting
> > patch #2.)
> >
> > Thanks,
> > Jarek P.
> >
> > (testing patch #3)
> > ---
> >
> > net/sched/sch_htb.c | 8 +++++++-
> > 1 files changed, 7 insertions(+), 1 deletions(-)
> >
> > diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
> > index 30c999c..ff9e965 100644
> > --- a/net/sched/sch_htb.c
> > +++ b/net/sched/sch_htb.c
> > @@ -162,6 +162,7 @@ struct htb_sched {
> >
> > int rate2quantum; /* quant = rate / rate2quantum */
> > psched_time_t now; /* cached dequeue time */
> > + psched_time_t next_watchdog;
> > struct qdisc_watchdog watchdog;
> >
> > /* non shaped skbs; let them go directly thru */
> > @@ -920,7 +921,11 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
> > }
> > }
> > sch->qstats.overlimits++;
> > - qdisc_watchdog_schedule(&q->watchdog, next_event);
> > + if (q->next_watchdog < q->now || next_event <=
> > + q->next_watchdog - PSCHED_TICKS_PER_SEC / (10 * HZ)) {
> > + qdisc_watchdog_schedule(&q->watchdog, next_event);
> > + q->next_watchdog = next_event;
> > + }
> > fin:
> > return skb;
> > }
> > @@ -973,6 +978,7 @@ static void htb_reset(struct Qdisc *sch)
> > }
> > }
> > qdisc_watchdog_cancel(&q->watchdog);
> > + q->next_watchdog = 0;
> > __skb_queue_purge(&q->direct_queue);
> > sch->q.qlen = 0;
> > memset(q->row, 0, sizeof(q->row));
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@...r.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>