Message-Id: <201002281322.05213.rjw@sisk.pl>
Date: Sun, 28 Feb 2010 13:22:05 +0100
From: "Rafael J. Wysocki" <rjw@...k.pl>
To: Ingo Molnar <mingo@...e.hu>
Cc: Stephen Rothwell <sfr@...b.auug.org.au>, mingo@...hat.com,
hpa@...or.com, linux-kernel@...r.kernel.org, roland@...hat.com,
suresh.b.siddha@...el.com, tglx@...utronix.de, hjl.tools@...il.com,
Andrew Morton <akpm@...ux-foundation.org>,
Linus <torvalds@...ux-foundation.org>
Subject: Re: linux-next requirements
On Sunday 28 February 2010, Ingo Molnar wrote:
>
> * Rafael J. Wysocki <rjw@...k.pl> wrote:
>
> > On Saturday 27 February 2010, Ingo Molnar wrote:
> > >
> > > * Rafael J. Wysocki <rjw@...k.pl> wrote:
> > >
> > > > > > > Let's see. Over the last 60 days, I have reported 37 build errors. Of
> > > > > > these, 16 were reported against x86, 14 against ppc, 7 against other
> > > > > > archs.
> > > > >
> > > > > So only 43% of them were even relevant on the platform that 95+% of the
> > > > > Linux testers use? Seems to support the points i made.
> > > >
> > > > Well, I hope you don't mean that because the majority of bug reporters (vs
> > > > testers, the number of whom is unknown to me at least) use x86, we are free
> > > > to break the other architectures. ;-)
> > >
> > > It means exactly that: just like we 'can' break compilation with gcc296,
> > > ancient versions of binutils, odd bootloaders, can break the boot via odd
> > > hardware, etc. When someone uses those architectures then the 'easy'
> > > bugfixes will actually flow in very quickly and without much fuss.
> >
> > Then I don't understand what the problem with getting them in at the
> > linux-next stage is. They are necessary anyway, so we'll need to add them
> > sooner or later and IMO the sooner the better.
>
> The problem is the dynamics and resulting (non-)cleanliness of code. We have
> architectures that have been conceptually broken for 5 years or more, but
> still those problems get blamed on the last change that 'causes' the breakage:
> the core kernel and the developers who try to make a difference.
>
> I think your perspective and your opinion are correct, while my perspective is
> real and correct as well - there's no contradiction really. Let me try to
> explain how i see it:
>
> You are working in a relatively well-designed piece of code which interfaces
> to the kernel in sane ways - kernel/power/* et al. You might break the
> cross-builds sometimes, but it's not very common, and in those cases it's
> usually your own fault and you are grateful for linux-next to have caught that
> stupidity. (i hope this is a fair summary!)
Fair enough.
> I am not criticising that aspect of linux-next _at all_ - it's useful and
> beneficial - and i'd like to thank Stephen for all his hard work. Other
> aspects of linux-next are useful as well, such as the patch conflict mediation
> role.
Great.
> But as it happens so often, people tend to talk more about the things that are
> not so rosy, not about the things that work well.
>
> The area i am worried about is new core kernel facilities and their
> development and extension of existing facilities. _Those_ facilities are
> affected by 'many architectures' in a different way from how you experience
> it: often we can do very correct changes to them, which still 'break' on some
> architecture due to _that architecture's conceptual fault_.
>
> Let me give you an example that happened just yesterday. My cross-testing
> found that a change in the tracing infrastructure code broke m32r and parisc.
>
> The breakage:
>
> /home/mingo/tip/kernel/trace/trace_clock.c:86: error: implicit declaration of function 'raw_local_irq_save'
> /home/mingo/tip/kernel/trace/trace_clock.c:112: error: implicit declaration of function 'raw_local_irq_restore'
> make[3]: *** [kernel/trace/trace_clock.o] Error 1
> make[3]: *** Waiting for unfinished jobs....
>
> It was 'caused by':
>
> 18b4a4d: oprofile: remove tracing build dependency
>
> In linux-next this would be pinned to commit 18b4a4d, which would have to be
> reverted/fixed.
>
> Where does the _real_ blame lie? Clearly in the M32R and HP/PARISC code: why
> do they, four years after it was introduced as a core kernel facility in
> 2006, _still_ not support raw_local_irq_save()?
OK, I see your point.
> ( A similar situation occurred in this very thread as well - before the subject
> of the thread - so it's a real and present problem. We didn't even get _any_
> reaction about that particular breakage from the affected architecture ... )
>
> These situations are magnified by how certain linux-next bugs are reported:
> the 'blame' is put on the new commit that exposes the laggy nature of certain
> architectures. Often the developers even believe this false notion and feel
> guilty for 'having broken' an architecture - often an architecture that has
> not contributed a single core kernel facility _in its whole existence_.
>
> The usual end result is that the path of least resistance is taken: the commit
> is reverted or worked around, while the 'laggy' architecture can continue
> business as usual and cause more similar bugs and hiccups in the future ...
>
> I.e. there is extra overhead put on clearly 'good' efforts, while 'bad'
> behavior (parasitic hanging-on, passivity, indifference) is rewarded.
> Rewarding bad behavior is very clearly harmful to Linux in many regards, and i
> speak up when i see it.
>
> So i wish linux-next balanced these things more fairly towards those areas of
> code that are actually useful: if it ignored build breakages that are due to
> architectures being lazy - in fact if it required architectures to _help out_
> with the development of the kernel.
>
> The majority of build-bugs i see trigger in cross-builds (90% of which i catch
> before they get into linux-next) are of this nature, that's why i raised it in
> such a pointed way. Your (and many other people's) experience will differ - so
> you might see this as an unjustified criticism.
Thanks a lot for the clarification.
Best,
Rafael