[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180416152429.529e3cba@gandalf.local.home>
Date: Mon, 16 Apr 2018 15:24:29 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Sasha Levin <Alexander.Levin@...rosoft.com>,
Pavel Machek <pavel@....cz>, Petr Mladek <pmladek@...e.com>,
"stable@...r.kernel.org" <stable@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
Cong Wang <xiyou.wangcong@...il.com>,
Dave Hansen <dave.hansen@...el.com>,
Johannes Weiner <hannes@...xchg.org>,
Mel Gorman <mgorman@...e.de>, Michal Hocko <mhocko@...nel.org>,
Vlastimil Babka <vbabka@...e.cz>,
Peter Zijlstra <peterz@...radead.org>, Jan Kara <jack@...e.cz>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
Byungchul Park <byungchul.park@....com>,
Tejun Heo <tj@...nel.org>, Greg KH <gregkh@...uxfoundation.org>
Subject: Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and
waiter logic to load balance console writes
On Mon, 16 Apr 2018 11:52:48 -0700
Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> On Mon, Apr 16, 2018 at 11:41 AM, Steven Rostedt <rostedt@...dmis.org> wrote:
> >
> >I never said the second
> > bug fix should not have been backported. I even said that the first bug
> > "didn't go far enough".
>
> You're still not getting it.
>
> The "didn't go far enough" means that the bug fix is *BUGGY*. It needs
> to be reverted.
It wasn't reverted. Look at the code in question.
Commit d63c7dd5bcb
+++ b/drivers/scsi/ipr.c
@@ -4003,13 +4003,12 @@ static ssize_t ipr_store_update_fw(struct device *dev,
struct ipr_sglist *sglist;
char fname[100];
char *src;
- int len, result, dnld_size;
+ int result, dnld_size;
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
- len = snprintf(fname, 99, "%s", buf);
- fname[len-1] = '\0';
+ snprintf(fname, sizeof(fname), "%s", buf);
if (request_firmware(&fw_entry, fname, &ioa_cfg->pdev->dev)) {
dev_err(&ioa_cfg->pdev->dev, "Firmware file %s not found\n", fname);
The bug is that len returned by snprintf() can be much larger than 100.
That fname[len-1] = '\0' can allow a user to decide where to write
zeros.
That patch never got reverted in mainline. It was fixed with this:
Commit 21b81716c6bf
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -4002,6 +4002,7 @@ static ssize_t ipr_store_update_fw(struct device *dev,
struct ipr_sglist *sglist;
char fname[100];
char *src;
+ char *endline;
int result, dnld_size;
if (!capable(CAP_SYS_ADMIN))
@@ -4009,6 +4010,10 @@ static ssize_t ipr_store_update_fw(struct device *dev,
snprintf(fname, sizeof(fname), "%s", buf);
+ endline = strchr(fname, '\n');
+ if (endline)
+ *endline = '\0';
+
if (request_firmware(&fw_entry, fname, &ioa_cfg->pdev->dev)) {
dev_err(&ioa_cfg->pdev->dev, "Firmware file %s not found\n", fname);
return -EIO;
>
> > I hope the answer was not to revert the bug and put back the possible
> > bad memory access in to keep API.
>
> But that very must *IS* the answer. If there isn't a fix for the ABI
> breakage, then the first bugfix needs to be reverted.
It wasn't reverted and that was my point. It just wasn't a complete
fix. And I'm saying that once the API breakage became apparent, the
second fix should have been backported as well.
I'm not saying that we should allow API breakage to fix a critical bug.
I'm saying that the API breakage was really a secondary bug that needed
to be addressed. My point is the first fix was NOT reverted!
>
> Really. There is no such thing as "but the fix was more important than
> the bug it introduced".
I'm not saying that.
>
> This is why we started with the whole "actively revert things that
> introduce regressions". Because people always kept claiming that "but
> but I fixed a worse bug, and it's better to fix the worse bug even if
> it then introduces another problem, because the other problem is
> lesser".
>
> NO.
Right, but the fix to the API was also trivial. I don't understand why
you are arguing with me. I agree with you. I'm talking about this
specific instance. Where a bug was fixed, and the API breakage was
another fix that needed to be backported.
Are you saying if code could allow userspace to write zeros anywhere in
memory, that we should keep it to allow API compatibility?
>
> We're better off making *no* progress, than making "unsteady progress".
>
> Really. Seriously.
>
> If you cannot fix a bug without introducing another one, don't do it.
> Don't do kernel development.
Um, I think that's impossible. As the example shows. Not many people
would have caught the original fix would caused another bug. That
requirement would pretty much keep everyone from ever doing any kernel
development.
>
> The whole mentality you show is NOT ACCEPTABLE.
>
> So the *only* answer is: "fix the bug _and_ keep the API". There is
> no other choice.
I agree. But that that wasn't the question.
>
> The whole "I fixed one problem but introduced another" is not how we
> work. You should damn well know that. There are no excuses.
>
> And yes, sometimes that means jumping through hoops. But that's what
> it takes to keep users happy.
I'm talking about the given example of a simple memory bug that caused
a very subtle breakage of API, which had another trivial fix that
should be backported. I'm not sure that's what you were talking about.
-- Steve
Powered by blists - more mailing lists