lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 25 Apr 2007 10:46:07 +0200
From:	Jens Axboe <jens.axboe@...cle.com>
To:	Neil Brown <neilb@...e.de>
Cc:	Brad Campbell <brad@...p.net.au>,
	Chuck Ebbert <cebbert@...hat.com>,
	lkml <linux-kernel@...r.kernel.org>
Subject: Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert

On Wed, Apr 25 2007, Neil Brown wrote:
> On Tuesday April 24, brad@...p.net.au wrote:
> > [105449.653682] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to 
> > lkml@...r.kernel.org
> > [105449.683646] cfq: busy=1,drv=0,timer=0
> > [105449.694871] cfq rr_list:
> > [105449.702715]   3108: sort=0,next=00000000,q=0/1,a=1/0,d=0/0,f=69
> > [105449.720693] cfq busy_list:
> > [105449.729054] cfq idle_list:
> > [105449.737418] cfq cur_rr:
> 
> Ok, I have a theory.
> 
> An ELEVATOR_FRONT_MERGE occurs which changes req->sector and calls
> ->elevator_merged_fn which is cfq_merged_request.
> 
> At this time there is already a request in cfq with the same sector
> number, and that request is the only other request on the queue.
> 
> cfq_merged_request calls cfq_reposition_rq_rb which removes the
> req from ->sortlist and then calls cfq_add_rq_rb to add it back (at
> the new location because ->sector has changed).
> 
> cfq_add_rq_rb finds there is already a request with the same sector
> number and so elv_rb_add returns an __alias which is passed to
> cfq_dispatch_insert. 
> This calls cfq_remove_request and as that is the only request present,
> ->next_rq gets set to NULL.
> The old request with the new sector number is then added to the
> ->sortlist, but ->next_rq is never set - it remains NULL.
> 
> How likely it would be to get two requests with the same sector number
> I don't know.  I wouldn't expect it to ever happen - I have seen it
> before, but it was due to a bug in ext3.  Maybe XFS does it
> intentionally some times?
> 
> You could test this theory by putting a
>    WARN_ON(cfqq->next_rq == NULL);
> at the end of cfq_reposition_rq_rb, just after the cfq_add_rq_rb call.
> 
> I will leave the development of a suitable fix up to Jens if he agrees
> that this is possible.

That's pretty close to where I think the problem is (the front merging
and cfq_reposition_rq_rb()). The issue with that is that you'd only get
aliases for O_DIRECT and/or raw IO, and that doesn't seem to be the case
here. Given that front merges are equally not very likely, I'd be
surprised is something like that has ever happened.

BUT... That may explain while we are only seeing it on md. Would md
ever be issuing such requests that trigger this condition?

I'll try and concoct a test case.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ