linux-kernel - [Q] Bio traversal trouble?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <4663DE68.5030200@gmail.com>
Date:	Mon, 04 Jun 2007 11:42:00 +0200
From:	Rene Herman <rene.herman@...il.com>
To:	Jens Axboe <axboe@...e.de>
CC:	Pekka Enberg <penberg@...helsinki.fi>,
	Linux Kernel <linux-kernel@...r.kernel.org>
Subject: [Q] Bio traversal trouble?

Hi Jens, list.

The new Mitsumi legacy CD-ROM driver seems largely done. Both audio and data 
work (and the driver's clean) but data does still have a problem that I think 
needs some input from a block person. I don't have any idea...

Audio is fine, and data basically as well. It's performing at maximum drive 
speed (and gets the correct data):

root@...5:~# dd if=/dev/mitsumi of=/dev/null bs=2k count=8k
8192+0 records in
8192+0 records out
16777216 bytes (17 MB) copied, 111.379 seconds, 151 kB/s

Unfortunately, I can make the box oops by just doing enough of those dd runs in 
a row (this one with bs=2k count=128, oopsed the seventh time or something):

===
BUG: unable to handle kernel paging request at virtual address 8c1d2071
  printing eip:
c10a6d6f
*pde = 00000000
Oops: 0002 [#1]
Modules linked in: mitsumi nfsd exportfs lockd sunrpc nls_cp437 msdos fat nls_base
CPU:    0
EIP:    0060:[<c10a6d6f>]    Not tainted VLI
EFLAGS: 00010246   (2.6.21.3 #1)
EIP is at ioread8_rep+0x20/0x31
eax: 00010000   ebx: 00010300   ecx: 00000800   edx: 00000300
esi: c3145b30   edi: 8c1d2071   ebp: 8c1d2071   esp: c3145aec
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process dd (pid: 1770, ti=c3145000 task=c3114110 task.ti=c3145000)
Stack: 00000000 c3a7ad48 c486504d c4865ab7 00000006 50020034 c1015f7b 00000292
        00000000 00520300 00000100 0007e000 00000000 c3340300 c10a0531 00000000
        c12adf60 00000000 00000001 00000000 00000000 00000000 dead4ead ffffffff
Call Trace:
  [<c486504d>] __mitsumi_get_frame+0xc/0x16 [mitsumi]
  [<c4865ab7>] mitsumi_get_frame+0x120/0x134 [mitsumi]
  [<c1015f7b>] lock_timer_base+0x15/0x2f
  [<c10a0531>] cfq_set_request+0x0/0x144
  [<c114fefd>] _spin_unlock_irq+0x20/0x23
  [<c1033bda>] mempool_free+0x5f/0x64
  [<c4865b2d>] mitsumi_read+0x62/0x94 [mitsumi]
  [<c4865ba8>] mitsumi_request+0x49/0xb5 [mitsumi]
  [<c1099aae>] blk_start_queueing+0x14/0x1c
  [<c10971c2>] elv_insert+0xa3/0x13f
  [<c114fbc5>] _spin_lock_irq+0x2f/0x3a
  [<c109a577>] __make_request+0x29f/0x2d3
  [<c109a75b>] generic_make_request+0x136/0x146
  [<c1048433>] kmem_cache_alloc+0x93/0x9e
  [<c1033ae0>] mempool_alloc+0x32/0xcd
  [<c1033ae0>] mempool_alloc+0x32/0xcd
  [<c109a80f>] submit_bio+0xa4/0xaa
  [<c106736b>] bio_alloc_bioset+0xb2/0x112
  [<c1066dd1>] submit_bh+0xc2/0xdb
  [<c1065fc4>] block_read_full_page+0x245/0x252
  [<c1068346>] blkdev_get_block+0x0/0x36
  [<c114ffff>] _write_unlock_irq+0x20/0x23
  [<c103134f>] add_to_page_cache+0x71/0x78
  [<c10368f3>] read_pages+0x7c/0xc1
  [<c114ff7e>] _read_unlock_irq+0x20/0x23
  [<c114ff7e>] _read_unlock_irq+0x20/0x23
  [<c1025de0>] trace_hardirqs_on+0x11b/0x13e
  [<c1036a25>] __do_page_cache_readahead+0xed/0x107
  [<c1036b43>] blockable_page_cache_readahead+0x4d/0x9d
  [<c1036c17>] make_ahead_window+0x84/0xa5
  [<c1036d74>] page_cache_readahead+0x13c/0x15b
  [<c1031cbc>] file_read_actor+0x7f/0x102
  [<c1031953>] do_generic_mapping_read+0x110/0x3fa
  [<c1031ec6>] generic_file_aio_read+0x187/0x1b0
  [<c1031c3d>] file_read_actor+0x0/0x102
  [<c104abc3>] do_sync_read+0xbf/0xfc
  [<c101f2b8>] autoremove_wake_function+0x0/0x33
  [<c1070fe1>] dnotify_parent+0x1b/0x66
  [<c104aec2>] vfs_write+0xc9/0xff
  [<c1027210>] __lock_release+0x2d/0x3f
  [<c104ac87>] vfs_read+0x87/0xfd
  [<c104af39>] sys_read+0x41/0x67
  [<c1002490>] syscall_call+0x7/0xb
  =======================
Code: 00 00 89 c8 ef c3 0f c9 89 0a c3 57 3d ff ff 03 00 53 89 d7 89 c3 89 ca 77 
15 66 31 c0 3d 00 00 01 00 74 04 0f 0b eb fe 0f b7 d3 <f3> 6c eb 0a 4a 78 07 8a 
03 88 07 47 eb f6 5b 5f c3 57 3d ff ff
EIP: [<c10a6d6f>] ioread8_rep+0x20/0x31 SS:ESP 0068:c3145aec
===

This is obviously not good.

Since the last time this driver was posted it changed considerably and as one of 
the chances it's now requesting just one hardware sector ("frame") at a time 
from the drive as requesting multiple didn't actually work -- I seemed to have 
fouled up earlier tests somehow.

The driver could let the block layer handle just doing one frame at a time by 
completing only one frame (4 sectors) each time but this is not a good idea. 
Things are somewhat timing sensitive since if you're too late the next sector 
has passed below you (and if you're too early you're overrunning the drive) 
causing throughput to plummet.

I'd also simply like to understand it, so this is doing a manual bio traversal, 
requesting frames from the hardware as it goes along. The driver's main request 
function is:

static void mitsumi_request(struct request_queue *q)
{
         struct request *req;

         while ((req = elv_next_request(q))) {
                 struct bio *bio;
                 int sectors = 0;

                 if (!blk_fs_request(req)) {
                         end_request(req, 0);
                         continue;
                 }
                 if (rq_data_dir(req) != READ) {
                         printk(KERN_WARNING
                                "mitsumi: non-read request to CD\n");
                         end_request(req, 0);
                         continue;
                 }
                 spin_unlock_irq(q->queue_lock);
                 rq_for_each_bio(bio, req) {
                         unsigned int bytes;

                         bytes = mitsumi_read(req->rq_disk->private_data, bio);
                         sectors += bytes >> 9;

                         if (bytes != bio->bi_size)
                                 break;
                 }
                 spin_lock_irq(q->queue_lock);
                 if (!sectors || end_that_request_first(req, 1, sectors)) {
                         end_request(req, 0);
                         continue;
                 }
                 blkdev_dequeue_request(req);
                 end_that_request_last(req, 1);
         }
}

That is, it's looping around doing one bio at a time, completing the number of 
sectors that were done and failing the sector that had an error if any weren't. 
(I've had many variants of that end_request stuff in there, none helped with 
this problem).

mitsumi_read() is:

static unsigned int mitsumi_read(struct mitsumi_cdrom *mcd, struct bio *bio)
{
         sector_t frame = sector_to_frame(bio->bi_sector);
         unsigned int bytes = 0;
         struct bio_vec *bvec;
         int segno;

         bio_for_each_segment(bvec, bio, segno) {
                 unsigned char *dst;
                 unsigned int len;

                 dst = page_address(bvec->bv_page) + bvec->bv_offset;
                 for (len = bvec->bv_len; len; len -= CD_FRAMESIZE) {
                         if (mitsumi_get_frame(mcd, frame++, dst))
                                 goto out;

                         bytes += CD_FRAMESIZE;
                         dst += CD_FRAMESIZE;
                 }
         }
   out:
         return bytes;
}

bvec->bv_len is guaranteed to be a multiple of the hardware blocksize (the size 
set with blk_queue_hardsect_size) meaning that "len" loop is okay, right?

mitsumi_get_frame is the function that requests the frame from the hardware, 
possibly sleeps waiting for it to arrive and then reads the frame from I/O 
directly to "dst" ([edit] it _is_ legal to use complete() from an interrupt 
handler isn't it?)

Is there anything wrong in all this? The thing I could imagine is that it's not 
actually legal to drop the queue_lock, but if it's not legal to drop the lock 
where this does so now (back in mitsumi_request) it seems it wouldn't be legal 
to drop it anywhere since you'd only shorten the window by doing it elsewhere. 
If it's illegal period, this would mean this all needs a completely different 
setup it seems...

When it faults, it's (generally at least, I believe I saw it fault directly 
inside mitsumi_read once as well) at __mitsumi_get_frame, where it's trying to 
read the data from the drive to "dst", and "dst" seems corrupted. This in turn 
seems to be due to the entire bio being corrupted and the bio_for_each_segment() 
looping past the end.

The driver as it stands now is attached but most of it doesn't actually have 
anything to do with block, so I hope I've summarised it effectively above. Any 
hint you or anyone else reading this would be able to provide would be welcome.

Rene.

View attachment "mitsumi.diff" of type "text/plain" (26144 bytes)