Date:	Fri, 1 Dec 2006 12:03:44 -0800
From:	"Chen, Kenneth W" <kenneth.w.chen@...el.com>
To:	"'Chris Mason'" <chris.mason@...cle.com>
Cc:	"'Zach Brown'" <zach.brown@...cle.com>,
	"Andrew Morton" <akpm@...l.org>,
	"linux-kernel" <linux-kernel@...r.kernel.org>
Subject: RE: [rfc patch] optimize o_direct on block device

Chris Mason wrote on Friday, December 01, 2006 7:37 AM
> > It benefits from a shorter path length: it takes much less time to process
> > one I/O request, in both the submit and completion paths.  I always think in
> > terms of how many instructions, or clock ticks, it takes to convert a user
> > request into a bio and submit it, and in the return path, to process the bio
> > callback function and do the appropriate I/O completion (sync or async).  The
> > stock 2.6.19 kernel takes about 5.17 microseconds to process one 4K-aligned
> > DIO (just the submit and completion path, excluding disk I/O latency).  With
> > the patch, that time is reduced to 4.26 us.
> 
> I'm not completely against a minimal DIO implementation for the block
> device, but right now we get block device QA for free when we test the
> rest of the DIO code.  Splitting the code base makes DIO (already a
> special case) that much harder to test.
> 
> It's obvious there's a lot less code in your patch than fs/direct-io.c,
> but I'm still interested in which part of the fs/direct-io.c path is
> taking the most time.  I would guess it is allocating the dio?
> 
> I don't think we should cut out fs/direct-io.c until we understand
> exactly where the hit is coming from.  I know you've done lots of
> instrumentation already, but can you share some percentages on the hot
> paths?
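For concreteness, the request being timed in the quoted paragraph is a single
4K-aligned read issued with O_DIRECT against the raw block device.  A minimal
user-space sketch of that kind of measurement looks roughly like the program
below; the device path and iteration count are made up for illustration, and
unlike the numbers quoted above, the wall-clock figure here still includes
disk latency.

/* Rough user-space sketch of a 4K-aligned O_DIRECT read loop.  The
 * device path and iteration count are illustrative only. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	const int iters = 10000;	/* assumed sample count */
	void *buf;
	struct timespec t0, t1;
	int i, fd = open("/dev/sdb", O_RDONLY | O_DIRECT);	/* hypothetical device */

	if (fd < 0 || posix_memalign(&buf, 4096, 4096))
		return 1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++)
		if (pread(fd, buf, 4096, 0) != 4096)	/* one 4K-aligned DIO per pass */
			return 1;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("avg per-I/O latency: %.2f us\n",
	       ((t1.tv_sec - t0.tv_sec) * 1e6 +
		(t1.tv_nsec - t0.tv_nsec) / 1e3) / iters);

	free(buf);
	close(fd);
	return 0;
}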

It's spread everywhere across the DIO submit and completion path; here is a
profile taken from a recent measurement:

Function                Rank      %
__blockdev_direct_IO       9   3.09%
dio_bio_end_io            12   2.13%
dio_bio_complete          19   0.95%
finished_one_bio          34   0.55%
dio_get_page              76   0.22%
dio_bio_submit            96   0.19%
dio_bio_add_page         101   0.17%
dio_complete             115   0.16%
dio_new_bio              125   0.14%
dio_send_cur_page        150   0.10%
dio_bio_end_aio          201   0.06%

The compiler inlines direct_io_worker into __blockdev_direct_IO, so that
function shows up at the top of the list.  The "rank" field indicates how hot
a function is overall (rank 1 is the hottest function in the profile), and the
"%" field is the percentage of clock ticks spent in that function.
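
For scale, summing the DIO functions listed above comes to roughly 7.8% of all
sampled clock ticks for this workload
(3.09 + 2.13 + 0.95 + 0.55 + 0.22 + 0.19 + 0.17 + 0.16 + 0.14 + 0.10 + 0.06 = 7.76).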

Looking at this profile, the submit path is clearly heavyweight.  The
dio_bio_end_io number may be somewhat skewed because the wake-up function is
called inside spin_lock_irqsave(): the profiler cannot attribute clock ticks
accurately to code running in an irq-off section, so everything accumulates
at the point where interrupts are re-enabled.
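
As a rough illustration of that skew, the completion side does something like
the simplified sketch below (hypothetical lock and wait-queue names, not the
actual fs/direct-io.c code).  A timer-interrupt profiler takes no samples
while interrupts are disabled, so the cost of the wake-up is charged to the
instruction where interrupts come back on:

#include <linux/spinlock.h>
#include <linux/wait.h>

static DEFINE_SPINLOCK(completion_lock);	/* hypothetical names */
static DECLARE_WAIT_QUEUE_HEAD(completion_wait);

static void completion_path(void)
{
	unsigned long flags;

	spin_lock_irqsave(&completion_lock, flags);	/* irqs off: no profiler samples land here */
	wake_up(&completion_wait);			/* wake-up cost is invisible to the profiler */
	spin_unlock_irqrestore(&completion_lock, flags);	/* deferred ticks accumulate here */
}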

- Ken
