linux-kernel - Re: block: DMA alignment of IO buffer allocated from slab


Message-ID: <3d63a42f-837a-4bf6-665a-c3a8c8cb46e8@kernel.dk>
Date:   Tue, 25 Sep 2018 09:44:54 -0600
From:   Jens Axboe <axboe@...nel.dk>
To:     Dave Chinner <david@...morbit.com>
Cc:     Christopher Lameter <cl@...ux.com>, Christoph Hellwig <hch@....de>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Ming Lei <tom.leiming@...il.com>,
        linux-block <linux-block@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>,
        Linux FS Devel <linux-fsdevel@...r.kernel.org>,
        "open list:XFS FILESYSTEM" <linux-xfs@...r.kernel.org>,
        Dave Chinner <dchinner@...hat.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Ming Lei <ming.lei@...hat.com>
Subject: Re: block: DMA alignment of IO buffer allocated from slab

On 9/25/18 1:49 AM, Dave Chinner wrote:
> On Mon, Sep 24, 2018 at 12:09:37PM -0600, Jens Axboe wrote:
>> On 9/24/18 12:00 PM, Christopher Lameter wrote:
>>> On Mon, 24 Sep 2018, Jens Axboe wrote:
>>>
>>>> The situation is making me a little uncomfortable, though. If we export
>>>> such a setting, we really should be honoring it...
> 
> That's what I said up front, but you replied to this with:
> 
> | I think this is all crazy talk. We've never done this, [...]
> 
> Now I'm not sure what you are saying we should do....
> 
>>> Various subsystems create custom slab arrays with their particular
>>> alignment requirement for these allocations.
>>
>> Oh yeah, I think the solution is basic enough for XFS, for instance.
>> They just have to err on the side of caution, by going full
>> sector alignment for memory...
> 
> How does the filesystem find out about hardware alignment
> requirements? Isn't probing through the block device to find out
> about the request queue configurations considered a layering
> violation?

Right now it isn't a stacked property, so answering the question
isn't even possible beyond "what does the top device require".

> What if sector alignment is not sufficient?  And how would this work
> if we start supporting sector sizes larger than page size? (which the
> XFS buffer cache supports just fine, even if nothing else in
> Linux does).

If sector alignment isn't sufficient, then we'd need to bounce 512b
formats... But I don't want to over-design something that isn't
relevant to real life setups. I'm not aware of anything that needs
memory aligned to that degree.
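
For illustration, the kind of check under discussion can be sketched in
userspace C. This is a minimal sketch, not kernel code: buf_dma_aligned()
is a hypothetical helper, and it assumes the alignment requirement is
expressed as a mask (alignment minus one, e.g. 511 for full 512-byte
sector alignment). A 512-byte buffer handed out by a general-purpose
allocator may pass a small mask such as 3 and still fail a mask of 511.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if both the start address and the
 * length of a buffer satisfy an alignment mask, where
 * mask == alignment - 1 and alignment is a power of two.
 */
static bool buf_dma_aligned(const void *buf, size_t len, unsigned int mask)
{
	/* mask + 1 must be a power of two for the bit test to be valid */
	assert(((mask + 1u) & mask) == 0);

	/* any low bit set in either the address or the length is a violation */
	return (((uintptr_t)buf | len) & mask) == 0;
}
```

A buffer that fails this test would have to be bounced (copied into an
aligned buffer) before being handed to the device.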

> But even ignoring sector size > page size, implementing this
> requires a bunch of new slab caches, especially for 64k page
> machines because XFS supports sector sizes up to 32k.  And every
> other filesystem that uses sector sized buffers (e.g. HFS) would
> have to do the same thing. Seems somewhat wasteful to require
> everyone to implement their own aligned sector slab cache...
> 
> Perhaps we should take the filesystem out of this completely - maybe
> the block layer could provide a generic "sector heap" and have all
> filesystems that use sector sized buffers allocate from it. e.g.
> something like
> 
> 	mem = bdev_alloc_sector_buffer(bdev, sector_size)
> 
> That way we don't have to rely on filesystems knowing anything about
> the alignment limitations of the devices or assumptions about DMA
> to work correctly...

I like that idea, would probably also need a mempool backing for
certain cases.
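
As a rough userspace sketch of what a "sector heap" with a mempool-style
reserve might look like: every name below (struct sector_heap,
sector_heap_alloc() and so on) is hypothetical, the proposed
bdev_alloc_sector_buffer() does not exist, and aligned_alloc() from C11
merely stands in for whatever allocator the block layer would use. The
sketch assumes the alignment is simply the sector size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define RESERVE_SLOTS 4	/* buffers held back for when allocation fails */

/* Hypothetical heap handing out sector-sized, sector-aligned buffers. */
struct sector_heap {
	size_t sector_size;		/* buffer size and alignment */
	void *reserve[RESERVE_SLOTS];	/* preallocated fallback buffers */
	int nr_reserved;		/* number of valid reserve entries */
};

/* Drop every buffer still held in the reserve. */
static void sector_heap_destroy(struct sector_heap *h)
{
	while (h->nr_reserved > 0)
		free(h->reserve[--h->nr_reserved]);
}

/* Fill the reserve up front; returns 0 on success, -1 on failure. */
static int sector_heap_init(struct sector_heap *h, size_t sector_size)
{
	/* sector size must be a non-zero power of two */
	assert(sector_size && (sector_size & (sector_size - 1)) == 0);

	h->sector_size = sector_size;
	h->nr_reserved = 0;
	for (int i = 0; i < RESERVE_SLOTS; i++) {
		void *p = aligned_alloc(sector_size, sector_size);

		if (!p) {
			sector_heap_destroy(h);
			return -1;
		}
		h->reserve[h->nr_reserved++] = p;
	}
	return 0;
}

/* Try the normal allocator first, then fall back to the reserve. */
static void *sector_heap_alloc(struct sector_heap *h)
{
	void *p = aligned_alloc(h->sector_size, h->sector_size);

	if (!p && h->nr_reserved > 0)
		p = h->reserve[--h->nr_reserved];
	return p;
}

/* Refill the reserve before returning memory to the allocator. */
static void sector_heap_free(struct sector_heap *h, void *p)
{
	if (!p)
		return;
	if (h->nr_reserved < RESERVE_SLOTS)
		h->reserve[h->nr_reserved++] = p;
	else
		free(p);
}
```

The reserve is only touched when the regular allocation fails, which is
the forward-progress guarantee a mempool backing is meant to provide.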

-- 
Jens Axboe