Message-ID: <AANLkTinna7BiGHogXnn1iEG6ccUAjFM3p3S3aHpv=h-E@mail.gmail.com>
Date:	Sun, 7 Nov 2010 20:32:05 +0100
From:	Matt <jackdachef@...il.com>
To:	Milan Broz <mbroz@...hat.com>
Cc:	Andi Kleen <andi@...stfloor.org>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	dm-devel@...hat.com, htd@...cy-poultry.org
Subject: Re: [PATCH] DM-CRYPT: Scale to multiple CPUs v3 on 2.6.37-rc* ?

On Sun, Nov 7, 2010 at 6:49 PM, Matt <jackdachef@...il.com> wrote:
> On Sun, Nov 7, 2010 at 3:30 PM, Milan Broz <mbroz@...hat.com> wrote:
>>
>> On 11/06/2010 11:16 PM, Matt wrote:
>>> before diving into testing out 2.6.37-rc* kernels I wanted to make
>>> sure that the patch:
>>>
>>> [dm-devel] [PATCH] DM-CRYPT: Scale to multiple CPUs v3
>>>
>>> is safe to use with e.g. >2.6.37-rc1 kernels
>>>
>>> I know that it's not a "fix all" patch, but it seems to speed up my
>>> backup jobs significantly (by a factor of 2-3),
>>> and 2.6.37* has evolved enough that interactivity isn't hurt too much
>>
>> yes, it should work for the simple mappings without problems.
>>
>> I hope we will fix the patch soon to be ready for upstream.
>>
>> Milan
>>
>
> Hi Milan,
>
> thanks for your answer !
>
> Unfortunately I have to post a "Warning" that it's currently not safe
> (at least for me) to use it
>
> a few hours ago, before 2.6.37-rc1 was tagged, I had already briefly
> tested it with the dm-crypt multi-CPU patch, and massive "silent" data
> corruption or loss occurred:
>
> fortunately I don't/didn't see any data corruption on my /home
> partition (yet), but every time I boot into my system things are
> screwed up on the root partition, e.g.:
>
> where eselect opengl list would normally show:
>>Available OpenGL implementations:
>>  [1]   ati *
>>  [2]   xorg-x11
>
> normally it's
>>cat /etc/env.d/03opengl
>># Configuration file for eselect
>># This file has been automatically generated.
>>LDPATH="/usr/lib32/opengl/ati/lib:/usr/lib64/opengl/ati/lib"
>>OPENGL_PROFILE="ati"
>
>
> it currently says:
>
>>eselect opengl list
>>Available OpenGL implementations:
>>  [1]   ati
>>  [2]   xorg-x11
>
>
>>cat /etc/env.d/03opengl
>># Configuration file for eselect
>
> and another example was a corrupted /etc/init.d/killprocs
>
> so since this (a corrupted killprocs) had already happened in the past
> (last time due to a hard lock with fglrx/AMD's Catalyst driver) I
> thought it was some kind of system problem which could be fixed:
> I fired up a rebuild job (emerge -e system) for my system and, sure
> enough, some other stuff disappeared - after 2-3 reboots I wanted to
> continue the rebuild and gcc was gone (!)
>
> I don't have the time to re-test everything since this is my testing &
> production machine (I'll restore a system-backup tarball), but this
> didn't happen (yet) with 2.6.36 and
> the following patches related to multi-CPU dm-crypt:
>
> * [dm-devel] [PATCH] DM-CRYPT: Scale to multiple CPUs v3
> * [PATCH] Use generic private pointer in per-cpu struct
>
> so that combination seems to be safe.
>
> It must be changes introduced with 2.6.37* that broke things; 2.6.36
> has been working perfectly fine with those 2 patches for several days
> now.
>
> I'll stick with 2.6.36 for some time now
>
> Thanks !
>
> Matt
>

sorry - I forgot to include the most important part:

the system partition is on an LVM volume group that sits on a
cryptsetup (LUKS) partition, so:

cryptsetup (LUKS) -> LVM volume group (2 logical volumes, one of them
system - the other swap) -> system (ext4)

[cryptsetup -> LVM -> ext4-partition]
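
for reference, roughly how such a stack gets assembled (the device and
volume names below are just placeholders, not my actual ones):

cryptsetup luksFormat /dev/sdX2          # LUKS container on the raw partition
cryptsetup luksOpen /dev/sdX2 cryptlvm   # opens it as /dev/mapper/cryptlvm
pvcreate /dev/mapper/cryptlvm            # LVM sits on top of the dm-crypt mapping
vgcreate vg0 /dev/mapper/cryptlvm
lvcreate -L 20G -n root vg0              # system LV
lvcreate -L 2G  -n swap vg0              # swap LV
mkfs.ext4 /dev/vg0/root                  # ext4 on the system LV
mkswap /dev/vg0/swap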

the mount-options were/are:

noatime,nodiratime,barrier=1

sometimes also

noatime,nodiratime,barrier=1,commit=600
(when the system runs for several hours to make it consume less
energy/write less)
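
as an fstab entry that would look roughly like this (the device path is
a placeholder, not my actual one):

/dev/mapper/vg0-root  /  ext4  noatime,nodiratime,barrier=1  0 1

and the commit=600 variant can be applied at runtime with a remount,
something like:

mount -o remount,commit=600 /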

the other settings are:

echo "3000" > /proc/sys/vm/dirty_expire_centisecs
echo "1500"  > /proc/sys/vm/dirty_writeback_centisecs
echo "15" > /proc/sys/vm/dirty_background_ratio
echo "50"   > /proc/sys/vm/dirty_ratio
echo "50" > /proc/sys/vm/vfs_cache_pressure

i/o scheduler: cfq
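
selected per device with something like the following (sdX is a
placeholder), or globally via the elevator=cfq kernel boot parameter:

echo cfq > /sys/block/sdX/queue/scheduler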

as already mentioned - this problem didn't appear, or at least wasn't
noticeable (yet), with kernels up to and including 2.6.36 - my system
memory should also be error-free (tested via memtest86), and the hard
disk too (previously tested several times via badblocks without errors)

during every shutdown, reboot, hibernation, etc. I manually run:

sync && sdparm -C sync /dev/foo

to make sure the data actually reaches the disk

I have read about barrier problems (writes not reliably reaching the
disk) when dm-crypt is stacked with several other layers, so I don't
know if that could be related

Regards

Matt
