Ceph - howto, rbd, lvm, cluster
Install ceph
wget -q -O- https://raw.github.com/ceph/ceph/master/keys/release.asc | apt-key add -
echo deb http://ceph.com/debian/ $(lsb_release -sc) main | tee /etc/apt/sources.list.d/ceph.list
apt-get update && apt-get install ceph
Video intros to Ceph
https://www.youtube.com/watch?v=UXcZ2bnnGZg
http://www.youtube.com/watch?v=BBOBHMvKfyc&feature=g-high
Rebooting node stops everything / Set number of replicas across all nodes
Make sure that the min replica count is set to nodes-1.
ceph osd pool set <poolname> min_size 1
Then the remaining node[s] will start up with just 1 node if everything else is down.
Keep in mind this can potentially get ugly, as there are no replicas while running on a single node.
More info here: http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/10481
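A rough sketch (pool name and counts are examples only): set both the normal and the minimum replica count, then read the values back:
ceph osd pool set <poolname> size 2
ceph osd pool set <poolname> min_size 1
ceph osd pool get <poolname> size
ceph osd pool get <poolname> min_size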
Add disks (OSD) or entire nodes
Prepare the disk as usual (partition or entire disk) and format it with the filesystem of your choice. Add it to fstab and mount it. Add it to /etc/ceph/ceph.conf and replicate the new conf to the other nodes.
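For reference, the ceph.conf entry for the new OSD could look roughly like this (using the osd.12/ceph1 example below; the data path and journal location are assumptions, adapt to your layout):
[osd.12]
    host = ceph1
    osd data = /srv/ceph/osd12
    osd journal = /srv/ceph/osd12/journal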
Start the disk; I'm assuming we've added osd.12 on ceph1 here.
## Auth stuff to make sure that the OSD is accepted into the cluster:
mkdir /srv/ceph/osd12
ceph-osd -i 12 --mkfs --mkkey
ceph auth add osd.12 osd 'allow *' mon 'allow rwx' -i /etc/ceph/keyring.osd.12
## Create disk and start it
ceph osd create osd.12
/etc/init.d/ceph start osd.12
## Add it to the cluster and allow replication based on the CRUSH map.
ceph osd crush set 12 osd.12 1.0 pool=default rack=unknownrack host=ceph1
In the crush set line above, changing pool/rack/host lets you place your disk/node where you want.
If you add a new host entry, it will be the same as adding a new node (with the disk).
Check that it is in the right place with:
ceph osd tree
More info here:
- http://ceph.com/docs/master/rados/operations/pools/
- http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
Delete pools/OSD
Make sure you have the right disk; run
ceph osd tree
to get an overview.
Delete an OSD
ceph osd crush remove osd.12
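Note that removing the OSD from the CRUSH map alone leaves the daemon, its auth key and its id behind. A fuller removal sequence (a sketch, assuming osd.12 as above) could be:
ceph osd out 12
/etc/init.d/ceph stop osd.12
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm 12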
Add Monitor node/service
Install ceph, add keys, ceph.conf and host files, and prepare storage for the monitor maps.
Then add the monitor to the cluster (to keep quorum, run either 1 or 3+ monitors, never 2). The example below adds monitor mon.21 with IP 192.168.0.68.
cd /tmp; mkdir add_monitor; cd add_monitor
ceph auth get mon. -o key
> exported keyring for mon.
ceph mon getmap -o map
> got latest monmap
ceph-mon -i 21 --mkfs --monmap map --keyring key
> ceph-mon: created monfs at /srv/ceph/mon21 for mon.21
ceph mon add 21 192.168.0.68
> port defaulted to 6789
> added mon.21 at 192.168.0.68:6789/0
/etc/init.d/ceph start mon.21
Add the info to the ceph.conf file:
[mon]
...
[mon.21]
    host = ceph2-mon
    mon addr = 192.168.0.68:6789
...
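To check that the new monitor actually joined the quorum (the output will of course differ per cluster):
ceph mon stat
ceph quorum_status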
Going from replicating across OSDs to replicating across hosts in a Ceph cluster
More info here: http://jcftang.github.com/2012/09/06/going-from-replicating-across-osds-to-replicating-across-hosts-in-a-ceph-cluster/
Replication - see the current level per pool
ceph osd dump
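The dump is rather long; to see only the pool lines with the replication level (the field is called 'rep size' on older releases, plain 'size' on newer ones), something like this works:
ceph osd dump | grep 'rep size'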
CRUSH maps
Redistributing, [de]assembling and fine-tuning; more info here:
http://hpc.admin-magazine.com/Articles/RADOS-and-Ceph-Part-2
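A minimal round trip for hand-editing the CRUSH map (file names are arbitrary); the typical change for host-level replication is switching the rule step to 'chooseleaf ... type host':
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
## edit crushmap.txt, e.g. step chooseleaf firstn 0 type host
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new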
KVM - add disk
Per host:
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <source protocol="rbd" name="test/disk2-qemu-5g:rbd_cache=1">
    <host name='192.168.0.67' port='6789'/>
    <host name='192.168.0.68' port='6789'/>
  </source>
  <auth username='admin' type='ceph'>
    <secret type='ceph' uuid='7a91dc24-b072-43c4-98fb-4b2415322b0f'/>
  </auth>
  <target dev='vdb' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>
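Assuming the XML above is saved as e.g. rbd-disk.xml (the file name is arbitrary), it can be attached to a running guest with virsh, or pasted into the domain definition via virsh edit:
virsh attach-device rbd-test rbd-disk.xml --persistent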
Per pool:
<pool type='rbd'>
  <name>rbd</name>
  <uuid>f959641f-f518-4505-9e85-17d994e2a399</uuid>
  <source>
    <host name='192.168.0.67' port='6789'/>
    <host name='192.168.0.68' port='6789'/>
    <host name='192.168.0.69' port='6789'/>
    <name>test</name>
    <auth username='admin' type='ceph'>
      <secret type='ceph' uuid='7a91dc24-b072-43c4-98fb-4b2415322b0f'/>
    </auth>
  </source>
</pool>
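Likewise, assuming the pool XML is saved as e.g. rbd-pool.xml, the storage pool can be registered and started with:
virsh pool-define rbd-pool.xml
virsh pool-start rbd
virsh pool-autostart rbd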
KVM - add secret/auth for use with ceph
Create a secret.xml file:
<secret ephemeral='no' private='no'>
  <uuid>7a91dc24-b072-43c4-98fb-4b2415322b0f</uuid>
  <usage type='ceph'>
    <name>admin</name>
  </usage>
</secret>
Use it:
virsh secret-define secret.xml
virsh secret-set-value 7a91dc24-b072-43c4-98fb-4b2415322b0f AQDAD8JQOLS9IxAAbox00eOmlM1h5ZLGPxHGHw==
The last argument is the key from your /etc/ceph/keyring.admin:
cat /etc/ceph/keyring.admin
[client.admin]
    key = AQDAD8JQOLS9IxAAbox00eOmlM1h5ZLGPxHGHw==
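If cephx is working with the admin credentials, the same key can also be pulled straight from the cluster instead of cat'ing the keyring (should print the same value):
ceph auth get-key client.admin
> AQDAD8JQOLS9IxAAbox00eOmlM1h5ZLGPxHGHw==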
Online resizing of KVM images (rbd)
Resize the desired block image (here going from 30 GB to 40 GB):
qemu-img resize -f rbd rbd:sata/disk3 40G
> Image resized.
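Alternatively, the same resize can be done at the rbd layer instead of through qemu-img (the size is given in MB here):
rbd resize --pool sata --image disk3 --size 40960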
Find the attached target device:
virsh domblklist rbd-test
> Target     Source
> ------------------------------------------------
> vdb        sata/disk2-qemu-5g:rbd_cache=1
> vdc        sata/disk3:rbd_cache=1
> hdc        -
Then use virsh to tell the guest that the disk has a new size:
virsh blockresize --domain rbd-test --path "vdc" --size 40G
> Block device 'vdc' is resized
Check raw rbd info
rbd --pool sata info disk3
> rbd image 'disk3':
>         size 40960 MB in 10240 objects
>         order 22 (4096 KB objects)
>         block_name_prefix: rb.0.13fb.23353e97
>         parent: (pool -1)
Make sure you can see the change in dmesg (the guest should see the new size):
dmesg
> [...]
> [75830.538557] vdb: detected capacity change from 118111600640 to 123480309760
> [...]
Then extend the partition. If it is a simple data volume, you can just run fdisk, remove the old partition, create a new one and accept the default values for start/end (note: this only applies to partitions that hold nothing else!).
Write the partition table, run fdisk -l to double-check the size, then remount the partition (the partition from above is mounted as a data dir on vdb1 in my case):
mount -o remount,rw /dev/vdb1
Check your fstab to make sure you get the correct options for the remount.
Afterwards, run resize2fs:
resize2fs /dev/vdb1
> resize2fs 1.42.5 (29-Jul-2012)
> Filesystem at /dev/vdb1 is mounted on /home/mirroruser/mirror; on-line resizing required
> old_desc_blocks = 7, new_desc_blocks = 7
> The filesystem on /dev/vdb1 is now 28835584 blocks long.
Double-check via df -h or the like.