Turning Ceph RBD Images into SAN Storage Devices

RADOS Block Device (RBD) is a block-layer interface to the Ceph distributed storage stack. Here's how you can expose RBD images over SAN protocols such as iSCSI and Fibre Channel, so that systems with no native RBD support can connect to your Ceph cluster.

Prerequisites

Here's what you'll need in order to add SAN compatibility to your Ceph cluster:

  • A working Ceph cluster. You probably guessed this one. More specifically, you should have 
    • a RADOS pool in which you can create RBD images; the default rbd pool will do nicely. 
    • a set of credentials for a client to connect to the cluster and to create and map RBD devices. You can use the default client.admin credentials for this purpose, but I prefer a separate set of credentials for client.rbd.
    • that user should have at least the allow r capability on your mons, and allow rw on your osds (the latter you can restrict to the rbd pool if you wish); see the example after this list.
  • A SCSI proxy node, which will act as an intermediary between your legacy initiators and the Ceph cluster. It should have
    • A sufficiently recent Linux kernel. 2.6.38 is the absolute minimum, but a post-3.2.0 kernel is highly recommended.
    • A working installation of the client tools required to map RBD devices (the rbd binary is the important one).
    • A copy of the credentials for your rbd client.
    • A working installation of the LIO and target tools (lio-utils and targetcli).
  • And finally, any number of clients supporting any of LIO's fabric modules. We'll use iSCSI in this example, but you could also use Fibre Channel, FCoE, InfiniBand, and others.
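
If you choose to create dedicated client.rbd credentials rather than reuse client.admin, something along the following lines should do. This is a minimal sketch assuming your ceph CLI supports the auth get-or-create subcommand; adjust the keyring path to match your setup:

ceph auth get-or-create client.rbd mon 'allow r' osd 'allow rw pool=rbd' -o /etc/ceph/keyring.client.rbd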

Getting Started

The first thing we'll need to do is create an RBD image. Suppose we would like to create one that is 10GB in size (recall, all RBD images are thin-provisioned, so we won't actually use 10GB in the Ceph cluster right from the start).

rbd -n client.rbd -k /etc/ceph/keyring.client.rbd create --size 10240 test

With this command, we connect to our Ceph mon servers (defined in the default configuration file, /etc/ceph/ceph.conf) using the client.rbd identity, whose authentication key is stored in /etc/ceph/keyring.client.rbd. The nominal image size is 10240MB, and its name is the hardly creative test.
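
To double-check that the image exists and has the expected size, rbd ls and rbd info will tell you, using the same identity and keyring as above:

rbd -n client.rbd -k /etc/ceph/keyring.client.rbd ls
rbd -n client.rbd -k /etc/ceph/keyring.client.rbd info test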

You can run this command from any node inside or outside your Ceph cluster, as long as the configuration file and authentication credentials are stored in the appropriate location. The next step, however, is one that you must complete from your proxy node (the one with the LIO tools installed):

modprobe rbd
rbd --user rbd --secret /etc/ceph/secret.client.rbd map test

Note that this syntax applies to the current "stable" Ceph release, 0.48 "argonaut". Newer releases do away with the somewhat illogical --user and --secret options and instead accept --id and --keyring, which is more in line with the other Ceph tools.
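
On a post-argonaut release, the equivalent map command would therefore look roughly like this (a sketch only; check the rbd man page for the exact options your version supports):

rbd --id rbd --keyring /etc/ceph/keyring.client.rbd map test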

Once the map command has completed, you should see a new block device named /dev/rbd0 (provided this is the first device you mapped on this machine), and a handy symlink of the pattern /dev/rbd/<pool>/<image>, in our case /dev/rbd/rbd/test. This is a kernel-level block device like any other, and we can now proceed by exporting it to the Unified Target infrastructure.
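
You can verify the mapping with a plain ls and, if your rbd version provides it, with the showmapped subcommand:

ls -l /dev/rbd/rbd/test
rbd showmapped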

Exporting the Target

Once we have our mapped RBD device in place, we can create a target, and export it via one of LIO's fabric modules. The targetcli subshell comes in very handy for this purpose:

# targetcli 
Welcome to the targetcli shell:

 Copyright (c) 2011 by RisingTide Systems LLC.

Visit us at http://www.risingtidesystems.com.

Loaded tcm_loop kernel module.
Created '/sys/kernel/config/target/loopback'.
Done loading loopback fabric module.
Loaded tcm_fc kernel module.
Created '/sys/kernel/config/target/fc'.
Done loading tcm_fc fabric module.
Can't load fabric module qla2xxx.
Loaded iscsi_target_mod kernel module.
Created '/sys/kernel/config/target/iscsi'.
Done loading iscsi fabric module.
Can't load fabric module ib_srpt.
/> cd backstores/iblock
/backstores/iblock> create test /dev/rbd/rbd/test
Generating a wwn serial.
Created iblock storage object test using /dev/rbd/rbd/test.
Entering new node /backstores/iblock/test
/backstores/iblock/test> status
Status for /backstores/iblock/test: /dev/rbd/rbd/test deactivated

Now we've created a backstore named test, corresponding to our mapped RBD image of the same name. At this point it is deactivated, as it hasn't been assigned to any iSCSI target.
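
Since LIO is configured entirely through configfs, you can also peek under the hood and confirm which block device backs the new storage object. The exact configfs layout may differ slightly between kernel versions, so treat this as a rough pointer:

cat /sys/kernel/config/target/core/iblock_*/test/info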

Up next, we'll create the target, add the backstore as LUN 0 in the target's first Target Portal Group (TPG), and give that TPG a network portal to listen on:

/backstores/iblock> cd ..
/backstores> cd ..
/> cd iscsi 
/iscsi> create
Created target iqn.2003-01.org.linux-iscsi.gwen.i686:sn.7d9ed8ca9557.
Selected TPG Tag 1.
Successfully created TPG 1.
Entering new node /iscsi/iqn.2003-01.org.linux-iscsi.gwen.i686:sn.7d9ed8ca9557/tpgt1
/iscsi/iqn.20...8ca9557/tpgt1> cd luns
/iscsi/iqn.20...57/tpgt1/luns> status
Status for /iscsi/iqn.2003-01.org.linux-iscsi.gwen.i686:sn.7d9ed8ca9557/tpgt1/luns: 0 LUN
/iscsi/iqn.20...57/tpgt1/luns> create /backstores/iblock/test 
Selected LUN 0.
Successfully created LUN 0.
Entering new node /iscsi/iqn.2003-01.org.linux-iscsi.gwen.i686:sn.7d9ed8ca9557/tpgt1/luns/lun0
/iscsi/iqn.20...gt1/luns/lun0> cd ..
/iscsi/iqn.20...57/tpgt1/luns> cd ..
/iscsi/iqn.20...8ca9557/tpgt1> cd portals 
/iscsi/iqn.20...tpgt1/portals> create 192.168.122.117
Using default IP port 3260
Successfully created network portal 192.168.122.117:3260.
Entering new node /iscsi/iqn.2003-01.org.linux-iscsi.gwen.i686:sn.7d9ed8ca9557/tpgt1/portals/192.168.122.117:3260

So now we have a new target with the IQN iqn.2003-01.org.linux-iscsi.gwen.i686:sn.7d9ed8ca9557, whose TPG listens on the portal 192.168.122.117:3260.
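
A quick check from the proxy node confirms that the target is actually listening on that portal (use netstat -ltn instead if ss isn't available on your system):

ss -ltn | grep 3260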

For demonstration purposes, we can now disable authentication and initiator filters. You should obviously not do this on a production system.

/iscsi/iqn.20....122.117:3260> cd ..
/iscsi/iqn.20...tpgt1/portals> cd ..
/iscsi/iqn.20...8ca9557/tpgt1> set attribute authentication=0
Parameter authentication is now '0'.
/iscsi/iqn.20...8ca9557/tpgt1> set attribute demo_mode_write_protect=0 generate_node_acls=1 cache_dynamic_acls=1
Parameter demo_mode_write_protect is now '0'.
Parameter generate_node_acls is now '1'.
Parameter cache_dynamic_acls is now '1'.
/iscsi/iqn.20...8ca9557/tpgt1> exit

There. Now we have an iSCSI target, a target portal group, and a single LUN assigned to it.
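
Keep in mind that this configuration lives only in the kernel's configfs and won't survive a reboot by itself. The targetcli shell has a saveconfig command for persisting it; where the saved configuration ends up differs between targetcli versions, so check your distribution's documentation:

# targetcli
/> saveconfig
/> exit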

Using Your New Target

And now, you can connect to this thin-provisioned, dynamically replicated, self-healing and self-rebalancing, snapshot-capable, striped and distributed block device just as you would to any other iSCSI target.

Here's an example using the standard Linux open-iscsi tools:

# iscsiadm -m discovery -p 192.168.122.117 -t sendtargets
192.168.122.117:3260,1 iqn.2003-01.org.linux-iscsi.gwen.i686:sn.7d9ed8ca9557
# iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.gwen.i686:sn.7d9ed8ca9557 -p 192.168.122.117 --login
Logging in to [iface: default, target: iqn.2003-01.org.linux-iscsi.gwen.i686:sn.7d9ed8ca9557, portal: 192.168.122.117,3260]
Login to [iface: default, target: iqn.2003-01.org.linux-iscsi.gwen.i686:sn.7d9ed8ca9557, portal: 192.168.122.117,3260]: successful

At this point, you'll have a shiny new SCSI device showing up in lsscsi and in your /dev tree, and you can use this device for anything you please. Try partitioning it and creating a filesystem on one of the partitions, as in the example below.
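
For instance, assuming the new LUN shows up as /dev/sdb (substitute whatever device name lsscsi reports on your system), something like this will give you a GPT-partitioned, ext4-formatted disk:

parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb mkpart primary 1MiB 100%
mkfs.ext4 /dev/sdb1
mount /dev/sdb1 /mnt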

And when you're done, you just log out:

# iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.gwen.i686:sn.7d9ed8ca9557 -p 192.168.122.117 --logout
Logging out of session [sid: 1, target: iqn.2003-01.org.linux-iscsi.gwen.i686:sn.7d9ed8ca9557, portal: 192.168.122.117,3260]
Logout of [sid: 1, target: iqn.2003-01.org.linux-iscsi.gwen.i686:sn.7d9ed8ca9557, portal: 192.168.122.117,3260]: successful

That's it.

Where To Go From Here

Now you can start doing pretty nifty stuff.

Have a server that needs extra storage, but runs a legacy Linux distro with no native RBD support? Install open-iscsi and provide that box with the replicated, striped, self-healing, auto-mirroring capacity that Ceph and RBD come with.

Have a Windows box that should somehow start using your massive distributed storage cluster? As long as you have the Microsoft iSCSI Initiator installed, you can do so in an all-software solution. Or you can get the iSCSI HBA of your choice and use its Windows driver.

And if you have a server that can boot off iSCSI, you can even run a bare-metal install, or build a diskless system that stores all of its data in RBD.

Want High Availability for your target proxy? Can do. Pacemaker has resource agents both for LIO and for RBD. And highly available iSCSI servers are old hat; we've been doing that with other, less powerful storage replication solutions forever.