If you're familiar with web site hosting on Amazon S3, which is a simple and cheap way to host a static web site, you might be wondering whether or not you can do the same in Ceph radosgw.

The short answer is you can't. Bucket Website is listed as Not Supported in the radosgw S3 API support matrix, and radosgw doesn't have index document support either.

But the longer answer is that you can, provided you use radosgw in combination with a front-end load-balancer — which, as it happens, can add a few more bells and whistles, as well. You could probably do the same thing with nginx, Varnish, or Apache in a mod_proxy_balancer balancer setup, but in this example configuration, we'll use HAProxy.

Getting started: the radosgw basics

Let's take a look at a simple radosgw configuration with virtual host support, such that you can access your buckets as either http://ceph.example.com/bucketname or http://bucketname.ceph.example.com:

[client.rgw.radosgw01]
rgw_frontends = civetweb port=7480
rgw_dns_name = ceph.example.com
rgw_resolve_cname = True

Suppose we use s3cmd to create a bucket and upload an HTML file to it, setting a public ACL:

s3cmd mb s3://testwebsite
s3cmd put --acl-public index.html s3://testwebsite/

Then, if you exposed your radosgw to the web, any client could retrieve http://testwebsite.ceph.example.com:7480/index.html without authentication, using a web browser or any other HTTP client (such as curl or wget):

curl -I http://testwebsite.ceph.example.com:7480/index.html

Which would then return something like:

HTTP/1.1 200 OK
Content-Length: 18050
Accept-Ranges: bytes
Last-Modified: Mon, 25 Jan 2016 21:28:47 GMT
ETag: "b03130a4a1fc24df0f9f336f2b6d1d90"
x-amz-request-id: tx000000000000000005a88-0056a7b7eb-312df-default
Content-type: text/html
Date: Tue, 26 Jan 2016 18:16:11 GMT

Introducing HAProxy

Now let's start by putting HAProxy in between. Nothing special there: radosgw keeps listening on its conventional port 7480, HAProxy proxies traffic through to it, and we bind HAProxy itself to port 80.

global
    log         /dev/log local0
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats level admin

    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/haproxy/ssl

    # Default ciphers to use on SSL-enabled listening sockets.
    # For more information, see ciphers(1SSL).
    ssl-default-bind-ciphers HIGH
    tune.ssl.default-dh-param 2048

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    timeout queue 1000
    timeout connect 1000
    timeout client 30000
    timeout server 30000
    option forwardfor


frontend ceph_front
    bind 0.0.0.0:80
    default_backend ceph_back

backend ceph_back
    balance source
    server radosgw01 127.0.0.1:7480 check

Index documents

So, the first thing we'll need to add is support for index documents. We'd like to make sure that when we retrieve http://testwebsite.ceph.example.com/, what's actually fetched from the backend is /index.html. We can do that by adding an HAProxy ACL that matches a trailing slash in the path, and an http-request set-path directive that appends the index document name:

frontend ceph_front
    bind 0.0.0.0:80
    acl path_ends_in_slash path_end -i /
    # Append index document (index.html) to any path
    # ending in "/".
    http-request set-path %[path]index.html if path_ends_in_slash
    default_backend ceph_back

Now, that's fine in terms of getting the index document served correctly:

curl -I http://testwebsite.ceph.example.com/
HTTP/1.1 200 OK
Content-Length: 18050
Accept-Ranges: bytes
Last-Modified: Mon, 25 Jan 2016 21:28:47 GMT
ETag: "b03130a4a1fc24df0f9f336f2b6d1d90"
x-amz-request-id: tx000000000000000005a94-0056a7b9e3-312df-default
Content-type: text/html
Date: Tue, 26 Jan 2016 18:24:35 GMT

However, it of course breaks uploads and even bucket listings: in other words, anything that uses the S3 API. Now, you could test for some S3-specific headers in the request, but really, you should just check whether the request carries an Authorization header, and only apply the index document logic if it doesn't, like so:

frontend ceph_front
    bind 0.0.0.0:80
    acl path_ends_in_slash path_end -i /
    acl auth_header hdr(Authorization) -m found
    # Append index document (index.html) to any path
    # ending in "/", unless the request has an auth header
    http-request set-path %[path]index.html if path_ends_in_slash !auth_header
    default_backend ceph_back

Great. Now we can upload using full paths without mangling, and on any unauthenticated request, we append index.html to any path ending in a trailing /. In case you're wondering: yes, this works for any path, not just the root path.

Directory paths

However, you may also want something else: the ability to correctly handle a request like http://testwebsite.ceph.example.com/my/sub/directory, where you want the path /my/sub/directory translated into /my/sub/directory/index.html. In other words, we want to append both a slash and an index document name to the request path.

So let's do that:

frontend ceph_front
    bind 0.0.0.0:80
    acl path_has_dot path_sub -i .
    acl path_ends_in_slash path_end -i /
    acl auth_header hdr(Authorization) -m found
    http-request set-path %[path]index.html if path_ends_in_slash !auth_header
    # Append trailing slash if necessary.
    http-request set-path %[path]/index.html if !path_has_dot !path_ends_in_slash !auth_header
    default_backend ceph_back

Note that what we're doing here is somewhat crude. We're assuming that any actual file we want to retrieve looks like name.ext, meaning it has a dot (period, full stop) character in it. The path_sub -i . expression in the path_has_dot ACL simply matches any path with a . in it, and we're assuming that if a path contains a dot, it points to a file; if it doesn't, it points to a directory.
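If it helps to see the heuristic spelled out, here is the same logic as a plain Python sketch (a hypothetical helper for illustration only; HAProxy evaluates these ACLs natively):

```python
def rewrite_path(path, has_auth_header):
    """Apply the index-document heuristic from the HAProxy config above."""
    if has_auth_header:
        # Authenticated S3 API requests pass through untouched.
        return path
    if path.endswith("/"):
        # "/some/dir/" becomes "/some/dir/index.html".
        return path + "index.html"
    if "." not in path:
        # No dot anywhere in the path: assume it's a directory
        # and append "/index.html".
        return path + "/index.html"
    # The path contains a dot, so we assume it names a file.
    return path
```

For example, rewrite_path("/my/sub/directory", False) yields "/my/sub/directory/index.html", while the same path with an Authorization header present is left alone.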

You could be a little more clever here and use path_reg instead of path_sub for a full regular expression match. But regex lookups are slower than simple substring matches, so if the substring match works for you, go for it.
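For instance, a stricter match might look something like this, assuming all your files carry a conventional extension (the ACL name and pattern here are illustrative, not part of the configuration above):

```
    # Treat the path as a file only if its last segment has
    # an extension, i.e. ends in a dot followed by letters/digits.
    acl path_is_file path_reg -i \.[a-z0-9]+$
```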

So now, we can do this:

s3cmd put --acl-public index.html s3://testwebsite/my/sub/directory/

And then:

# Note omitted trailing slash
curl -I http://testwebsite.ceph.example.com/my/sub/directory
HTTP/1.1 200 OK
Content-Length: 24235
Accept-Ranges: bytes
Last-Modified: Mon, 25 Jan 2016 23:57:04 GMT
ETag: "fecd005b33c0f6bfdee61b787cf54cb0"
x-amz-request-id: tx00000000000000000bc83-0056a7bd25-312cd-default
Content-type: text/html
Date: Tue, 26 Jan 2016 18:38:29 GMT

HTTPS support

So, what else might you want to do? One obvious thing you can use HAProxy for is SSL termination. The embedded civetweb webserver in radosgw can do that for you as well, but that feature is currently mildly broken in a rather curious way. So, in order to allow HTTPS access to all your content via HAProxy instead, you would add:

frontend ceph_front_ssl
    bind 0.0.0.0:443 ssl crt ceph.pem no-sslv3 no-tls-tickets
    reqadd X-Forwarded-Proto:\ https
    acl path_has_dot path_sub -i .
    acl path_ends_in_slash path_end -i /
    acl auth_header hdr(Authorization) -m found
    http-request set-path %[path]index.html if path_ends_in_slash !auth_header
    http-request set-path %[path]/index.html if !path_has_dot !path_ends_in_slash !auth_header
    default_backend ceph_back

But maybe you'd like to force, not merely allow, HTTPS access. HAProxy's redirect directive to the rescue:

frontend ceph_front
    bind 0.0.0.0:80
    reqadd X-Forwarded-Proto:\ http
    redirect scheme https code 301 if !{ ssl_fc }

frontend ceph_front_ssl
    bind 0.0.0.0:443 ssl crt ceph.pem no-sslv3 no-tls-tickets
    reqadd X-Forwarded-Proto:\ https
    acl path_has_dot path_sub -i .
    acl path_ends_in_slash path_end -i /
    acl auth_header hdr(Authorization) -m found
    http-request set-path %[path]index.html if path_ends_in_slash !auth_header
    http-request set-path %[path]/index.html if !path_has_dot !path_ends_in_slash !auth_header
    default_backend ceph_back

And here we go:

# Note HTTP
curl -IL http://testwebsite.ceph.example.com/my/sub/directory
HTTP/1.1 301 Moved Permanently
Content-length: 0
Location: https://testwebsite.ceph.example.com/my/sub/directory
Connection: close

HTTP/1.1 200 OK
Content-Length: 24235
Accept-Ranges: bytes
Last-Modified: Mon, 25 Jan 2016 23:57:04 GMT
ETag: "fecd005b33c0f6bfdee61b787cf54cb0"
x-amz-request-id: tx00000000000000000bdeb-0056a7bf9b-312cd-default
Content-type: text/html
Date: Tue, 26 Jan 2016 18:48:59 GMT

Compression

And finally, maybe you'd like to speed up access to the content on your site. Why not add gzip on-the-fly compression? It's supported by every browser worth its salt, and will make your users happier. You'll want to restrict compression to specific MIME types, though. In the configuration below, we enable compression for plain text, HTML, XML, CSS, JavaScript, and SVG images.

frontend ceph_front
    bind 0.0.0.0:80
    reqadd X-Forwarded-Proto:\ http
    redirect scheme https code 301 if !{ ssl_fc }

frontend ceph_front_ssl
    bind 0.0.0.0:443 ssl crt ceph.pem no-sslv3 no-tls-tickets
    reqadd X-Forwarded-Proto:\ https
    acl path_has_dot path_sub -i .
    acl path_ends_in_slash path_end -i /
    acl auth_header hdr(Authorization) -m found
    http-request set-path %[path]index.html if path_ends_in_slash !auth_header
    http-request set-path %[path]/index.html if !path_has_dot !path_ends_in_slash !auth_header
    compression algo gzip
    compression type text/html text/xml text/plain text/css application/javascript image/svg+xml
    default_backend ceph_back

Let's see how that helps us. Do a request without gzip encoding support, and observe that its total download size matches the document's Content-Length:

curl https://testwebsite.ceph.example.com/my/sub/directory > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 24235  100 24235    0     0  94565      0 --:--:-- --:--:-- --:--:-- 94299

Now, add an Accept-Encoding header:

curl -H 'Accept-Encoding: gzip' https://testwebsite.ceph.example.com/my/sub/directory > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5237    0  5237    0     0  19243      0 --:--:-- --:--:-- --:--:-- 19324

There. The actual download size goes from 24 KB down to just 5 KB.

Where to go from here

There are a few additional features that could be added here. You could enable CORS or HSTS, for example, and of course you could add more backends. But if you've read this far, you surely get the idea.
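HSTS, for instance, is a one-line addition to the HTTPS frontend. A sketch, in the same rspadd style as the rest of the configuration (pick a max-age you're comfortable committing to before enabling it):

```
frontend ceph_front_ssl
    bind 0.0.0.0:443 ssl crt ceph.pem no-sslv3 no-tls-tickets
    # Tell browsers to insist on HTTPS for the next year (HSTS).
    rspadd Strict-Transport-Security:\ max-age=31536000
```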

And you're welcome to examine the headers you can pull from this page you're reading, wink wink. :)


