FreeBSDCluster
NOTE: this page is a work in progress...
My strategy
Here's the current implementation of my evolving strategy for high
availability at home with cheap PCs.
- three FreeBSD servers, each with different hardware
configurations. serverA is bigger/stronger/faster. serverB and
serverC are similar.
- serverA runs several FreeBSD Jails. The jails are mirrored via
GEOM gmirror. One of the gmirror consumers is a local disk in
serverA and the other resides in serverB and is mirrored via
ggate. Priority is assigned so that reads are performed on the
local disk. This ensures that reads in the jails run at the speed of
the local disks. Previously, all my jails were served over NFS, which
worked great for a while but was quite a bit slower.
- The jails run all my public web services (web access, IMAP, sshd,
email, etc.). In the event that serverA crashes, I simply
log in to serverB, stop ggate, fsck the drives, and bring up the
jails and their IPs on serverB. I haven't fully automated this
yet, but it can easily be done manually from off-site.
- serverC acts as a NAS, providing Samba and NFS. A number of
large hard drives are installed and are mirrored using GEOM. I
have no special hardware for shared storage. In the event that
serverC crashes, the hard drives are in kangaroo drives which
can be yanked out and popped into serverB. serverC provides no
other services except NAS to minimize the number of moving
parts.
- In addition to being a hot stand-by for either serverA or
serverC, serverB runs some lower priority services.
This setup has two main advantages: it is very cheap, since it
requires no special hardware, and it is quite easy to build.
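A rough sketch of the gmirror-over-ggate setup described above follows. The page doesn't give the actual commands, so everything specific here is an assumption: the disk names (ad1 on both machines), the mirror name gm0, the serverA address that goes into /etc/gg.exports, and the priority value passed to `gmirror insert`. It only prints the commands unless run with DRY_RUN=0 (as root, with the local disk unmounted). How the `prefer` balance algorithm ranks priorities should be checked against gmirror(8) for your release; `gmirror list` shows each component's priority.

```shell
#!/bin/sh
# Sketch: mirror a local disk on serverA with a disk exported by serverB via ggate.
# Hypothetical names throughout: mirror gm0, local disk ad1, remote /dev/ad1.
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# serverB: one /etc/gg.exports line (host, access mode, device);
# start ggated after adding it.
gg_exports_line() {   # usage: gg_exports_line <serverA-addr> <device>
    echo "$1 RW $2"
}

# serverA: import the remote disk and build the mirror with "prefer" balancing.
build_mirror() {      # usage: build_mirror <serverB-host> <remote-dev> <local-disk>
    run ggatec create -o rw -u 0 "$1" "$2"   # remote disk shows up as /dev/ggate0
    run gmirror load                          # only needed if geom_mirror isn't loaded
    run gmirror label -v -b prefer gm0 "$3"   # local disk becomes the first component
    run gmirror insert -p 0 gm0 ggate0        # add the remote copy (placeholder priority)
    run gmirror list gm0                      # verify each component's Priority
}

if [ $# -eq 3 ]; then
    build_mirror "$@"
fi
```

For example, `sh gmirror-ggate.sh serverB /dev/ad1 ad1` (the file name is hypothetical) prints the five commands; once they look right for your disks, rerun it with `DRY_RUN=0`.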
There are some limitations.
- every write to the jails must also reach serverB's disk over
ggate, so writes run at network speed; only reads run at local
disk speed. (When the jail roots were mounted over NFS, reads ran
at network speed too.) Since FreeBSD currently lacks a cluster
file system, this is probably the best option at the moment.
- when moving the jails from one box to another, expect to wait a
few minutes for ARP cache time-outs. Generating gratuitous ARPs
can eliminate the wait.
- there is no automated failover yet. FreeVRRPD (or CARP plus
ifstated) should probably do the trick.
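The manual takeover from the strategy section (log in to serverB, stop ggate, fsck the drives, bring up the jails and their IPs) is the piece an automated failover would have to run. Below is a hypothetical sketch of it, not the author's script. It assumes serverB exports its half of the mirror with ggated, that gmirror on serverB can assemble a degraded mirror/gm0 from that disk once ggated has released it, that the jails live under /cluster, and that jail_list is set in serverB's rc.conf while jail_enable stays off, hence `onestart`. It only prints the commands unless DRY_RUN=0.

```shell
#!/bin/sh
# Hypothetical takeover on serverB after serverA has crashed.
# Assumed names: mirror gm0, mount point /cluster. DRY_RUN=1 only prints.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

takeover() {   # usage: takeover <iface> <jail-ip> [<jail-ip> ...]
    iface=$1
    shift
    run pkill -x ggated                # stop exporting the disk to serverA
    run gmirror load                   # only needed if geom_mirror isn't loaded
    run fsck -y /dev/mirror/gm0        # serverA went down uncleanly
    run mount /dev/mirror/gm0 /cluster
    for ip in "$@"; do
        # /32 netmask for an extra address on an already-configured subnet
        run ifconfig "$iface" alias "$ip" netmask 255.255.255.255
    done
    run /etc/rc.d/jail onestart        # start jail_list even though jail_enable is off
}

if [ $# -ge 2 ]; then
    takeover "$@"
fi
```

For example, `sh takeover.sh dc0 192.168.1.98` prints the sequence for the jail address used elsewhere on this page. Clients may still have to wait out the ARP cache time-outs mentioned above once the alias comes up.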
Pointers
Other Cluster or Load-Balancing Tools for FreeBSD
- heartbeat - basic high-availability subsystem - /usr/ports/sysutils/heartbeat/
- FreeVRRPD
- pound - reverse proxy, load balancer and HTTPS front-end for Web server(s)
- ClusterIt - run commands on a series of UNIX systems
- keepalived - ipvs wrapper and a service health-checker
- LVS or Linux Virtual Server - highly scalable and highly available server built on a cluster
Fail Over Procedures
Here are the steps I use to move the jails from one machine to
another. It's not necessary, but as a matter of habit, I like to
reboot each machine when failing over the jails, since that confirms
I got everything right and that it will still work the next time I
reboot. Note that this is a work in progress.
- shut down jails on serverA -
sudo /etc/rc.d/jail stop
- delete ip aliases on serverA -
sudo ifconfig dc0 delete 192.168.1.98
- disable jails in serverA's /etc/rc.conf - comment out jail_list and the ifconfig aliases
- ensure that the drives are properly sync'd via rsync
- enable jails in serverB's /etc/rc.conf - uncomment jail_list and the ifconfig aliases
- reboot serverB - ensure jails start properly
- reboot serverA - ensure jails and ips don't start
- move rsync job to serverB
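The rc.conf edits in these steps (commenting out jail_list and the ifconfig aliases on serverA, uncommenting them on serverB), plus the first two steps on serverA, can be scripted. The helpers below are a hypothetical sketch, not the author's tooling. They assume the aliases are set with variables named like ifconfig_dc0_alias0; the rc.conf filters only read stdin and write stdout, so the result can be reviewed before it replaces /etc/rc.conf, and release_jails only prints its commands unless DRY_RUN=0.

```shell
#!/bin/sh
# Hypothetical helpers for a planned move of the jails between servers.
# The rc.conf filters read stdin and write stdout; nothing is edited in place.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# serverA: comment out jail_list and ifconfig_<iface>_alias<N> lines.
# Lines that are already commented out are left alone.
disable_jails_rc() {
    sed -e 's/^\(jail_list=\)/#\1/' \
        -e 's/^\(ifconfig_[a-z0-9]*_alias[0-9]*=\)/#\1/'
}

# serverB: uncomment the same lines.
enable_jails_rc() {
    sed -e 's/^#\(jail_list=\)/\1/' \
        -e 's/^#\(ifconfig_[a-z0-9]*_alias[0-9]*=\)/\1/'
}

# serverA: stop the jails and drop their addresses.
release_jails() {   # usage: release_jails <iface> <jail-ip> [<jail-ip> ...]
    iface=$1
    shift
    run /etc/rc.d/jail stop
    for ip in "$@"; do
        run ifconfig "$iface" delete "$ip"
    done
}
```

For example, `disable_jails_rc < /etc/rc.conf > /tmp/rc.conf.new` on serverA, then `diff /etc/rc.conf /tmp/rc.conf.new` before copying the new file into place.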
Upgrade Procedures
Here are the steps I use to upgrade my cluster:
- first upgrade serverB and test it out
- shut down jails and move them to serverB and test
- upgrade serverA and test it out
- move jails back to serverA
- upgrade jails on serverA
If at any point along the way something goes wrong, you can very
easily fail back to a working configuration.
I keep my servers on the same FreeBSD version, although when I
upgrade one box, I usually wait a couple of days to make sure
everything is running smoothly before upgrading the other.
See also my FreeBSDUpgrade notes.
Jail files to exclude from Unison and rsync
When using Unison or rsync to keep a second (offline) copy of a jail
that is running live, some things should be excluded.
- cluster/jail/proc
- cluster/jail/dev
- cluster/jail/var/run/log
- cluster/jail/var/run/logpriv
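For rsync, the exclusions above can be passed as anchored `--exclude` patterns (a leading `/` ties a pattern to the top of the transfer, here the jail root); for Unison, as `ignore = Path ...` entries in a profile whose root is the jail directory. A minimal sketch, assuming the listed paths are relative to the jail root (cluster/jail) and using a hypothetical destination /backup/jail. The `--delete` flag, which removes files from the copy that were deleted in the live jail, is a choice of this sketch rather than something the page specifies; excluded paths on the copy are not touched by it.

```shell
#!/bin/sh
# Hypothetical helpers for keeping an offline copy of a live jail.
# Excluded paths (relative to the jail root) match the list above.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# rsync: trailing "/" on the source copies the jail's contents,
# so "/proc" etc. are anchored at the jail root.
jail_rsync() {   # usage: jail_rsync <live-jail-dir> <copy-dir>
    run rsync -a --delete \
        --exclude=/proc --exclude=/dev \
        --exclude=/var/run/log --exclude=/var/run/logpriv \
        "${1%/}/" "$2"
}

# unison: profile lines for a profile whose root is the jail directory.
unison_ignores() {
    for p in proc dev var/run/log var/run/logpriv; do
        echo "ignore = Path $p"
    done
}
```

For example, `jail_rsync /cluster/jail /backup/jail` prints the rsync command for review, and `unison_ignores >> ~/.unison/jail.prf` (hypothetical profile name) appends the ignore entries.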
Filesystem synchronization options
- cluster file system that lets multiple machines mount the same
disks locally at the same time - none is stable on FreeBSD yet
- rsync - CPU/disk intensive and too asynchronous: if serverA
crashes, serverB may not have an up-to-date copy of the jails.
- unison - see rsync.
Updated Sat Jan 20, 2007 9:41 PM