Multiple ways to peel an orange

Use Cases

  • Datacenter system deployment (imaging/roll-out) & management
  • Workstation image management for any size office space
    • Virtual or bare metal desktops running Windows or Linux without any modification to the OS
    • Transparent to end-users, no performance loss
    • Update one workstation, others can reboot to reprovision to the new version automatically
    • Changes to the OS can be discarded on reboot; tampering by end-users is prevented through top-down policy enforcement by the network
  • Developer environments
    • Create multiple netboot templates for testing developed software in many environments
    • Temporarily (or permanently) roll back template changes on a per-device basis, allowing bug verification against past platforms with fewer headaches
    • Free developers from VM lifecycle management so they can spend more time on real issues

Configuration Examples

Networks have different requirements because of various factors: financial considerations, scale of implementation, reliability, performance, and capacity.

Single-server

It is entirely possible (although less safe) to run all storage and compute services from a single node. However, as of v0.9.3 there is no straightforward backup method for a single node, aside from running a virtual server on the same machine connected to separate disks. External ZFS backup tools could be leveraged, but if they interact with Group or vDisk snapshots, you may want to reconsider.

If a user were to write a script using our API to do send/recv using database snapshots, it'd probably be added to our clusterducks/misc-scripts repository.
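The zfs snapshot/send/recv plumbing for such a script might look like the sketch below. It builds the command lines instead of running them, so the plan can be reviewed first. The dataset names, snapshot prefix, and function name are purely illustrative; the snapshot prefix is deliberately distinct so the script's snapshots stay separate from Group or vDisk snapshots.

```python
import shlex

def backup_commands(dataset, backup_pool, stamp, last_sent=None):
    """Build the zfs command lines for a full (or incremental) send/recv
    of `dataset` into `backup_pool`.  Returns the commands as strings
    rather than executing them, so the plan can be reviewed first."""
    snap = f"{dataset}@backup-{stamp}"  # own prefix: avoids Group/vDisk snapshots
    cmds = [f"zfs snapshot {shlex.quote(snap)}"]
    if last_sent:
        # incremental stream relative to the previously sent snapshot
        send = f"zfs send -i {shlex.quote(last_sent)} {shlex.quote(snap)}"
    else:
        send = f"zfs send {shlex.quote(snap)}"  # full stream on the first run
    dest = f"{backup_pool}/{dataset.split('/', 1)[-1]}"
    cmds.append(f"{send} | zfs recv -F {shlex.quote(dest)}")
    return cmds
```

Each run after the first would pass the previously sent snapshot as last_sent to produce an incremental stream.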

Multiple servers

When a network is configured, the first server is designated as the master server. This is done mainly to accommodate filesystems that do not support native clustering (such as ZFS), so the connection between the master and all slaves should be sized accordingly to avoid long replication times when datasets change substantially.

Though the relationship is described as master-slave, all slave nodes are active and serve "read-only" OS images whose writes are discarded when the device reboots. This differs from the typical high availability configuration, where a server sits on standby until needed; on clusterducks networks, all resources are available whenever possible.

Group replication occurs via API request; there is a built-in scheduling interface, but it is in alpha stages.

vDisk replication checks are triggered every time the statistics collection cron job runs (every minute, or a longer interval), though it will not send/recv unless needed: replication happens only when the amount of data written to disk, the time elapsed since the last snapshot, or both exceed their thresholds.
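The either/or condition above boils down to a simple predicate. The parameter names and default threshold values in this sketch are assumptions for illustration, not the actual defaults:

```python
def needs_replication(bytes_written, seconds_since_snapshot,
                      write_threshold=256 * 1024**2,  # 256 MiB (assumed)
                      age_threshold=3600):            # 1 hour (assumed)
    """Return True when a vDisk should be snapshotted and sent:
    either enough data has been written, or enough time has passed."""
    return (bytes_written >= write_threshold
            or seconds_since_snapshot >= age_threshold)
```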

Devices running on slave nodes still have access to persistent data from assigned vDisks. Replication between slave servers is planned for a future release.

  • Load Balancing

    • Any thawed devices (OS volume is not reprovisioned during boot) will be redirected to the network master during PXE initialization to ensure that any changes to the OS image can be replicated to slaves appropriately.
    • Devices in their default state (frozen: OS volume is reprovisioned during boot) can use a load balance mode to sort the list of attempted servers by a score that the panel calculates during statistics collection, based on metrics like CPU load, network and disk throughput, and the number of clients that are connected.
  • Round Robin

    • A simple incremental approach to attempting servers is also available. The counter is stored per-device, so one device's behaviour does not affect another's. This mode may not result in balanced resource usage; however, for sites that frequently lose network access it may be the better choice, since the Load Balancing approach contacts the control panel to retrieve updated server sorting scores.
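Both selection modes can be illustrated with a short sketch. The score map and data structures here are hypothetical, not the panel's actual implementation:

```python
def load_balanced_order(servers, scores):
    """Sort candidate servers by the panel-computed score
    (lower = less loaded, so tried first).  `scores` maps
    server name -> score; servers without a score go last."""
    return sorted(servers, key=lambda s: scores.get(s, float("inf")))

class RoundRobin:
    """Per-device counter: each device cycles through the server list
    independently, so one device's choices never shift another's."""
    def __init__(self):
        self._next = {}  # device id -> index of the next server to try

    def pick(self, device, servers):
        i = self._next.get(device, 0) % len(servers)
        self._next[device] = i + 1
        return servers[i]
```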

In the Datacenter

clusterducks can run on commodity servers from any provider that offers ECC memory; even Amazon Web Services (EC2).

Small or large networks can benefit; a single dedicated server with 16GB RAM and 250GB storage is enough for a basic production deployment. Even with blades or distinct servers in a single rack, clusterducks can manage them as if they were a single system.

Across multiple datacenters / Point-of-Presence (PoP)

Multi-site networks have their own requirements.

Traditional clusterducks networks are physically (or logically) separate.

  • A single site can have multiple distinct networks
  • A single network spanning multiple sites may work, but is not trivial

For example, an organization may have one head office with many satellite offices. Bandwidth between sites must be sufficient to transfer image updates at an acceptable rate. Each site will need local storage servers.

Despite these requirements, management can be greatly simplified: an update only needs to be made once to cover all devices at all sites.

  • Bandwidth between sites
    • 100Mbit minimum is recommended
    • Remember, as of v0.9.3 all OS image updates come from the master server
    • Secondary servers may be the source for vDisk replication only
  • Image requirements - what do the sites do with their systems?
    • If two sites have vastly different requirements, they should be configured as separate networks
    • Each network should generally have two or more storage servers; a single server will work, but it presents a single point of failure
    • Using Windows with a shared (golden) image requires the workstations have nearly identical hardware
      • Virtualizing your netboot environment is one way to avoid tight coupling between Windows and its underlying hardware
    • Linux servers, whether virtual or physical, do not have this strict requirement and one image can cover multiple types of hardware
  • No live-migration support as of v0.9.3; VMs and bare metal must reboot for migration
  • Each site needs at least one storage server
  • With many satellite offices, the benefits of this approach become clearer; smaller sites are better operated separately
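When sizing inter-site links, a back-of-envelope estimate is useful. The delta size and the 80% usable-throughput figure below are assumptions, not measured values:

```python
def transfer_minutes(delta_gib, link_mbit, efficiency=0.8):
    """Minutes to replicate an image delta of `delta_gib` GiB over a
    `link_mbit` Mbit/s link, assuming a fraction `efficiency` of the
    nominal bandwidth is actually usable."""
    bits = delta_gib * 1024**3 * 8
    return bits / (link_mbit * 1_000_000 * efficiency) / 60
```

For example, a 10 GiB image delta over the recommended 100Mbit minimum takes roughly 18 minutes to reach each site.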