Clusters

Multiple JVMs running Prudence, on several machines or on a single one, may share certain resources. The running instances do not have to be identical ("redundant"), though in some deployment scenarios they should be.
Many of the clustering features in Prudence rely on the excellent Hazelcast library: Hazelcast allows Prudence nodes to share state and locks, and to run tasks remotely. Its auto-discovery feature is especially useful for flexible cloud deployments.
You don't have to dig deep into Hazelcast: the core features work out-of-the-box with Prudence, and are fully explained in this chapter. However, if you're serious about clustering, it's a good idea to study Hazelcast and see what more it can do for you. In particular, you may want to tweak the default configuration, for example to add backups, change eviction policies, create node groups, or enable SSL and encryption.

Shared State

Easily share in-memory data between all nodes and applications in the cluster using the application.distributedGlobals and application.distributedSharedGlobals API family.
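The difference between the two is scope: "distributedGlobals" are shared by all nodes running the current application, while "distributedSharedGlobals" are shared by all nodes and all applications. A minimal sketch (the keys and values here are hypothetical; note that values must be serializable, as explained below):
// Visible to this application on every node in the cluster
application.distributedGlobals.put('motd', 'Welcome!')
// Visible to every application on every node in the cluster
application.distributedSharedGlobals.put('maintenance', 'false')
println(application.distributedGlobals.get('motd'))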

Concurrency

Like the other globals, distributed globals support atomic operations for safe concurrent access. In the following example, we make sure that the default value is only ever set once:
document.require('/sincerity/json/')

var defaultPerson = {name: 'Linus', role: 'boss'}
var person =
	Sincerity.JSON.from(application.getDistributedGlobal('person', Sincerity.JSON.to(defaultPerson)))
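If you prefer to work with the map directly, here is a roughly equivalent sketch (reusing "defaultPerson" from above), assuming the underlying map implements java.util.concurrent.ConcurrentMap, as Hazelcast's distributed maps do:
// Atomically sets the value only if the key is not already present
application.distributedGlobals.putIfAbsent('person', Sincerity.JSON.to(defaultPerson))
person = Sincerity.JSON.from(application.distributedGlobals.get('person'))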

Serializability

Distributed globals must be serializable. If they aren't, you will get an exception when Hazelcast attempts to move the data between nodes.
If you're using JavaScript, this unfortunately means that you are limited to primitive types. One easy way to get around this is to serialize via JSON:
document.require('/sincerity/json/')
var person = {name: 'Linus', role: 'boss'}
application.distributedGlobals.put('person', Sincerity.JSON.to(person))
person = Sincerity.JSON.from(application.distributedGlobals.get('person'))
println(person.name)

Configuration

You can configure the distributed globals in "/configuration/hazelcast/application/globals.js", as a map named "com.threecrickets.prudence.distributedGlobals.[name]", where "name" is the application name. You can also change the name of that map in settings.js.
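As an illustration, a sketch of what could go in "/configuration/hazelcast/application/globals.js", assuming the Hazelcast Config instance is in scope as "config" (as it is in the default configuration scripts), and a hypothetical application named "myapp":
// Keep one synchronous backup copy of each distributed global
var mapConfig = new com.hazelcast.config.MapConfig()
mapConfig.name = 'com.threecrickets.prudence.distributedGlobals.myapp'
mapConfig.backupCount = 1
config.addMapConfig(mapConfig)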

Cluster-Wide Synchronization

Use the application.getDistributedSharedLock API to synchronize access to resources for the entire cluster:
var lock = application.getDistributedSharedLock('services.remote')
lock.lock()
try {
	doSomethingAtomicallyWithRemoteService()
}
finally {
	lock.unlock()
}
This apparently simple tool offers a reliable, and thus powerful, guarantee of atomicity across entire deployments, even very large ones. It can replace a separate, dedicated synchronization tool, such as Apache ZooKeeper. Just make sure you understand the detrimental effect its use could have on your scalability.
See the Hazelcast documentation for more information on distributed locks.
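If blocking indefinitely on a cluster-wide lock is a concern, you can attempt it with a timeout; a sketch using the standard java.util.concurrent.locks.Lock interface, which distributed locks implement:
var lock = application.getDistributedSharedLock('services.remote')
// Wait at most five seconds for the cluster-wide lock instead of blocking forever
if (lock.tryLock(5, java.util.concurrent.TimeUnit.SECONDS)) {
	try {
		doSomethingAtomicallyWithRemoteService()
	}
	finally {
		lock.unlock()
	}
}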

Task Farms

The distributed task APIs let you spawn tasks anywhere, everywhere, or on specific nodes in the cluster.
This powerful feature is only really useful in a heterogeneous cluster: if all your nodes are the same, and all of them are answering user requests behind a load balancer, then it's hard to see what advantage you would get by spawning background tasks on a node other than the one that answered the request. However, if you have a separate set of nodes specifically designated to running tasks, or even multiple sets dedicated to tasks of different kinds, then you gain considerable control over your deployment strategies. You can scale out your "web nodes" if you need to handle more user requests, and separately scale out your "task nodes," "cache nodes," "database nodes," etc. This deployment strategy is key to cost-efficient use of your resources.
Prudence supports two ways to create such a heterogeneous cluster.

Tagging Nodes

You can "tag" each node in your cluster with one or more custom strings. Many nodes may share the same tag, or you may create unique tags for some nodes in order to refer to them, and only them, directly.
To tag nodes, edit their "/configuration/hazelcast/default/default.js", and set the "com.threecrickets.prudence.tags" attribute to a comma-separated list of tags (every node can be associated with one or more tags). As an example, let's give our node two tags, "video-encoding" and "backup":
config.memberAttributeConfig.setStringAttribute('com.threecrickets.prudence.tags', 'video-encoding,backup')
We can then use the distributed task APIs to execute a single task on any one of the "video-encoding" nodes:
Prudence.Tasks.task({
	uri: '/reencode-video/',
	context: {filename: 'great_movie.avi', resolution: {w: 600, h: 400}},
	json: true,
	distributed: true,
	where: 'video-encoding'
})
Or, we can execute a backup operation on all the "backup" nodes, by setting the "multi" param to true:
Prudence.Tasks.task({
	uri: '/backup/',
	distributed: true,
	multi: true,
	where: 'backup'
})
Note that "where" can also be a comma-separated list.

Separate Cluster for Tasks

In a more complex deployment, you may want your task farm as an entirely separate Hazelcast cluster. For example, your task farm may be running in an entirely different data center, and you do not need or want it to share state with the application cluster.
By default, Prudence executes all tasks on one Hazelcast instance, but it allows you to configure a separate instance for tasks. To enable this scenario, Prudence comes with commented-out configurations for both the "task" nodes (the servers) and for the nodes that will be spawning the tasks (the clients), named "1-server.js" and "2-client.js" respectively.
It normally doesn't make sense to have both "1-server.js" and "2-client.js" enabled on the same node: a server is already a full member, and doesn't have to also be a lightweight client. However, it can be useful to enable both for the purpose of testing the complete loop in a single container: it will work. (That's why there are numeric prefixes for the filenames: this makes sure they are initialized in the correct order if both are enabled.)
Once configured, you can use the task APIs as usual on both the client and server nodes, but the tasks will only run on the server nodes. Note that you can still use node tags for the server nodes by editing "1-server.js".
For more options for partitioning clusters in Hazelcast, see its grouping feature.

Shared Cache

Obviously, in a cluster you want to use a shared cache backend, and possibly also a tiered caching strategy. The configuration chapter details the many built-in options.
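As an illustration only, a sketch of a two-tier setup, assuming Prudence's ChainCache, InProcessMemoryCache and HazelcastCache classes (see the configuration chapter for where exactly such configuration belongs):
// First tier: fast, node-local memory cache; second tier: cluster-wide Hazelcast cache
var cache = new com.threecrickets.prudence.cache.ChainCache()
cache.caches.add(new com.threecrickets.prudence.cache.InProcessMemoryCache())
cache.caches.add(new com.threecrickets.prudence.cache.HazelcastCache())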

Centralized Logging

There are four possible strategies for handling logging in a cluster, each with its own advantages:
