
Background Tasks
The primary workload of a web platform is in handling user requests, and in Prudence these are handled in a configurable thread pool managed by the web server (usually Jetty). However, "primary" does not mean "only" or even "most": your application may be doing lots of other work both to serve users and to keep itself running properly. In Prudence, we call these "background tasks." They are run in a separate thread pool, and can even be "farmed out" in the cluster.
The common use cases are:
- Doing work in connection to user requests that can happen outside of requests: sending email notifications, updating a statistics database, etc. Even if this happens a bit later than the user request, the user experience would not suffer. By performing this work in the background, you can ensure that the request thread is freed as quickly as possible, which can go a long way towards improving your scalability.
- Doing work for users that cannot or should not be done within a request thread: encoding videos, interacting with 3rd-party services, performing long searches, etc. All of these could hang the request thread for far too long, which could severely limit your scalability. In these cases you would need to find some way to let the user know that the work they needed is done. Often, results are stored in a database. (Diligence's Progress Service is a generic tool for handling these scenarios.)
- Maintenance tasks unrelated to user requests: cleaning up idle sessions, pruning caches and unused results, sending out digest emails, etc. These tasks are often scheduled to be run on a regular basis: daily, every 5 minutes, etc.
In order to support these diverse use cases, Prudence allows for several ways to schedule and spawn background tasks.
Implementing Tasks
Prudence supports two ways to implement tasks. Note that in both cases, you may implement the task in any supported programming language: it doesn't matter which language calls the API and which language implements the task.
Tasks implemented as programs are executed from beginning to end. They can, of course, include libraries, define functions and classes, etc. You may optionally send the task an arbitrary "context," which will be accessible via the document.context API.
A simple example:
var count = document.context
for (var i = 0; i < count; i++) {
    application.logger.info('#' + i)
}
Tasks implemented as entry points are loaded into memory once: Prudence will call a specified entry point every time the task is spawned. The "context" will be provided as an entry point argument. Additionally, entry points allow you to return a value to the caller.
A simple example:
function myEntryPoint(context) {
    // The context arrives as the entry point argument
    var count = context
    for (var i = 0; i < count; i++) {
        application.logger.info('#' + i)
    }
    return 'finished'
}
Generally, tasks implemented as entry points will be spawned faster. So, the rule of thumb should be:
- Implement your task as a program if it is supposed to run only once in a while.
- Implement your task as an entry point if it is frequently used.
APIs for Spawning and Scheduling
We'll discuss the higher-level API below. However, it's useful to start with the lower-level API, so you can better understand the many options.
Libraries
To spawn and/or schedule code in your application's "/libraries/" subdirectory, use the application.executeTask API. As a first example, let's spawn a program task:
application.executeTask( 'cms', '/tasks/hello/', null, {name: 'Michael'}, 0, 0, false)
Our program would be in "/libraries/tasks/hello.js":
application.logger.info('Hello, ' + document.context.name)
The first argument to executeTask is the application's name on the internal host. If you leave it as null, it would default to the current application. The second is the library URI. The third is the entry point name (not used in this example), and the fourth is the optional context.
The final three arguments are for scheduling:
- delay: Milliseconds before starting the task; zero means ASAP
- repeatEvery: Milliseconds after which the task will be repeated; zero means no repetitions
- fixedRepeat: Boolean; not used when "repeatEvery" is zero; if true, "repeatEvery" will be fixed according to the clock; if false, "repeatEvery" will be counted from when each task finishes executing
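To make the difference between the two "fixedRepeat" modes concrete, here is a small sketch that simulates when each run of a repeating task would start. It is plain JavaScript and does not call any Prudence API: the simulateStarts function and its arguments are made up for this illustration, and it assumes that in fixed mode the start times simply follow the clock (what happens when a task runs longer than "repeatEvery" is not modeled).

```javascript
// Hypothetical helper: simulates when a repeating task would start.
// All times are in milliseconds, counted from the moment of scheduling.
function simulateStarts(delay, repeatEvery, fixedRepeat, duration, runs) {
    var starts = []
    var start = delay
    for (var i = 0; i < runs; i++) {
        starts.push(start)
        if (fixedRepeat) {
            // Fixed: the next start follows the clock, whatever the duration
            start += repeatEvery
        }
        else {
            // Not fixed: the wait is counted from when this run finishes
            start += duration + repeatEvery
        }
    }
    return starts
}

// A task that takes 200ms, repeated every 1000ms, starting ASAP
var fixedStarts = simulateStarts(0, 1000, true, 200, 3)     // [0, 1000, 2000]
var relativeStarts = simulateStarts(0, 1000, false, 200, 3) // [0, 1200, 2400]
```

With "fixedRepeat" set to false, a slow task pushes back all of its later runs, which is why that mode avoids overlapping instances.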
For our second example, let's use an entry point:
application.executeTask( null, '/tasks/hello/', 'sayHello', {name: 'Michael'}, 0, 0, false)
Our program with its entry point:
function sayHello(context) { application.logger.info('Hello, ' + context.name) }
Literal Scriptlet Code
To spawn and/or schedule literal scriptlet source code as a task, use the application.codeTask API. Note that the source code must be provided as scriptlets, identical to the format of template resources:
application.codeTask( null, "<% application.logger.info('Hello, ' + document.context.name) %>", {name: 'Michael'}, 0, 0, false)
The arguments are similar to executeTask, except that literal source code is provided instead of a library name, and there is no entry point name.
This API is useful for generating the task's source code on demand.
The scriptlet format makes it possible to run tasks in any supported programming language:
application.codeTask( null, "<%python application.logger.info('Hello from Python, ' + document.context) %>", 'Michael', 0, 0, false)
Canceling Tasks
All the APIs return a JVM Future instance, which you can use to cancel the task:
var future = application.codeTask(...)
future.cancel(true)
Return Values
As mentioned above, entry points can return values to the caller. This is also handled via the Future:
var future = application.codeTask(...)
var r = future.get(500, java.util.concurrent.TimeUnit.MILLISECONDS)
Generally, it's not a very good idea to use the returned Future. The advantage of background tasks is in allowing you to release the current thread, but if you block waiting for a task to complete, then you will be doing the opposite. A better way to return values from background tasks is to store them in a database and attempt to fetch them later, as in Diligence's Progress Service. However, if you keep the block times very short, the Future has its uses: for example, it provides an easy way to call code in one programming language from another.
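The store-and-fetch approach can be sketched in a few lines. This is only an illustration: a plain object stands in for the database, and the storeResult and fetchResult names are invented here (they are not part of Prudence or Diligence). In a real application the store would be a database that both the task and the request threads can reach.

```javascript
// A plain object stands in for the database (an assumption for this sketch)
var results = {}

// Called at the end of the background task
function storeResult(id, value) {
    results[id] = {value: value, finished: new Date().getTime()}
}

// Called from a later user request: never blocks, returns null if not ready
function fetchResult(id) {
    return results.hasOwnProperty(id) ? results[id].value : null
}

var before = fetchResult('job1') // null: the task has not finished yet
storeResult('job1', 240)
var after = fetchResult('job1')  // 240
```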
Distributed Task APIs
When working with a Prudence cluster, you can spawn tasks on other nodes in the cluster. This feature enables you to easily create specialized task farms for flexible, scalable deployments.
The task APIs have distributed versions: application.distributedExecuteTask and application.distributedCodeTask. The difference is that the distributed versions don't have the three scheduling arguments: you can't delay or repeat distributed tasks. On the other hand, you have two extra arguments for optionally hinting where in the cluster you would want them executed:
- where: If this is a string, it is interpreted as a comma-separated list of node tags. When "multi" is false, it will select the first node that matches any of the required tags. When "multi" is true, it will select all nodes that match any of the tags. A null value here means to let Hazelcast decide: when "multi" is true, this would mean all nodes.
- multi: Boolean; when false, will spawn on only one member in the cluster; when true, will spawn on all members specified; note that when "multi" is true, the return value is a map of Hazelcast Member instances to their appropriate Future instances
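The way "where" and "multi" combine can be sketched with a toy model of a cluster. Everything here is hypothetical: the selectNodes function, the node objects and their tags are invented for the illustration and do not reflect how Prudence or Hazelcast represent members. In particular, when "where" is null and "multi" is false the real choice is up to Hazelcast, so picking the first node below is only a stand-in.

```javascript
// Hypothetical model: each node has a name and a list of tags
function selectNodes(nodes, where, multi) {
    var matching = nodes
    if (where !== null) {
        // "where" is a comma-separated list of tags
        var tags = where.split(',').map(function(tag) { return tag.trim() })
        matching = nodes.filter(function(node) {
            // A node matches if it has any of the required tags
            return node.tags.some(function(tag) { return tags.indexOf(tag) !== -1 })
        })
    }
    // "multi" true: all matching nodes; false: only the first one
    return multi ? matching : matching.slice(0, 1)
}

var cluster = [
    {name: 'node1', tags: ['web']},
    {name: 'node2', tags: ['video', 'email']},
    {name: 'node3', tags: ['video']}
]

var one = selectNodes(cluster, 'video,email', false) // node2 only
var many = selectNodes(cluster, 'video,email', true) // node2 and node3
var all = selectNodes(cluster, null, true)           // every node
```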
In this example, we don't care where in the cluster the task will be executed, as long as it is executed once:
application.distributedCodeTask( null, "<% application.logger.info('Hello, ' + document.context.name) %>", {name: 'Michael'}, null, false)
In this example, we'll spawn a maintenance task on all nodes:
application.distributedExecuteTask( 'maintenance', '/tasks/cleanup/', null, 'now', null, true)
Note that, of course, the application "maintenance" as well as the library "/tasks/cleanup/" have to be present on all nodes in the cluster.
For distributed tasks, your sent contexts as well as your returned values must be serializable in order for them to be transferred over the network. If you're using JavaScript, this likely means sticking to primitive types: strings and numbers. However, you can serialize the data yourself: say, into JSON when spawning the task, and then from JSON in the task implementation. (The high-level API does this for you.)
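Doing the JSON serialization yourself could look something like the following sketch. The jsonTask helper is hypothetical (it is not a Prudence API), and it assumes that the JavaScript engine provides the standard JSON object and that the task code does not itself contain a scriptlet closing delimiter.

```javascript
// Hypothetical helper: serializes the context to a JSON string, and wraps the
// task code in a scriptlet that parses that string back into "context"
function jsonTask(code, context) {
    return {
        code: '<% var context = JSON.parse(String(document.context))\n' + code + '\n%>',
        context: JSON.stringify(context)
    }
}

var task = jsonTask("application.logger.info('Hello, ' + context.name)", {name: 'Michael'})

// In Prudence, the result could then be spawned like so:
// application.distributedCodeTask(null, task.code, task.context, null, false)
```

Only a string crosses the network, so the context survives the trip even though the original JavaScript object is not serializable.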
High-Level API
If you're using JavaScript, you can use Prudence.Tasks.task as a shortcut to all the APIs mentioned above. An example:
document.require('/prudence/tasks/')

Prudence.Tasks.task({
    uri: '/tasks/cleanup',
    application: 'maintenance'
})
It comes with some sweet JavaScript sugar. For example, you can directly spawn functions:
function cleanup(context) {
    application.logger.info('Cleaning up: ' + context.time)
}

Prudence.Tasks.task({
    fn: cleanup,
    context: {time: 'now'},
    json: true,
    distributed: true
})
Behind the scenes, the above actually serializes the function source code, and calls application.distributedCodeTask (so JavaScript stack closure won't work). The "json: true" param adds some useful magic: it will serialize the context, and then wrap code to deserialize the context around task code. So, the above will work just fine as a distributed task.
(While convenient, it's generally more efficient to invoke an entry point in a library than to serialize function code like so.)
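Why closures don't survive this kind of serialization can be seen in plain JavaScript. The sketch below does not use Prudence at all: it uses the standard Function constructor as a stand-in for "the source code being evaluated somewhere else," and the makeGreeter function is invented for the illustration.

```javascript
function makeGreeter() {
    var greeting = 'Hello' // lives only in this function's scope
    return function(context) {
        return greeting + ', ' + context.name
    }
}

var greet = makeGreeter()
var local = greet({name: 'Michael'}) // the closure works: 'Hello, Michael'

// Keep only the source text, then evaluate it where "greeting" is not visible
var source = String(greet)
var rebuilt = new Function('return (' + source + ')')()

var error = null
try {
    rebuilt({name: 'Michael'})
}
catch (x) {
    error = x.name // 'ReferenceError': "greeting" does not exist here
}
```

The practical consequence is that anything a spawned function needs should be passed to it through the context.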
Here's an example of blocking until we get a result:
var future = Prudence.Tasks.task({
    uri: '/tasks/math/',
    entryPoint: 'multiply',
    context: [5, 6, 8],
    pure: true,
    block: '1s'
})
print(future.get())
Note the "pure: true" param that forces the API to send the context as is: otherwise it will send it as a string to ensure support for serialization. (JavaScript data structures are not, unfortunately, serializable.)
In case you're curious, here's the task for that example:
function multiply(elements) {
    var r = 1
    for (var e in elements) {
        r *= elements[e]
    }
    return r
}
An Even Lower-Level API
If you have some knowledge of Java programming, you may access the task executor directly via the application.executor API.
Application crontab
Prudence supports crontab files that mimic the format used by that ubiquitous scheduling program.
This facility lets you schedule tasks to run at specific (Gregorian) calendrical times. It works similarly to calling application.codeTask with the repetition params, but allows for more succinct, calendrical repetition patterns. Also, the facility is always on, as long as your Prudence container is running: you do not have to call an API to enable it.
To use this facility, place a file with the name "crontab" in your application's base subdirectory. Each line of the file starts with a scheduling pattern and ends with the task name. Empty lines and comments beginning with "#" are ignored. Example:
* * * * * /tasks/every-minute/
59 23 * * tue,fri <% application.getSubLogger('scheduled').info('It is Tue or Fri, 11:59PM') %>
Notes:
- The crontab files will be checked for changes and parsed once per minute. This means that you can edit this file and have your task scheduling change on the fly without restarting Prudence.
- The scheduler does not check to see if a task finished running before spawning a new instance of it, so that even if a task is not done yet, but it's time for it to be spawned again, you'll have multiple instances of the task running at the same time. If this is problematic, consider using the task APIs instead, with the "fixedRepeat" param set to false.
Sending a Context
Optionally, you may add more text after the task name and whitespace: anything there is grabbed as a single string and sent as the context to the task, which can be accessed there using the document.context API. Because crontab is a text file, only textual contexts may be sent, but you can use JSON, XML or other encodings to create complex contexts.
For example:
* * * * * /tasks/every-minute/ {"message": "This is a JSON context"}
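On the task side, the JSON text has to be parsed before it can be used. Here is a minimal sketch, assuming the JavaScript engine provides the standard JSON object. The parseCrontabContext helper is hypothetical: it returns the parsed value, falls back to the raw text if the context is not valid JSON, and returns null if no context was sent.

```javascript
// Hypothetical helper for tasks spawned from the crontab
function parseCrontabContext(text) {
    if ((text === null) || (text === undefined)) {
        return null // no context was sent
    }
    text = String(text) // also turns a JVM string into a JavaScript string
    try {
        return JSON.parse(text)
    }
    catch (x) {
        return text // not JSON: use the text as is
    }
}

var context = parseCrontabContext('{"message": "This is a JSON context"}')

// In the task itself, this would be:
// var context = parseCrontabContext(document.context)
// application.logger.info(context.message)
```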
Scheduling Patterns
The scheduling pattern is a series of five settings separated by whitespace:
- Minutes of the hour, 0-59
- Hour of the day, 0-23
- Day of the month, 1-31; the special setting "L" signifies the last day of the month, which varies per month and year
- Month of the year, 1-12; three-letter English month names may be used instead of numbers: "jan", "feb", "mar", etc.
- Day of the week, 0-6; three-letter English day names may be used instead of numbers: "sun", "mon", "tue", etc.
The following rules apply:
- Any of these settings can be "*", signifying that every value would match: every minute, every hour, every day, etc.
- Use a slash to match only numbers that divide equally by the number after the slash (can also be used on "*")
- Ranges (inclusive) are possible, separated by hyphens
- Multiple values per setting are possible, separated by commas
- Multiple whole patterns are possible, separated by pipes ("|": a logical "or")
Note that you can schedule the same task on multiple lines, which is not equivalent to using the pipe: multiple lines mean that multiple task instances might be spawned simultaneously if more than one line matches. In contrast, using the pipe counts as a single match.
Every minute:
* * * * *
11:59pm every Tuesday and Friday:
59 23 * * tue,fri
Every 5 minutes in the morning, between 5 and 8am, otherwise every 30 minutes:
*/5 5-7 * * *|*/30 0-4,8-23 * * *
The same as above, but with one added to all minutes of the hour:
1,6,11,16,21,26,31,36,41,46,51,56 5-7 * * *|1,31 0-4,8-23 * * *
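To see how the rules above fit together, here is a rough sketch of a pattern matcher in plain JavaScript. It is not the cron4j implementation and not a Prudence API: the function names are invented, input validation is left out, the slash is implemented literally as described above (the value must divide equally), and it assumes that all five settings must match for a pattern to match.

```javascript
var MONTHS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
var DAYS = ['sun', 'mon', 'tue', 'wed', 'thu', 'fri', 'sat']

// Converts a number or a three-letter name to a number
function toNumber(text, names, offset) {
    var index = names ? names.indexOf(text.toLowerCase()) : -1
    return index >= 0 ? index + offset : parseInt(text, 10)
}

// Checks one setting (for example "*/5" or "0-4,8-23") against a value
function matchesSetting(setting, value, names, offset, lastDay) {
    // Commas: any one of the values may match
    return setting.split(',').some(function(part) {
        // Slash: the value must divide equally by the number after it
        var pieces = part.split('/')
        var divisor = pieces.length > 1 ? parseInt(pieces[1], 10) : 1
        if (value % divisor !== 0) return false
        var range = pieces[0]
        if (range === '*') return true
        if (range === 'L') return value === lastDay
        // Hyphen: an inclusive range; a single value is a range of one
        var ends = range.split('-')
        var from = toNumber(ends[0], names, offset)
        var to = ends.length > 1 ? toNumber(ends[1], names, offset) : from
        return value >= from && value <= to
    })
}

// Checks a whole pattern, including "|" alternatives, against a Date
function matchesPattern(pattern, date) {
    // Day zero of the next month is the last day of this month
    var lastDay = new Date(date.getFullYear(), date.getMonth() + 1, 0).getDate()
    return pattern.split('|').some(function(alternative) {
        var s = alternative.trim().split(/\s+/)
        return s.length === 5
            && matchesSetting(s[0], date.getMinutes(), null, 0)
            && matchesSetting(s[1], date.getHours(), null, 0)
            && matchesSetting(s[2], date.getDate(), null, 0, lastDay)
            && matchesSetting(s[3], date.getMonth() + 1, MONTHS, 1)
            && matchesSetting(s[4], date.getDay(), DAYS, 0)
    })
}

// January 2, 2024 was a Tuesday
var tuesdayNight = matchesPattern('59 23 * * tue,fri', new Date(2024, 0, 2, 23, 59))
```

Because the pipe is handled inside a single call, a pattern with several alternatives still produces at most one match per minute, in line with the note above.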
System crontab
You can also set up a special crontab to run arbitrary Java static methods and non-JVM system processes, just like with the system cron, by creating a "/component/crontab" file.
Here's an example:
0 5 * * * sol.exe
0,30 * * * * OUT:C:\ping.txt ping 10.9.43.55
0,30 4 * * * "OUT:C:\Documents and Settings\Carlo\ping.txt" ping 10.9.43.55
0 3 * * * ENV:JAVA_HOME=C:\jdks\1.4.2_15 DIR:C:\myproject OUT:C:\myproject\build.log C:\myproject\build.bat "Nightly Build"
0 4 * * * java:mypackage.MyClass#startApplication myOption1 myOption2
The format is different from the application crontabs: see the cron4j documentation for complete details.
Like the application crontabs, it will be enabled as long as the Prudence container is running, and can be edited at runtime.
crontab APIs
For direct access to the crontab, use application.taskCollector for the current application's ApplicationTaskCollector, and application.scheduler for the component-wide cron4j Scheduler.
These APIs let you modify the crontab in memory. However, note that if you edit your crontab, the task table will be reset and reloaded, losing the changes you made via the API.
/startup/
It's often useful to schedule a task to be run as soon as the application starts: to initialize resources, turn on subsystems, do initial testing, etc.
Upon startup, Prudence will automatically spawn "/startup/" as a background task. So, you can create a file called "/libraries/startup.js", "/libraries/startup.py", "/libraries/startup/default.js", etc. For example, here's "/libraries/startup.js":
application.logger.info('Our application started!')
Tweaking
Learn how to configure the size of the thread pool for task APIs here.
For crontab configuration, see here.
The Prudence Manual is provided for you under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The complete manual is available for download as a PDF.