Planned and unplanned outages

Applications become unavailable to users for two reasons: system maintenance (a planned outage) and system crashes (an unplanned outage). In either situation, no work should be lost and processing should continue.

Planned outage

In a planned outage, you know when application changes are taking place. For example, if you need to increase the JVM heap size on a node, you can take that node out of service and move its users to another node without them noticing any difference.

Quiesce

Quiescing lets you take a Pega Platform™ server node out of service for maintenance or other activities.

To support quiescing, the node is first taken out of load balancer rotation. Passivation allows a requestor, service, or clipboard page to be saved into storage and reactivated later, helping to free up JVM memory. Passivation works at the page, thread, and requestor level. The inverse of passivation is activation. Activation brings the persisted data back into memory on another node.
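The passivation and activation cycle described above can be illustrated with a minimal sketch. This is not Pega Platform code: `SharedStore`, `Node`, and the `work_page` data are hypothetical names, and `pickle` stands in for whatever persistence mechanism the platform actually uses.

```python
import pickle


class SharedStore:
    """Hypothetical stand-in for storage that every node can reach."""

    def __init__(self):
        self._blobs = {}

    def save(self, key, state):
        self._blobs[key] = pickle.dumps(state)

    def load(self, key):
        return pickle.loads(self._blobs.pop(key))


class Node:
    """Hypothetical server node holding requestor state in memory."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.in_memory = {}

    def passivate(self, requestor_id):
        # Persist the state, then release it from this node's memory.
        self.store.save(requestor_id, self.in_memory.pop(requestor_id))

    def activate(self, requestor_id):
        # Load the persisted state into this node's memory.
        self.in_memory[requestor_id] = self.store.load(requestor_id)
        return self.in_memory[requestor_id]


store = SharedStore()
node_a = Node("node-a", store)
node_b = Node("node-b", store)

node_a.in_memory["requestor-1"] = {"work_page": {"status": "Open"}}
node_a.passivate("requestor-1")            # frees memory on node-a
restored = node_b.activate("requestor-1")  # same data, different node
```

The point of the sketch is that the state leaves one node's memory and reappears unchanged on another, which is what allows a user to continue after being moved.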

You can quiesce a node from:

Quiescing a node with slow drain. The slow drain method requires removing the nodes from the load balancer before starting the quiesce process. When the node is quiesced, the server checks the accelerated passivation setting. By default, Pega Platform™ sets the passivation timeout to five seconds. After five seconds, it passivates all existing user sessions. When users send another request, their user session activates on another Pega Platform™ server node without loss of information.

The five-second passivation timeout might be too aggressive for some applications. System administrators can increase the timeout value to reduce load. The timeout value should be large enough that a typical user can submit a request.

After all existing users have moved off the server, the server can be upgraded. When the upgrade is complete, the server is re-enabled in the load balancer and the quiesce is canceled.
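The order of the slow drain steps can be sketched as follows. This is an illustrative model only: `LoadBalancer`, `slow_drain`, and the session names are hypothetical, and the real steps are carried out through administrative tools rather than through code like this.

```python
class LoadBalancer:
    """Hypothetical load balancer tracking which nodes receive traffic."""

    def __init__(self, nodes):
        self.pool = set(nodes)

    def remove(self, node):
        self.pool.discard(node)

    def add(self, node):
        self.pool.add(node)


def slow_drain(balancer, node, sessions, do_maintenance):
    """Run the slow drain steps in order and return a log of what happened."""
    log = []
    balancer.remove(node)
    log.append("removed from load balancer")
    log.append("quiesce started")
    for session_id in sorted(sessions):
        # Each session is passivated and resumes elsewhere on its next request.
        log.append(f"passivated {session_id}")
    sessions.clear()
    do_maintenance(node)
    log.append("maintenance complete")
    balancer.add(node)
    log.append("re-enabled in load balancer")
    log.append("quiesce canceled")
    return log


balancer = LoadBalancer({"node-a", "node-b"})
sessions_on_a = {"s1", "s2"}
steps = slow_drain(balancer, "node-a", sessions_on_a, lambda node: None)
```

The key ordering constraint for slow drain is that the node leaves the load balancer before the quiesce starts, and maintenance begins only after every session has been passivated.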

Quiescing a node with immediate drain. The default quiesce method is immediate drain, which does not require removing the nodes from the load balancer before starting the quiesce process.

High availability roles

Pega Platform™ provides two roles (PegaRULES:HighAvailabilityQuiesceInvestigator and PegaRULES:HighAvailabilityAdministrator) that you can add to access groups for administrators who manage highly available applications.

The High Availability Quiesce Investigator role lets administrators perform diagnostics or debug issues on a quiesced system. When quiesced, the system reroutes all users without this role.

The High Availability Administrator role gives administrators access to the High Availability landing pages. These users can also investigate issues on a quiesced system.
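The effect of the two roles on a quiesced node can be summarized in a small sketch. The role names come from this section; the functions `may_stay_on_quiesced_node` and `route` are hypothetical and only model the rule described above, not how the platform enforces it.

```python
# Role names from this section.
INVESTIGATOR = "PegaRULES:HighAvailabilityQuiesceInvestigator"
ADMINISTRATOR = "PegaRULES:HighAvailabilityAdministrator"


def may_stay_on_quiesced_node(roles):
    """Both roles allow investigating issues on a quiesced system."""
    return INVESTIGATOR in roles or ADMINISTRATOR in roles


def route(roles, node_quiesced):
    """Return 'stay' or 'reroute' for a user arriving at a node."""
    if node_quiesced and not may_stay_on_quiesced_node(roles):
        return "reroute"
    return "stay"


regular_user = route({"PegaRULES:User4"}, node_quiesced=True)
investigator = route({INVESTIGATOR}, node_quiesced=True)
normal_operation = route({"PegaRULES:User4"}, node_quiesced=False)
```

Here `PegaRULES:User4` is used only as an example of a role without quiesce privileges.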

Out-of-place upgrade

Pega Platform™ allows you to perform a rolling upgrade with little or no downtime. A rolling upgrade is also known as an out-of-place upgrade. An out-of-place, or parallel, upgrade involves creating a new rules schema, migrating rules from the old schema to the new schema, and upgrading the new schema to a new Pega release. Once the updates are complete, the database connections are modified to point to the new rules schema, and the nodes are quiesced and restarted one at a time in a rolling restart.
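The rolling restart at the end of this process can be sketched as a loop that takes one node out of service at a time. The function `rolling_restart` and the node names are hypothetical; the sketch only shows why the cluster as a whole stays available.

```python
def rolling_restart(nodes, restart):
    """Quiesce and restart nodes one at a time.

    Returns the smallest number of nodes that were in service at any point.
    """
    in_service = set(nodes)
    fewest_in_service = len(in_service)
    for node in nodes:
        in_service.remove(node)  # quiesce: node stops taking traffic
        fewest_in_service = min(fewest_in_service, len(in_service))
        restart(node)            # node restarts against the new rules schema
        in_service.add(node)     # node rejoins before the next one leaves
    return fewest_in_service


restarted = []
fewest = rolling_restart(["node-a", "node-b", "node-c"], restarted.append)
```

With three nodes, at least two remain in service throughout, which is why users experience little or no downtime.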

In-place upgrade

Pega Platform™ also supports in-place upgrades, which may involve significant downtime because existing applications must be stopped. After the applications are stopped, pre-upgrade scripts or processes may need to be run. Before the new version of the Pega rulebase is imported, the database schema is updated, either manually or automatically using the Installation and Upgrade Assistant (IUA). EAR or WAR files, if used, are undeployed and replaced with the new EAR or WAR files, and the new archives are loaded. Afterward, additional configuration changes may be made using scripts or the Upgrade Wizard.

Unplanned outage

With Pega Platform™ configured for high availability, the application can recover from both browser and node crashes. Pega Platform™ uses dynamic container persistence for relevant work data. The dynamic container maintains UI and work states, but not the entire clipboard.

Node crash

Pega saves the structure of UI and relevant work metadata on shared storage devices for specific events. When a specific value is selected on a UI element, the form data is stored in the shared storage as a requestor clipboard page. When the load balancer detects a node crash, it redirects traffic to another node in the pool of active servers. The new node that processes the request detects a crash and uses the UI metadata to reconstruct the UI from the shared storage.

On redirection to a new node, the user must re-authenticate. Single Sign-on is not required for a high availability configuration; however, using it is a best practice because it avoids interrupting the user. Because the user’s clipboard is not preserved after the crash, data that has been entered but not committed on assign, perform, and confirm harnesses is lost.
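A minimal sketch can make the distinction between what survives a node crash and what does not more concrete. Everything here is hypothetical (`recover_after_node_crash`, the dictionaries, and the field names); it models only the behavior described above: UI metadata is read back from shared storage, and uncommitted entries are gone.

```python
def recover_after_node_crash(session_id, shared_ui_metadata, committed_data):
    """Rebuild a session on a new node from what survives a crash."""
    return {
        # UI structure is restored from shared storage.
        "ui": shared_ui_metadata[session_id],
        # Only committed values are available; nothing else is restored.
        "data": dict(committed_data.get(session_id, {})),
    }


shared_ui_metadata = {"s1": {"harness": "Perform", "open_tab": "Details"}}
committed_data = {"s1": {"customer": "Acme"}}
# This edit lived only in the crashed node's memory, so recovery never sees it.
uncommitted_on_crashed_node = {"s1": {"amount": "250"}}

session = recover_after_node_crash("s1", shared_ui_metadata, committed_data)
```

After recovery the user sees the same screen layout, but the `amount` value they typed and had not yet committed must be entered again.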

Browser crash

When the browser terminates or crashes, users connect to the correct server based on session affinity. The state of the user session is recovered without loss since the clipboard preserves both metadata for the UI and any data entered on screens and submitted.

Crash recovery matrix

Events | Browser crash | Node crash
UI is redrawn | Yes | Yes
User must re-authenticate | No, if redirected to the same node. Yes, if the authentication cookie was lost. | No, with Single Sign-on. Yes, without Single Sign-on.
Data entry loss | No, if redirected to the same node. Data not committed is lost if the authentication cookie was lost. | Data not committed is lost.
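The matrix can also be read as a pair of decision functions. The sketch below is a hypothetical encoding, not platform code. It assumes that "redirected to the same node" and "authentication cookie was lost" are the only two browser-crash cases, so a single `cookie_lost` flag distinguishes them.

```python
def must_reauthenticate(event, cookie_lost=False, single_sign_on=False):
    """Return True if the user has to log in again after the event."""
    if event == "browser_crash":
        return cookie_lost          # same node with cookie intact: no login
    if event == "node_crash":
        return not single_sign_on   # Single Sign-on avoids the login prompt
    raise ValueError(f"unknown event: {event}")


def uncommitted_data_lost(event, cookie_lost=False):
    """Return True if data entered but not committed is lost."""
    if event == "browser_crash":
        return cookie_lost
    if event == "node_crash":
        return True
    raise ValueError(f"unknown event: {event}")


browser_same_node = must_reauthenticate("browser_crash", cookie_lost=False)
node_with_sso = must_reauthenticate("node_crash", single_sign_on=True)
node_data_lost = uncommitted_data_lost("node_crash")
```

In both crash types the UI is redrawn, so no function is needed for that row.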
