System offline and Critical errors

The reliability of your RefTracker system is as important to Altarama as it is to you!  Please let us know as soon as you notice your system has become unavailable, so that we can fix it promptly.

RefTracker is a web-based application and so is reliant on the Internet.  If your RefTracker system is not displaying, or you are seeing an error message, the issue may be related to Internet access between you and your RefTracker server.  Your RefTracker support representative can advise whether your system is running, so that you can talk to your IT department if the problem is Internet access.

Usually, if there is a fundamental problem with your RefTracker system you will see a critical error message like this:

If you have a hosted RefTracker system, Altarama runs a monitoring program on it.  If a hardware, operating system or system software service such as SQL Server becomes unavailable and does not automatically restart itself within 10 minutes, the monitoring program automatically advises Altarama’s technical staff of the problem so that we can attend to fixing it.

However, this network monitoring does not catch instances where a critical error within RefTracker has taken an individual RefTracker system offline.  Critical errors are rare but, in order to minimise occurrences, and minimise downtime when they do occur, RefTracker includes a number of features:
– Background processes automatically stop when a critical error is encountered.  This prevents processes occurring that expect RefTracker to be available to accept the result of the process – like email importing.
– RefTracker automatically “recovers” from a database connection issue in many situations.
– RefTracker automatically “recovers” from a critical error in the restart process, in most situations.
– If your system is hosted by Altarama, and a critical error occurs, a notification will be sent directly to Altarama staff.  Critical error notices are sent by email, to both Altarama support and external email addresses monitored by Altarama support staff, if there is a database connection error, a general critical error, or if 5 errors have occurred since the last recycle or critical error notification.  These emails are sent once every 10 minutes whilst any of these error situations remains present.  This functionality runs independently of your instance of RefTracker and of SQL Server, so it will run when your system is in a critical error status.

If a Critical error does occur, Altarama staff can quickly restart your system manually, and that will almost always clear the error.

Details of the Critical email notifications sent directly to Altarama (for technical staff)


Database connection errors

The function that establishes the database connection will make 5 attempts to establish a connection.
The pause between attempts begins at 2 seconds and then doubles for each attempt:
2 seconds
4 seconds
8 seconds
16 seconds
32 seconds

In total, this is 62 seconds of waiting.  If a connection can’t be made after 62 seconds, the critical error function is called to set the Critical error flag.
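
The retry schedule above can be sketched as follows.  This is an illustration only, not RefTracker's actual code: the function name, the `connect` callable and the `sleep` parameter are all hypothetical, and the sketch assumes that a pause follows each failed attempt (which is what makes five pauses add up to 62 seconds).

```python
import time


def connect_with_backoff(connect, attempts=5, first_pause=2, sleep=time.sleep):
    """Call connect() up to `attempts` times, pausing 2, 4, 8, ... seconds.

    Returns the connection on success, or None once every attempt has
    failed.  None is the point at which the caller would set the
    Critical error flag.  All names here are illustrative assumptions.
    """
    pause = first_pause
    for _ in range(attempts):
        try:
            return connect()
        except ConnectionError:
            sleep(pause)   # wait after a failed attempt
            pause *= 2     # double the pause for the next attempt
    return None
```

With the defaults, a connection that never succeeds produces pauses of 2, 4, 8, 16 and 32 seconds before the function gives up.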


Database timeout

The standard timeout is 30 seconds, so any call to the database will wait 30 seconds before failing (default value – it can be extended using parameter 5.10).
We use longer timeouts on calls that we know can take a long time to complete – a long call has a 5 minute timeout (default value – it can be extended using parameter 5.11).

For standard timeouts only (the short one), there are 6 attempts, with an increasing pause before each retry:

            ' Wait 1 second before attempt 2
            ' Wait 2 seconds before attempt 3
            ' Wait 3 seconds before attempt 4
            ' Wait 4 seconds before attempt 5
            ' Wait 5 seconds before attempt 6

This means that there can be up to 6 attempts, each of which can wait 30 seconds before timing out.
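
The worst case for a standard-timeout call can be worked out from the figures above.  The short sketch below does the arithmetic; the constant and function names are hypothetical, and it assumes the default 30 second timeout (not a value extended by parameter 5.10).

```python
STANDARD_TIMEOUT = 30  # seconds, the default stated above
ATTEMPTS = 6


def pauses_before_retries(attempts=ATTEMPTS):
    """1 second before attempt 2, 2 seconds before attempt 3, and so on."""
    return [attempt - 1 for attempt in range(2, attempts + 1)]


def worst_case_seconds(timeout=STANDARD_TIMEOUT, attempts=ATTEMPTS):
    """Every attempt runs to its full timeout, plus the pauses in between."""
    return attempts * timeout + sum(pauses_before_retries(attempts))


print(worst_case_seconds())  # 6 * 30 + (1 + 2 + 3 + 4 + 5) = 195
```

So, under these assumptions, a standard call can take a little over three minutes before it finally fails.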

Network error

The same approach is taken for Network errors as is taken for Database connection errors, as described above.



Other Critical errors

Most critical errors occur because the database is unavailable.
The other places where the critical error flag can be set are nearly all in the application initialisation – if RefTracker doesn’t have all the things that it needs to operate then a Critical error is set.

The Critical error flag is checked during the initialisation of every page that RefTracker displays.
If the flag is set, the user is redirected to SystemCriticalError.html (HTML files do not need ASP.NET to be operational in order to display).
The functions which initialise the application and the session are called by the pageinit if the initialisation has not yet completed.
This means that the application can potentially ‘recover’.
The function which checks the critical error flag for each page cycle resets the critical flag after 1 minute.

The database connection error and all other critical errors have separate critical error flags.
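
The self-resetting flag described above can be pictured with the following sketch.  It is not RefTracker's implementation: the class name, the injectable `clock` and the variable names are hypothetical, and the only behaviour taken from the text is that checking the flag clears it once 1 minute has passed, and that the two kinds of error have separate flags.

```python
import time

RESET_AFTER_SECONDS = 60  # the 1 minute stated above


class CriticalFlag:
    """A flag that clears itself when checked at least a minute after being set."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._set_at = None  # None means the flag is off

    def set(self):
        self._set_at = self._clock()

    def is_set(self):
        # Called on every page cycle; resetting here is what lets the next
        # initialisation attempt run and the system potentially recover.
        if self._set_at is not None:
            if self._clock() - self._set_at >= RESET_AFTER_SECONDS:
                self._set_at = None
        return self._set_at is not None


# Separate flags for database connection errors and all other critical errors.
database_connection_flag = CriticalFlag()
other_critical_flag = CriticalFlag()
```

If the underlying problem is still present after the reset, the next initialisation would simply set the flag again.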

The SystemCriticalError page includes an automatic refresh (every 60 seconds):
– When the page refreshes, it redirects to default.aspx with a querystring value of ?criticalerror=1.
– Default.aspx calls the function which checks whether there is a critical error (which will have turned off the critical error flag after x minutes).
– If the critical error is off, default.aspx redirects the user to reft998 if there is a logged on session, otherwise to reft000.
– If the critical error is on, default.aspx redirects back to SystemCriticalError.html and the cycle begins again.
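
The decision that default.aspx makes in this cycle can be summarised in a few lines.  The function below is a hypothetical restatement of the steps above, using the page names from the text; it is not the actual page code.

```python
def default_page_redirect(critical_error_on, has_logged_on_session):
    """Return the page a user is sent to after the error page refreshes."""
    if critical_error_on:
        # Still in error: back to the static page, which refreshes again later.
        return "SystemCriticalError.html"
    # Recovered: logged on users and everyone else go to different pages.
    return "reft998" if has_logged_on_session else "reft000"
```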

So if, for example, a database connection could not be made and the flag was set and then after a couple of minutes a connection could be established again, the user will be automatically redirected to a RefTracker page when the connection is reestablished.

The critical error notification application


For advising support about critical errors, there is a dedicated notification application that is self-contained and does not depend on the ecosystem of other RefTracker functions (so it can send error emails when RefTracker is not running).

This function will send an email to a number of Altarama and external email addresses monitored by Altarama support staff (controlled by the addresses in the to: and cc: lines of the settings/critical.xml file).
The email will be from the usual ‘from’ address for the instance if that is available, otherwise from reftrackerhelp@altarama.com.

Setting where critical error emails are sent

The email addresses to which the critical error information is sent are controlled by the config/settings/critical.xml file, as per the example shown below.
Critical error emails will not be sent if this file does not exist, or if the file exists but there are no values in <to>, <from>, and <server> – providing a way for critical error emails to be turned off.  We recommend that this file should be deleted for all in-house customers, or set to send to your IT department’s RefTracker support person.

If an in-house customer wants to use this critical error alerting functionality, they must amend the critical.xml file to specify their own SMTP server and their own IT staff email addresses.
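
In-house IT staff who want to confirm whether a given critical.xml would leave alerting switched on could use a check along the lines of the sketch below.  It is a rough illustration, not an Altarama tool.  It relies only on the <to>, <from> and <server> element names mentioned above, makes no assumption about the root element or the rest of the file layout, and assumes that an empty value in any one of the three is enough to stop emails being sent (an email needs all three).

```python
import xml.etree.ElementTree as ET
from pathlib import Path

REQUIRED_ELEMENTS = ("to", "from", "server")


def critical_emails_enabled(path):
    """Return True only if the file exists and <to>, <from>, <server> all have values."""
    file = Path(path)
    if not file.is_file():
        return False  # no file: critical error emails are not sent
    root = ET.parse(str(file)).getroot()
    for tag in REQUIRED_ELEMENTS:
        element = root.find(f".//{tag}")  # look anywhere below the root
        if element is None or not (element.text or "").strip():
            return False  # missing or empty value (assumed to disable emails)
    return True
```

For example, `critical_emails_enabled("config/settings/critical.xml")` returns False when the file has been deleted.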


Application error notifications

The Error trap notification will send an Application error email when 5 general errors have occurred.  This is in relation to errors that would have resulted in displaying the error page or an email to the Active system administrator, not Critical errors.

Here is an example of a message generated after 5 errors have occurred – it contains core details of the 5 errors.
After the email is sent the counter is reset – so another email will be sent when another 5 errors occur, as long as more than 10 minutes have elapsed.
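
The counting and throttling behaviour can be sketched as follows.  The class and parameter names are hypothetical, and one detail is an assumption rather than something stated here: errors that arrive during the 10 minute quiet period are kept and included in the next email.

```python
import time


class ErrorBatchNotifier:
    """Send one notification per batch of errors, at most once per interval."""

    def __init__(self, send, batch_size=5, min_interval=600, clock=time.monotonic):
        self._send = send                  # callable that delivers the email
        self._batch_size = batch_size      # 5 general errors trigger an email
        self._min_interval = min_interval  # 10 minutes, in seconds
        self._clock = clock
        self._errors = []
        self._last_sent = None

    def record(self, detail):
        """Log one error; return True if a notification was sent."""
        self._errors.append(detail)
        if len(self._errors) < self._batch_size:
            return False
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent <= self._min_interval:
            return False  # too soon after the last email; keep counting
        self._send(list(self._errors))
        self._errors.clear()  # the counter is reset after sending
        self._last_sent = now
        return True
```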

The intention here (with all of these types of error notifications) is to alert Altarama to a problem – not necessarily to send details of every single error that has been logged.  The error logs hold the full details of all errors that occur, and other mechanisms send emails to your Active system administrator when errors occur that do not result in system unavailability.


Background processing

RefTracker functions that are triggered by the background processing check to see if the critical error flag has been set.
This means that a call to the background processing will trigger an attempted recovery.
It also means that the background processing will not attempt to do any work if the flag is set, ensuring that processes that deliver input to RefTracker such as email importing, do not run if RefTracker is not working.
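
In outline, the guard works like the sketch below.  The function and argument names are hypothetical; the point is only that the flag is checked first and no work is done while it is set.

```python
def run_background_tasks(critical_error_is_set, tasks):
    """Run each (name, task) pair unless the flag is set; return the names run."""
    # Checking the flag is also what gives the system a chance to recover.
    if critical_error_is_set():
        return []  # RefTracker is not working, so do no work this cycle
    ran = []
    for name, task in tasks:
        task()
        ran.append(name)
    return ran
```

A task such as email importing would therefore be skipped for the whole cycle, rather than delivering input to a system that cannot accept it.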

Testing the error email alerting service/simulating errors (only needed by in-house customers)

A Database connection error can be simulated using reft400.aspx?simulateerror=1 or reft000.aspx?simulateerror=1

A Critical error can be simulated with reft400.aspx?simulateerror=2 or reft000.aspx?simulateerror=2

A General error trapping notification (the 890 page that staff see when an error occurs) can be triggered using reft400.aspx?raiseerror=1.  You need to do this 5 times to reach the batch of 5 errors that triggers the notification.