How do the counters for the status reports work?

The counters for a particular class (second, minute, hour) calculate a floating estimate when there is insufficient data to fill the entire window for that class. The data samples that are present are extrapolated to estimate the count for the period specified. Once there is sufficient data to cover the entire window, the estimate becomes an accurate average of the samples in the window.

This is done by counting samples in two categories: one counts the number of events, and the other counts the amount of time used to collect those events. To calculate the output, the recorded rate (SumOfEvents / SumOfTime) is scaled up to cover the amount of time represented by that class.
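That calculation can be sketched in a few lines of Python (the names here are illustrative only, not the engine's actual API):

```python
def extrapolated_rate(event_counts, elapsed_ms, window_ms):
    """Scale the observed rate (SumOfEvents / SumOfTime) up to the full
    window for the class, e.g. window_ms=60000 for the minute class."""
    total_events = sum(event_counts)
    total_time = sum(elapsed_ms)
    if total_time == 0:
        return 0.0  # no data yet; nothing to extrapolate
    return total_events / total_time * window_ms

# two samples covering about 2 seconds, 5 events total
extrapolated_rate([2, 3], [1000, 1005], 60000)  # ≈ 149.6 events per minute
```

Note that once the window is full of samples, total_time is approximately window_ms and the formula reduces to a plain count over the window; with a partial window it extrapolates.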

So, for the sake of discussion, if you are looking at status.minute data then there is enough space in the data window to hold 60 samples covering 60 seconds.

Approximately once per second, a new data sample is pushed into the pool. Each sample contains two pieces of data: the number of events that have occurred, and the amount of time that has elapsed. Both are recorded because the actual amount of time will not always be precisely one second: in order to give message processing the highest priority, we do not strictly enforce the amount of time used to collect our samples.
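A sketch of that sampling window in Python (names and structure are illustrative; the real sampler is internal to the engine):

```python
from collections import deque

WINDOW_SAMPLES = 60  # status.minute: room for 60 roughly one-second samples

pool = deque(maxlen=WINDOW_SAMPLES)  # the oldest sample falls off automatically

def post_sample(events, elapsed_ms):
    """Push one sample: the events seen, and the time actually spent
    collecting them. elapsed_ms is recorded rather than assumed to be
    1000 because the sampler is not strictly scheduled."""
    pool.append((events, elapsed_ms))
```

Because the deque is bounded, pushing the 61st sample silently discards the 1st, which gives the sliding 60-second window described above.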

So then how might the system give you an estimate of 18 messages per minute after you send one message through?

Suppose there were 3 samples in the data pool covering about 3 seconds and during one of these seconds you sent through a single message.

N1 = 0, N2 = 0, N3 = 1

T1 = 1100ms, T2 = 1108ms, T3 = 1092ms

Total events = 1
Total time = 3300 ms.


If there is one event in about 3 seconds, then you might expect to see about 20 in 60 seconds, right? More precisely:

1 / 3300ms = n / 60000ms,

1 * 60000 = n * 3300,

60000 / 3300 = 18.18181818 (displayed as 18 / minute)
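The arithmetic above can be checked directly in Python:

```python
total_events = 0 + 0 + 1            # N1 + N2 + N3
total_time_ms = 1100 + 1108 + 1092  # T1 + T2 + T3 = 3300 ms

# scale the observed rate up to one minute (60000 ms)
rate_per_minute = total_events / total_time_ms * 60000  # 18.1818...
displayed = int(rate_per_minute)                        # shown as 18 / minute
```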


In actuality, this approach is taken a bit further in the SNF engine's trend analysis.

We wanted to provide a real-time trend analysis that is more sensitive to recent data than a plain sliding window would be. This is so that administrators who are watching the system can get a better sense of what is happening from moment to moment while still having a reasonable sense of accuracy over time. The effect is that if, for example, message rates are relatively constant then your trend data will be almost perfectly accurate. However, if your most recent data is trending upward or downward then your trend data will reflect a higher or lower (somewhat predictive) message rate. This is done by using a progression of data pools.

For example, in the case of capturing minute data there is a pool with 6 samples which is fed every second. Once every 6 seconds the sum of the data in that pool is fed into a data pool with 10 samples. Then the two pools together are used to provide rate data according to the algorithms described above.
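A sketch of that progression in Python (the pool sizes come from the example above; everything else is illustrative). Notice that right after a roll-up, the newest samples are present in both pools, which is what weights the estimate toward recent data:

```python
from collections import deque

fine = deque(maxlen=6)     # one sample per second
coarse = deque(maxlen=10)  # one entry per six seconds
ticks = 0

def feed(events, elapsed_ms):
    """Feed one ~1-second sample; every 6th sample, roll the fine
    pool's sum up into the coarse pool."""
    global ticks
    ticks += 1
    fine.append((events, elapsed_ms))
    if ticks % 6 == 0:
        coarse.append((sum(e for e, _ in fine), sum(t for _, t in fine)))

def rate_per_minute():
    """Extrapolate events per minute from both pools together."""
    samples = list(fine) + list(coarse)
    events = sum(e for e, _ in samples)
    time_ms = sum(t for _, t in samples)
    return events / time_ms * 60000 if time_ms else 0.0
```

With a steady one event per second this settles at exactly 60.0 per minute, while a burst in the last few seconds is counted in both pools and pushes the estimate above the plain average, matching the predictive behavior described above.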

If you want an absolutely accurate count for a given period of time then you can extract that from the log data by selecting entries from the log covering the period you are concerned with.

If you want a floating trend analysis then you can use the rate values. If you want a sample of the current data pools for the period in question then you can use the counters. (Counters are dependent upon the class in question: second-class status data extrapolates data for minutes and hours, minute-class status data extrapolates for hours, and so forth.) For more information about the code that performs these calculations, please contact support.

In particular, the data pools are of the class snf_SMHDMY_Counter. One is used for tracking events per sample and another is used for tracking milliseconds per sample. An input() is posted approximately once per second and the data for each sample is summed automatically into data pools representing progressively larger collections of samples (periods of time).
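A rough Python analogue of that arrangement (snf_SMHDMY_Counter itself is part of the SNF engine's own code; the class below, including its pool sizes, is a guess for illustration only):

```python
from collections import deque

class CascadePools:
    """Illustrative stand-in for snf_SMHDMY_Counter: each input() lands
    in the smallest pool, and pool sums roll up into progressively
    larger pools representing longer periods of time."""

    def __init__(self, pool_sizes=(6, 10, 10)):  # sizes are assumptions
        self.pools = [deque(maxlen=n) for n in pool_sizes]
        self.ticks = 0

    def input(self, value):
        self.ticks += 1
        self.pools[0].append(value)
        span = 1
        for i in range(len(self.pools) - 1):
            span *= self.pools[i].maxlen
            if self.ticks % span == 0:
                # the sum of one pool becomes a single sample one level up
                self.pools[i + 1].append(sum(self.pools[i]))

# one counter for events, one for milliseconds, fed about once per second
events, millis = CascadePools(), CascadePools()
events.input(1)
millis.input(1004)
```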