Configuration options for RADIUS Client Load balancing with Authentication Manager version 8.5, or earlier
RSA Product Set: RSA SecurID
RSA Product/Service Type: Authentication Manager
RSA Version/Condition: 8.5.0 and earlier; 8.0 through 8.4
Platform: Linux
O/S Version: SUSE Linux 12
Load balancers such as F5 and Citrix NetScaler send various keepalive packets to RADIUS servers and determine availability from the responses to those keepalives. A keepalive can be as simple as a TCP three-way handshake (SYN, SYN-ACK, ACK), which is the beginning of any reliable network connection. For this, the load balancer just needs a TCP port number, such as 1813 or 1812 for RADIUS.
LB → AM server port 1813: SYN (SYNchronize)
AM server → LB: SYN-ACK (SYNchronize-ACKnowledgement)
LB → AM server: ACK (ACKnowledgement)
A good load balancer closes this connection after the third packet (the final ACK), which saves the AM server from having to time out the connection.
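The handshake-and-close behavior described above can be sketched in a few lines of Python. This is only an illustration of what a TCP port keepalive does, not how any particular load balancer implements it; the function name tcp_port_is_up and the 3-second timeout are assumptions made for the example.

```python
import socket


def tcp_port_is_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP three-way handshake to host:port completes.

    Hypothetical helper for illustration; the name and default timeout
    are not taken from any load balancer product.
    """
    try:
        # create_connection() returns only after SYN, SYN-ACK and ACK
        # have been exchanged successfully.
        with socket.create_connection((host, port), timeout=timeout):
            # Leaving the "with" block closes the socket immediately, so
            # the server does not have to time out an idle connection.
            return True
    except OSError:
        # Connection refused, host unreachable, or timeout: treat as down.
        return False
```

A call such as tcp_port_is_up("am-server.example.com", 1813), where the host name is a placeholder, reports only that something is listening on that TCP port. It says nothing about whether authentication requests are being processed.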
Keepalives can also be as complex as a full Authentication Request with UserID and Passcode. In order to be successful, the UserID and Passcode must be valid. A fixed Passcode can be used here, though it is less secure, and is not 2-factor authentication.
Depending on which keepalive method is selected, AM administrators need to be aware of scalability, reporting, and performance issues that could affect the AM servers. For example, a TCP port touch using the three-way handshake (SYN, SYN-ACK, ACK) has the least impact on the AM servers' resources. TCP SYN packets do not show in the RADIUS <date>.log files (e.g. /opt/rsa/am/radius/20210331.log for March 31, 2021) and do not show in the AM Authentication Real-Time Monitor or reports. Depending on which port is chosen, a response can indicate either that the AM server is up or that the RADIUS service on the AM server is up.
The most resource-intensive keepalive is a full authentication request with UserID and Passcode. These keepalives show in both the RADIUS <date>.log files and in the AM Authentication Real-Time Monitor or reports. How often these keepalives are sent also matters, because each one uses AM server resources.
Your tasks here are to:
Decide which Load Balancer approach scales best for your site/realm.
Configure one of the 5 approaches to RADIUS Client Load Balancing.
Depending on your requirements, you might configure user test logons with a fixed passcode every 5 to 10 minutes (300 to 600 seconds), with resulting log entries. Or you might configure a TCP port keepalive touch with no log impact. Be sure all firewall configurations limit access to AM server TCP ports to other AM servers, known administrator workstations, and, in this case, load balancers. As you will see in the Resolution, F5 provides a third approach that sits between the TCP port touch and a full authentication with UserID.
In general, you have three approaches or options for RADIUS keepalives:
TCP SYN to a specific AM port. You might use the TCP 1813 RADIUS accounting port, or 1812, as a simple way to determine that an AM RADIUS server is up and can be used in a RADIUS load balance. Refer to the documentation for your RADIUS client to determine specifics on how to configure this, but typically it is just a TCP port number and frequency (see Notes for frequency cautions). Minimal AM server impact with no AM configuration needed. Excellent scalability. Note that when AM 8.6 is released, around July-August of 2021, the Pulse SBR RADIUS server currently used in AM 8.5 and earlier will be replaced by FreeRADIUS, which will use AM replication. TCP ports 1812 and 1813 will therefore not be up and listening on AM 8.6 servers by default (nor will the alternate RADIUS ports, TCP 1645 and 1646).
A full Authentication Request with UserID and Fixed Passcode, such as the F5 RADIUS Monitor or Citrix NetScaler User logon for High Availability. The load balancer is looking for a response to indicate RADIUS is up and processing Authentication requests, even if the response is Access Denied.
Maximum AM server impact with poor scalability, so be careful with the frequency of these authentications. In a large network with thousands of RADIUS clients, make sure that your high-availability RADIUS test authentications do not make your AM RADIUS server unavailable by consuming the resources that were planned for real users. See Note 2 below for a real-world impact story.
UDP Monitor on F5 can send a null string to a UDP port such as 1812, which triggers a reject response and puts a "Truncated authentication request" entry in the RADIUS date log, but shows nothing in the Authentication Manager Real-Time Monitor or Authentication reports.
Scales better than a full authentication request, but you still need to be careful when dealing with thousands of RADIUS load balancers, or with a dozen or more load balancers probing more often than every 5 minutes. This is simple arithmetic: the number of load balancers multiplied by the test frequency gives you an idea of the load your testing puts on the AM servers. The default interval on F5 is 300 seconds.
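The arithmetic can be sketched as follows. This is a rough estimate only: it assumes every load balancer probes a given AM server exactly once per interval, and the function name and the sample figures (12 load balancers at 300 and 10 seconds) are hypothetical values chosen for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def probes_per_server_per_day(load_balancers: int, interval_seconds: float) -> int:
    """Estimate how many keepalive probes one AM server receives per day.

    Assumption: each load balancer sends one probe to the server every
    interval_seconds. Real products may also retry or probe in bursts.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return int(load_balancers * SECONDS_PER_DAY / interval_seconds)


if __name__ == "__main__":
    # A dozen load balancers at a 300-second interval.
    print(probes_per_server_per_day(12, 300))  # 3456
    # The same dozen load balancers probing every 10 seconds.
    print(probes_per_server_per_day(12, 10))   # 103680
```

With full authentication keepalives, each of those probes is also a log entry and a user lookup, so the interval matters far more than it does for a TCP port touch.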
Note 1 - Load balancers use the term 'stickiness' to refer to using the same server from the load balance pool for multi-step procedures. Authentication is typically a single-step, discrete action - the user sends UserID and Passcode to any AM server to authenticate, and a single response from an AM server completes the transaction. However, with New PIN mode or a PIN Change Policy in effect, some transactions become multi-step:
a. The user enters UserID and Passcode (or tokencode if in New PIN mode).
b. The AM server authenticates the user, but includes a prompt for something else, e.g. create New PIN or enter Next Token Code.
c. The RADIUS client's reply to that prompt, with the new PIN, must go to the same AM server RADIUS that authenticated the user in steps a and b. This is the concept of stickiness. See your load balancer documentation for configuration details.
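As a rough illustration of stickiness, the sketch below keeps a table that maps each client to the AM server it was first given, so that a follow-up request such as a New PIN reply reaches the same server. It is a toy model, not how F5 or NetScaler implement persistence; the class name StickyPool, the client key, and the server names are all hypothetical.

```python
from itertools import cycle


class StickyPool:
    """Toy persistence table: a client keeps the server it was first given."""

    def __init__(self, servers):
        self._next_server = cycle(servers)  # round-robin for new clients
        self._assigned = {}                 # client key -> assigned server

    def server_for(self, client_key):
        # First request from this client: pick the next server in rotation.
        if client_key not in self._assigned:
            self._assigned[client_key] = next(self._next_server)
        # Later requests (e.g. the New PIN reply) reuse the same server.
        return self._assigned[client_key]


if __name__ == "__main__":
    pool = StickyPool(["am-primary", "am-replica"])
    print(pool.server_for("10.0.0.5"))  # am-primary
    print(pool.server_for("10.0.0.6"))  # am-replica
    print(pool.server_for("10.0.0.5"))  # am-primary again
```

Real load balancers also expire these entries after a timeout and choose the persistence key (source address, RADIUS attribute, and so on) by configuration; none of that is modeled here.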
Note 2 - Real-world impact story. A real-world example of frequency and keepalive impact was a customer support case where 9 Citrix NetScalers were configured to send authentication requests every 10-15 seconds. This resulted in over 92,000 authentication requests every 24 hours. The NetScalers were not configured with an actual UserID; they simply used the UserID 'test' with a made-up or invalid fixed passcode. This meant that every day the AM servers processed 92,000 failed authentication requests for the non-existent UserID test, all of which failed to resolve the user name after performing name lookups in both the internal AM database and all external LDAP identity sources. The Real-Time Authentication Monitor was overflowing with these failed authentication messages, making it difficult to nearly impossible to search for real authentication failures. The fact that the authentication failed did not matter; the NetScalers only needed a response to keep the AM servers in their active server list.
Note 3 - The TCP ports should be accessible only to other RSA appliances and to proxies such as load balancers. In general, you want to protect all TCP ports, even from RADIUS clients, which only need to authenticate to UDP port 1812 or 1645. While a load balancer such as an F5 can use TCP port 1812 for keepalives, all firewalls (and load balancers) should prevent pass-through access to these TCP ports.
...says of TCP port 1812, "[t]his port is used for communication between primary RADIUS and replica RADIUS services. If you do not use RSA RADIUS, but you have replica instances, you must allow connections between Authentication Manager instances on this port. You should restrict connections from other systems that are not Authentication Manager instances. For more information, see Required RSA RADIUS Server Listening Ports."