Alrighty folks, we’re hopefully about a week out from launching the full http://www.axwayuser.com site with all sorts of the cool functionality we’ve been talking about, so I’ll spare you more of the “We’re going to have…” details. Tonight I want to focus on an issue that’s been causing some trouble for us and for others. There is apparently an issue with all versions of the Secure Relay for the Synchrony Gateway product related to DNS caching.

The Secure Relay functionality is fantastic, but because of an apparent Java issue, the XSR will hold onto IP addresses. Say you’re connecting to a remote site at xyz.com and the main address is 192.26.172.101. If the server fails over to a passive node and xyz.com now points to 192.26.172.102, your connection may fail because the XSR still has the old IP address cached for xyz.com. Axway Support has pointed us towards a temporary fix while they work on a patch or update, but mentioned that a real patch could be a ways out since Java isn’t incredibly willing to change their functionality just because one company’s software is having trouble with it!

For the fix, you’re actually changing Java security settings, not anything related to the Axway software. Here’s the pertinent information for making the fix. Please read carefully and research to make sure you’re not going to hurt other applications by doing this, as the changes to the Java settings will affect any application that uses the same Java installation on the machine!

Pertinent Info -

This change will have to be made to the agent that will be doing the lookup. On an outbound connection, the Master Agent is usually the agent that does the lookup, but if the name is not resolvable there, it should be passed along to the Router Agent to be resolved. If the lookup is done on the Master first and is successful, you should only have to change this on the Master. Keep in mind that this works in the opposite direction as well: if a connection comes in from a partner to your Router, a lookup is also done on the spot to determine where the connection should be sent internally. It should be safe to make this change to the Java on both the Router and the Master.

First, check to see which version of Java your Master and Router agents are running. For the Master Agent, this can be determined by viewing the “p_xsr_master_java_home” set parameter in your Gateway’s run_time/etc/profile.bat file. For your Router Agent, you will want to check the “p_secure_relay_java_home” set parameter in the profile.bat file located in your Router Agent’s bin directory. Once you’ve identified the appropriate Java installation for each agent, you will want to make the following changes to the java.security file located in the \lib\security folder of that Java installation:

Change the following (default) lines from:

#networkaddress.cache.ttl=-1
networkaddress.cache.negative.ttl=10

To:

networkaddress.cache.ttl=0
networkaddress.cache.negative.ttl=0
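
If you want to confirm that the edit actually took, here is a minimal sketch (not something from Axway Support) that prints the two properties as a JVM sees them. The class name DnsCacheSettings and the describe helper are made up for illustration. It only tells you something useful if you run it with the same Java installation the agent uses.

```java
import java.security.Security;

public class DnsCacheSettings {
    // Returns the security property's value, or a note when the line is
    // commented out or missing from the security file this JVM loaded.
    public static String describe(String name) {
        String value = Security.getProperty(name);
        return value == null ? "(not set)" : value;
    }

    public static void main(String[] args) {
        System.out.println("networkaddress.cache.ttl="
                + describe("networkaddress.cache.ttl"));
        System.out.println("networkaddress.cache.negative.ttl="
                + describe("networkaddress.cache.negative.ttl"));
    }
}
```

Compile and run it with the java and javac binaries from the installation you edited, for example the one that p_xsr_master_java_home points to.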

The description of exactly what these parameters do is in the security file itself, and I would recommend reviewing those details. As mentioned in the Java security file, you could also set the cache time to live to a period of time in seconds. This gives you a middle ground: the IP is not kept forever, but a lookup is not done every single time a transfer is made. If you set that to 300 seconds, outbound messages to a partner who has changed their IP could fail for up to 5 minutes. If this is acceptable (the message should go out on a retry), it might be a good solution to optimize performance and still get your messages out in an automated manner.
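
To picture that tradeoff, here is a toy time-to-live cache. It is only an illustration of the idea, not how the JVM or the XSR implements its cache, and the class TtlCacheSketch, the fake resolver, and the hand-fed clock are all assumptions of mine. It replays the xyz.com failover from earlier with a 300 second TTL.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class TtlCacheSketch {
    private final long ttlSeconds;
    private final Function<String, String> resolver; // stands in for a real DNS lookup
    private final Map<String, String> addresses = new HashMap<>();
    private final Map<String, Long> expiresAt = new HashMap<>();

    public TtlCacheSketch(long ttlSeconds, Function<String, String> resolver) {
        this.ttlSeconds = ttlSeconds;
        this.resolver = resolver;
    }

    // The current time is passed in so the behavior can be shown without waiting.
    public String lookup(String host, long nowSeconds) {
        Long expiry = expiresAt.get(host);
        if (expiry != null && nowSeconds < expiry) {
            return addresses.get(host); // cached entry is still considered fresh
        }
        String fresh = resolver.apply(host); // expired or never seen: resolve again
        addresses.put(host, fresh);
        expiresAt.put(host, nowSeconds + ttlSeconds);
        return fresh;
    }

    public static void main(String[] args) {
        String[] current = {"192.26.172.101"};
        TtlCacheSketch cache = new TtlCacheSketch(300, host -> current[0]);
        System.out.println(cache.lookup("xyz.com", 0));   // 192.26.172.101
        current[0] = "192.26.172.102";                    // partner fails over
        System.out.println(cache.lookup("xyz.com", 299)); // 192.26.172.101 (stale)
        System.out.println(cache.lookup("xyz.com", 300)); // 192.26.172.102
    }
}
```

With a TTL of 0 in this sketch, every call goes back to the resolver, which is the "always look it up" behavior the fix above is after.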

Thanks go out to Axway Support for putting lots of research time into this. This is an issue that has plagued us because of trading partners who routinely change IP addresses, use load balancing, or need to do testing. I haven’t gotten a chance to implement this yet, as going through change control and finding an appropriate outage will be tough, plus it’s difficult to test without doing a bunch of work. So I’m basically going to throw in the changes in QA, wait a while to make sure nothing breaks, and then implement in prod.

As a matter of best practice, I would recommend that anybody making this change does some benchmarking of both system and TCP performance to ensure that the changes don’t affect performance drastically. Aside from ensuring that you’re not devastating your performance by making the change, this may also be a good way to test different values for the .ttl variables to see which provides the best balance of performance and functionality.
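
As a starting point for that benchmarking, here is a rough sketch that times repeated name lookups from Java. It measures name resolution only, not a real Gateway transfer, and the class name LookupTiming and the default of 100 rounds are just my assumptions. Run it with the Java installation you changed, once before and once after editing the security file, and compare.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LookupTiming {
    // Average time per lookup in microseconds over the given number of rounds.
    public static double averageLookupMicros(String host, int rounds) {
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            try {
                // Served from the JVM's cache or resolved again, depending on the ttl settings.
                InetAddress.getByName(host);
            } catch (UnknownHostException e) {
                throw new UncheckedIOException(e);
            }
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000.0 / rounds;
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int rounds = 100;
        System.out.printf("%s: %.1f microseconds per lookup over %d rounds%n",
                host, averageLookupMicros(host, rounds), rounds);
    }
}
```

Pass a partner host name as the first argument to time lookups against something that actually goes out to DNS.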

Best of luck!

Tony