I have to confess that, for the longest time, I had no idea that an ordinary NotesDatabase.Open() call is not cluster-aware. Clustering in Lotus Notes and Domino, after all, is magic, right? But the ordinary open(server,path) method in the LotusScript and Java back-end classes does not fail over if the specified server is not responding. There is a special openWithFailover(server,path) method, and that's what you're supposed to use if you want failover to occur. I can see the logic in this, as there are many cases where you really do have to carefully control what server you are working with.
So a few weeks ago I started thinking about upgrading some code to use openWithFailover, but this code runs in an unusual configuration (scheduled agent running on a server in domainA accessing databases on servers in domainB), so I decided to do some methodical testing, using a stripped down piece of test code. Before testing this code in the final configuration, however, I figured that I should test it in a simple configuration and prove a few things. After all, it would be nice to know the code doesn't have bugs before I try it in an environment where I'm not particularly confident that it will work.
First, of course, I had to build a cluster. Nothing could be simpler, really, but it had been a few years since I'd done it, so I took two servers, one on 8.5.1 and one on 8.5.2, and just followed the instructions in the Domino Administrator help file. Half an hour later I had a cluster, with cluster replication doing its thing, and failover occurring as expected on a Notes 8.5.2 client when I took either one of the servers down.
Then I wrote a few lines of LotusScript and ran them as a client-side agent from the Actions menu. The code instantiated three NotesDatabase objects. It used the open() method on the first object to open a test database on one of the servers in the cluster. It used the open() method on the second object to open a replica of the same database on the other server. And it used the openWithFailover() method on the third object to open the same database again on the first server. In all three cases, the code called the NotesDatabase.isOpen() method to verify success, and when both servers were up the results were exactly as expected -- all three databases opened. But when the first server was down, the isOpen() test failed after both the first open() call and the openWithFailover() call.
Just to be sure, I stripped down the code even further, so it just instantiates a single NotesDatabase object and uses the openWithFailover() method, then tests isOpen(). Again, it works fine when the server is up, but does not fail over to the second server in the cluster when the first server is down. I double- and triple-checked everything but found no problems with either my cluster or my code. My client configuration is not an issue as far as I can see. I can connect to both servers in the cluster, and the fact that both regular open() calls do succeed proves that connectivity isn't a problem in the agent. So, a few days ago I opened a PMR with IBM and demonstrated the problem to them in a screen-sharing session. We checked a few things, like the cluster.ncf file, and then they asked me to send in my code. Today the support engineer informed me that he is getting the same results in his own test environment.
At this point, I just have to wonder... Has anyone actually observed OpenWithFailover() working?
1. Erik Brooks 06/17/2011 08:14:51 PM
Never used it.... but could possibly in the exact type of scenario you're describing.
Just for kicks, I would ensure that you're either on 8.5.2FP2 or something like 8.5.1FP5. Non-fix-packed versions of 8.5.x have had some significant issues, and it wouldn't surprise me if something related to openWithFailover() was fixed at some point.
Do let us know the results!
2. Fred Janssen 06/21/2011 06:40:58 AM
Please check the cldbdir.nsf on both servers. Do they both contain references to your applications on both servers?
I.e., your db should be listed in Server A's cldbdir.nsf (both replicas, from Server A and Server B, should be in there).
The same goes for Server B's cldbdir.nsf (both replicas, from Server A and Server B, should be in there).
The cldbdir.nsf should replicate between Server A and Server B and thus list all databases in the cluster.
Does the client still perform failover correctly?
3. Richard Schwartz 06/21/2011 12:47:15 PM
Yes, the database is correctly listed in cldbdir.nsf. I checked the replicas on both servers, and each one contains both entries. I can see cluster replication working for various databases, and client failover works properly when I click on workspace icons to open databases.
4. Richard Schwartz 06/21/2011 03:43:30 PM
Out of habit from Java, I escaped the "\" in the file path string constant that I passed into openWithFailover. In other words, I coded it as "mail\\user.nsf". My test code is LotusScript, not Java, so the escape is not necessary. The result is that I passed an extra backslash character into the actual API call.
This actually works fine for a NotesDatabase.open() call. The extra backslash is ignored.
It also works fine for a NotesDatabase.openWithFailover() call if the server named in the first argument is running. It just fails for NotesDatabase.openWithFailover() if the server is not running.
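To make the escaping difference concrete, here is a small plain-Java sketch. It does not call the Notes API at all; it only shows what each string literal holds at run time. The class name NotesPathCheck and the helper collapseBackslashes are hypothetical names made up for this illustration, and collapsing doubled backslashes before calling openWithFailover is just one possible defensive measure, not something the Notes classes do for you.

```java
// Illustration only: plain Java, no Notes classes involved.
class NotesPathCheck {

    // In Java source, "\\" is an escape sequence for ONE backslash,
    // so this constant holds the 13-character string mail\user.nsf.
    static final String JAVA_LITERAL = "mail\\user.nsf";

    // LotusScript does not treat "\" as an escape character, so the
    // LotusScript literal "mail\\user.nsf" keeps BOTH backslashes.
    // Written as a Java literal, that 14-character value looks like this:
    static final String LOTUSSCRIPT_VALUE = "mail\\\\user.nsf";

    // Count the backslash characters actually present in a string.
    static long countBackslashes(String s) {
        return s.chars().filter(c -> c == '\\').count();
    }

    // Hypothetical helper: collapse any run of two or more backslashes
    // into a single one. The regex \\{2,} matches 2+ literal backslashes.
    static String collapseBackslashes(String path) {
        return path.replaceAll("\\\\{2,}", "\\\\");
    }

    public static void main(String[] args) {
        System.out.println(JAVA_LITERAL + " has "
                + countBackslashes(JAVA_LITERAL) + " backslash(es)");
        System.out.println(LOTUSSCRIPT_VALUE + " has "
                + countBackslashes(LOTUSSCRIPT_VALUE) + " backslash(es)");
        System.out.println("collapsed: "
                + collapseBackslashes(LOTUSSCRIPT_VALUE));
    }
}
```

The point is simply that the same characters typed into the source produce different strings in the two languages, so a path that is correct in a Java agent carries an extra backslash when pasted into LotusScript.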
5. Mats 07/24/2012 10:37:07 AM
I have found that replacing the actual server name with the cluster name eliminates the timeout that occurs while Domino waits for the failing server.
6. Mats 07/24/2012 11:13:55 AM
Further to this, it is a real shame that there is no openByReplicaIdWithFailover(), as it is tedious to key in file paths and customers tend to change things...
I solved this by first opening "names.nsf" with failover and grabbing the server name of the db that was returned, i.e. the server that was up and responding. Then I can use that server name to open a database by replica ID in the code that follows. Not perfect, but a step in the right direction.