It is generally, though probably not universally, accepted that robust interfaces are a good thing. This was expressed in RFC-761 as "be conservative in what you do, be liberal in what you accept from others," which is frequently referred to as Postel's Law or The Robustness Principle. Robustness, however good it is in principle, is not always as easy to deal with in practice. This is especially true for people who are integrating systems built from components that have different ideas about how liberal they should be. I face this issue pretty regularly when dealing with messages that the Lotus Domino SMTP server has accepted from non-Domino systems and delivered to users. Domino can be very liberal about illegal formatting of inbound headers and MIME content, storing them as-is and making it the next guy's problem to figure out what to do with them. And while the Notes client often does manage to do something sensible with non-conformant data, some of the industry-standard APIs that third-party developers use to parse such messages can be much less forgiving.
So... robustness is good, but inconsistent robustness across different systems... not so much. And inconsistent robustness within one API set? That's what I'm really here to talk about!
Let's look at two lines of code that each call the Domino back-end classes. Here's the first one:
Except for the semicolon at the end, it doesn't matter if this is Java or LotusScript. The sharp-eyed amongst you might notice something else that could matter, but in fact it doesn't. In both languages, if the db object is properly declared and instantiated, and the server "svr1/rhs" is running, reachable and accessible, and if the named database file exists and is accessible, this call works.
Now, let's make a small change to the code:
This code also still works, regardless of whether it is Java or LotusScript, and with the same stipulations I stated above. But this code is intended for a clustered environment, so what happens if we remove the stipulation that svr1/rhs is running, but add the stipulation that it is a member of a Domino cluster and another server in the cluster (call it "svr2/rhs") has a replica of the same database on it? That's where an inconsistency occurs, but in LotusScript only. The Java code works as expected, opening the database on the second server in the cluster, but the LotusScript code fails.
This happens because there's an error in the LotusScript syntax: the double backslash is really a double backslash, not an escaped single backslash the way it is in Java. When LotusScript passes the pathname with the double backslash to the Notes back-end classes, they pass it along to the Notes C API, which passes it on to the robust APIs for the filesystem on a Domino server running on Windows, which ignore the extra backslash. (I'm pretty sure this is also true on most of the other Domino platforms, if not all, but I haven't verified it.) That's why the first code example worked despite the error in syntax, and that's why openWithFailover works as long as the target server is up and running. But when the target server is down, openWithFailover tells the Notes C API to find out what cluster it is in and find another server in the cluster with a replica of the database, and it turns out that something in that process is not as robust. The extra backslash causes a lookup to fail, and no failover occurs.
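The escaping difference is easy to check in plain Java, without any Domino classes. Here is a minimal sketch; the class name BackslashDemo, the helper countBackslashes, and the pathname "mail\\user.nsf" are my own illustrative assumptions, not part of the Notes API.

```java
// Hypothetical demo class: shows how many backslashes a string value really holds.
class BackslashDemo {
    // Counts the backslash characters actually present in the string value.
    static int countBackslashes(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\\') {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Java source has two backslashes, but the compiler resolves the
        // escape, so the value holds only one.
        String escaped = "mail\\user.nsf";
        // Four backslashes in Java source give a value with two. This is the
        // value LotusScript would send for a literal written with a double
        // backslash, since it does not treat the backslash as an escape.
        String doubled = "mail\\\\user.nsf";

        System.out.println(escaped + " -> " + countBackslashes(escaped)); // mail\user.nsf -> 1
        System.out.println(doubled + " -> " + countBackslashes(doubled)); // mail\\user.nsf -> 2
    }
}
```

The second string is the one that opens fine while the server is up and breaks the cluster lookup when it is down.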
This, by the way, is why I asked a few days ago whether the openWithFailover method actually works. The code I was testing it with was written in LotusScript, and it had double backslashes in the pathname. IBM support helped me figure this out. I feel a bit embarrassed, but then again not.
I should mention, in case it's not clear, that I'm not saying there is an inconsistency between LotusScript and Java. There's a syntax difference, and that is certainly expected. The inconsistency I'm talking about is the handling of a double backslash in the open and openWithFailover methods. The Java code above never sends double backslashes because the Java language has already resolved them to a single backslash, so it's not an issue... but it is certainly possible to write Java code that really does send double backslashes. One probably wouldn't code it directly, as in "mail\\\\user.nsf", but it could easily look like this:
String folder = doc.getItemValueString("folder");
...
db.openWithFailover("svr1/rhs", folder + "\\" + file);
If the folder item that was read into the folder variable already had a backslash character at the end of its value, then this Java code would get the same inconsistent result as the LotusScript. It would work fine as long as svr1/rhs is up, just as it would work if it were just a call to db.open. But if svr1/rhs were down, this Java code would fail. This type of code, by the way, is the reason that filesystem APIs tend to be forgiving of extra slashes or backslashes.
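One way to avoid depending on how forgiving the API happens to be is to build the pathname so that it can never contain a doubled separator. This is a minimal sketch of a caller-side guard; the class PathJoin and its join method are hypothetical names of my own, not something the Domino classes provide.

```java
// Hypothetical helper: joins a folder and a file name with exactly one backslash.
class PathJoin {
    static String join(String folder, String file) {
        // Drop any backslashes at the start of the file name.
        int start = 0;
        while (start < file.length() && file.charAt(start) == '\\') {
            start++;
        }
        String cleanFile = file.substring(start);

        if (folder == null) {
            return cleanFile;
        }

        // Drop any backslashes at the end of the folder value.
        int end = folder.length();
        while (end > 0 && folder.charAt(end - 1) == '\\') {
            end--;
        }
        if (end == 0) {
            // Empty folder (or nothing but separators): use the file name alone.
            return cleanFile;
        }
        return folder.substring(0, end) + "\\" + cleanFile;
    }

    public static void main(String[] args) {
        System.out.println(join("mail", "user.nsf"));   // mail\user.nsf
        System.out.println(join("mail\\", "user.nsf")); // mail\user.nsf
    }
}
```

With a helper like this, the call would become db.openWithFailover("svr1/rhs", PathJoin.join(folder, file)), and a trailing backslash in the folder item would no longer matter.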
There's no question that the double backslash in LotusScript is an error. It is, however, a pretty predictable error. It certainly is for me, given that three of the four languages that I deal with on a regular basis (Java, @formula, and C) all require escaping backslashes. Filesystem APIs generally consider this to be an innocent error. The Notes classes usually end up treating it as an innocent error, too, but not always.
I have to confess that, for the longest time, I had no idea that an ordinary NotesDatabase.Open() call is not cluster-aware. Clustering in Lotus Notes and Domino, after all, is magic, right? But the ordinary open(server,path) method in LotusScript and Java back-end classes does not fail over if the specified server is not responding. There is a special openWithFailover(server,path) method, and that's what you're supposed to use if you want failover to occur. I can see the logic in this, as there are many cases where you really do have to carefully control what server you are working with.
So a few weeks ago I started thinking about upgrading some code to use openWithFailover, but this code runs in an unusual configuration (scheduled agent running on a server in domainA accessing databases on servers in domainB), so I decided to do some methodical testing, using a stripped down piece of test code. Before testing this code in the final configuration, however, I figured that I should test it in a simple configuration and prove a few things. After all, it would be nice to know the code doesn't have bugs before I try it in an environment where I'm not particularly confident that it will work.
First, of course, I had to build a cluster. Nothing could be simpler, really, but it had been a few years since I'd done it, so I took two servers, one on 8.5.1 and one on 8.5.2, and just followed the instructions in the Domino Administrator help file. Half an hour later I had a cluster, with cluster replication doing its thing, and failover occurring as expected on a Notes 8.5.2 client when I took either one of the servers down. Then I wrote a few lines of LotusScript and ran it as a client-side agent from the Actions menu. The code instantiated three NotesDatabase objects. It used the open() method on the first object to open a test database on one of the servers in the cluster. It used the open() method on the second object to open a replica of the same database on the other server. And it used the openWithFailover() method on the third object to open the same database again on the first server. In all three cases, the code called the NotesDatabase.isOpen() method to verify success, and when both servers were up the results were exactly as expected -- all three databases opened. But when the first server was down, the isOpen() test failed after both the first open() call and the openWithFailover() call.
Just to be sure, I stripped down the code even further, so it just instantiates one single NotesDatabase object and uses the openWithFailover() method, then tests isOpen(). Again, it works fine when the server is up, but does not fail over to the second server in the cluster when the first server is down. I double and triple-checked everything but found no problems with either my cluster or my code. My client configuration is not an issue as far as I can see. I can connect to both servers in the cluster, and the fact that both regular Open() calls do succeed proves that connectivity isn't a problem in the agent. So, a few days ago I opened a PMR with IBM and demonstrated the problem to them in a screen-sharing session. We checked a few things, like the cluster.ncf file, and then they asked me to send in my code. Today the support engineer informed me that he is getting the same results in his own test environment.
At this point, I just have to wonder... Has anyone actually observed OpenWithFailover() working?