WebLogic JMS supports a JDBC store and a file store for persistence. Middleware uses a JMS distributed queue backed by a JDBC store, which has led to issues involving corruption and crashes of the persistent store.
There were several instances of the JDBC persistent store crashing because it could not connect to the store's database server. Either the store was corrupted (the store's DB tables had to be purged and the JMS server restarted to bring JMS back into action), or the store simply crashed (a JMS server restart was required for normal functioning). Many messages were lost as a result of these issues, and reliability was compromised.
- All messages in the store were lost, since the table had to be purged in order to recover.
- All incoming messages during the failure period were lost, since the JMS server was not in a state to receive them (messages stuck in the TRANSACTION_SEND state).
- Messages consumed by the listener during the crash period (stuck in the TRANSACTION_RECD state) could be neither committed nor rolled back, and were also lost after a restart. In particular, a message that was supposed to roll back did not.
- A restart was always required for the JMS server to function.
All of this behavior was also simulated in the lab and confirmed.
In a nutshell, even though the JDBC store is more scalable, reliability is compromised by the persistent store crashes. Database connectivity was not very reliable (due to DB unavailability and network connectivity issues). We used Oracle RAC for high availability, yet the persistent store still crashed and did not fail over to the other RAC node; this could also be a RAC-related issue. Based on these observations, the JDBC store appears less reliable.
- Using a SAF agent with a JDBC persistent store: we were able to get this configuration working, but it forces the SAF agent and the actual JMS server onto different servers. SAF also does not seem to be the right fit for this use case, since the calling OSB and the JMS run on the same WebLogic domain/server. This approach may still fail when a listener tries to commit or roll back while the JMS persistent store is down.
- Use a JMS file store for the distributed queue: the file store appears more reliable based on the trends seen so far, so the issues and impacts mentioned above should be alleviated or negligible. Scalability could be a concern compared to the JDBC store; however, the current file system configuration should support the expected volume for the near future.
Weblogic documentation on File Store vs JDBC Store:
The following are some similarities and differences between file stores and JDBC stores:
- The default persistent store can only be a file store. Therefore, a JDBC store cannot be used as a default persistent store.
- The transaction log (TLOG) can only be stored in a default store.
- Both have the same transaction semantics and guarantees. As with JDBC store writes, file store writes are guaranteed to be persisted to disk and are not simply left in an intermediate (that is, unsafe) cache.
- Both have the same application interface (no difference in application code).
- All things being equal, file stores generally offer better throughput than a JDBC store. Note: if a database is running on high-end hardware with very fast disks, and WebLogic Server is running on slower hardware or with slower disks, then you may get better performance from the JDBC store.
- File stores are generally easier to configure and administer, and do not require that WebLogic subsystems depend on any external component.
- File stores generate no network traffic; whereas, JDBC stores will generate network traffic if the database is on a different machine from WebLogic Server.
- JDBC stores may make it easier to handle failure recovery since the JDBC interface can access the database from any machine on the same network. With the file store, the disk must be shared or migrated.
The solution can vary based on the infrastructure and needs of the project. For our project the file store was recommended, so that reliability can be higher.
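As a rough illustration of the recommended change, the file store can be created and assigned to the JMS server with a WLST (Jython) script. This is only a sketch: the credentials, admin URL, store directory, managed server name `ms1`, and JMS server name `MwJMSServer` are all hypothetical placeholders, not values from this project.

```python
# WLST sketch -- run with: java weblogic.WLST create_filestore.py
# All names, paths, and credentials below are illustrative assumptions.
connect('weblogic', 'welcome1', 't3://adminhost:7001')  # placeholder admin login
edit()
startEdit()

# Create a file store on disk and target it at the managed server.
fs = cmo.createFileStore('MwFileStore')            # hypothetical store name
fs.setDirectory('/shared/jms/stores/MwFileStore')  # directory must exist on that host
fs.addTarget(getMBean('/Servers/ms1'))             # hypothetical managed server

# Point the JMS server at the new file store instead of the JDBC store.
jmsServer = getMBean('/JMSServers/MwJMSServer')    # hypothetical JMS server name
jmsServer.setPersistentStore(fs)

save()
activate()
disconnect()
```

For a clustered distributed queue, the same pattern is repeated per member JMS server, with each file store directory on storage reachable by (or migratable to) its host.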
From a monitoring perspective, there should be a disk space alert when the file system holding the store reaches the 75% usage threshold.
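A minimal sketch of such a check in plain Python, using only the standard library; the store directory and the 75% threshold are taken from the recommendation above, and the path shown is a placeholder:

```python
import shutil

def disk_usage_pct(path: str) -> float:
    """Return the percentage of disk space used on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

def should_alert(path: str, threshold_pct: float = 75.0) -> bool:
    """True when file-store filesystem usage reaches the alert threshold."""
    return disk_usage_pct(path) >= threshold_pct

if __name__ == "__main__":
    store_dir = "/"  # placeholder; substitute the actual file-store mount point
    print("%.1f%% used, alert=%s" % (disk_usage_pct(store_dir),
                                     should_alert(store_dir)))
```

In practice this check would run on a schedule (cron, or the existing monitoring agent) against the file-store directory, raising the alert before the store can fill the disk.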