
Openfire 4.1.0 alpha Hazelcast 2.2 HAProxy Docker


Setup: a two-node Openfire cluster using Hazelcast, fronted by an HAProxy load balancer, with each component running in its own Docker container:

- openfire1 in a Docker container

- openfire2 in a Docker container

- haproxy in a Docker container in front of the two-node cluster, running in TCP mode (a sketch of this frontend follows the list)

- Hazelcast plugin used to create the cluster between openfire1 and openfire2
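For completeness, the HAProxy piece is just a plain TCP-mode proxy along these lines (a simplified sketch; the hostnames, addresses and ports are illustrative rather than my exact configuration):

    # haproxy.cfg fragment (simplified sketch; server names and ports are illustrative)
    listen xmpp-c2s
        bind *:5222
        mode tcp
        option tcplog
        balance roundrobin
        server openfire1 openfire1:5222 check
        server openfire2 openfire2:5222 check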

 

My web client is able to connect to both Openfire instances and create client sessions without any problems. I can view the sessions in the Admin Console of both Openfire nodes. Clustering is enabled, and I can see that some client sessions are local while others are remote, depending on which Openfire node the client connects to.

 

Problem: For testing, I manually take down openfire1. The client that is locally connected to that node immediately loses its connection and then attempts to reconnect. I would expect Hazelcast to notify the other Openfire node that it must take over the client's session. That does seem to happen, but only after a very long delay. During this delay I cannot view Client Sessions in the remaining Openfire node's Admin Console, nor can any new client sessions be created against the remaining node.

 

Question: Does anyone know where this problem might be coming from: Hazelcast, Openfire, HAProxy, Docker? It looks as though, while openfire1 is down, the surviving node, openfire2, is stuck working out that it is now the sole remaining member and processing that transition. My reason for thinking this is that when I navigate to the surviving node's Admin Console I cannot view Server or Client Sessions, although the Plugins and Group Chat tabs still work. After a long time, more than a minute, openfire2 finally re-establishes the client sessions and is able to accept new connections as well. Is there a configuration setting that would reduce this long failover delay?
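For example, if the delay is mostly Hazelcast deciding that the dead member is really gone, I am wondering whether its heartbeat properties are the right knobs. Below is a sketch of how I imagine they could be set in the Hazelcast configuration file that the Openfire Hazelcast plugin loads; the file name and the 10-second value are guesses on my part, so the plugin's readme should be checked for the exact file it reads and for values that are sensible on a given network:

    <!-- conf/hazelcast-cache-config.xml (file name as I understand it from the plugin docs;
         values below are illustrative guesses, to be merged into the existing config) -->
    <hazelcast>
        <properties>
            <!-- how often members send heartbeats to each other -->
            <property name="hazelcast.heartbeat.interval.seconds">1</property>
            <!-- how long a member may stay silent before it is evicted from the cluster -->
            <property name="hazelcast.max.no.heartbeat.seconds">10</property>
        </properties>
    </hazelcast>

The trade-off, as I understand it, is that a very low no-heartbeat limit evicts a crashed node faster but also makes it easier for a healthy node to be evicted during a long GC pause or a brief network hiccup.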

 

Additional info:

 

It takes approximately 1 minute 14 seconds for the remaining node to pick up all client sessions and allow them to reconnect.

 

In the log file, as soon as the openfire1 node is brought down, the following messages appear, and they repeat over and over until the cluster removes the downed node. It seems that once that node is officially removed, the remaining node is able to recover the client sessions.

 

2016.06.09 20:11:14 INFO  [hz.openfire.IO.thread-in-0]: com.hazelcast.nio.tcp.TcpIpConnection - [172.17.0.5]:5701 [openfire] [3.5.1] Connection [Address[172.17.0.8]:5701] lost. Reason: java.io.EOFException[Remote socket closed!]

2016.06.09 20:11:14 WARN  [hz.openfire.IO.thread-in-0]: com.hazelcast.nio.tcp.ReadHandler - [172.17.0.5]:5701 [openfire] [3.5.1] hz.openfire.IO.thread-in-0 Closing socket to endpoint Address[172.17.0.8]:5701, Cause:java.io.EOFException: Remote socket closed!

2016.06.09 20:11:16 INFO  [cached3]: com.hazelcast.nio.tcp.SocketConnector - [172.17.0.5]:5701 [openfire] [3.5.1] Connecting to /172.17.0.8:5701, timeout: 0, bind-any: true

2016.06.09 20:11:20 WARN  [cached1]: com.hazelcast.cluster.ClusterService - [172.17.0.5]:5701 [openfire] [3.5.1] This node does not have a connection to Member [172.17.0.8]:5701

2016.06.09 20:11:25 WARN  [cached2]: com.hazelcast.cluster.ClusterService - [172.17.0.5]:5701 [openfire] [3.5.1] This node does not have a connection to Member [172.17.0.8]:5701

2016.06.09 20:11:28 INFO  [hz.openfire.InspectInvocationsThread]: com.hazelcast.spi.OperationService - [172.17.0.5]:5701 [openfire] [3.5.1] Handled 0 invocation timeouts and 1 backupTimeouts

2016.06.09 20:11:30 WARN  [cached1]: com.hazelcast.cluster.ClusterService - [172.17.0.5]:5701 [openfire] [3.5.1] This node does not have a connection to Member [172.17.0.8]:5701

2016.06.09 20:11:35 WARN  [cached1]: com.hazelcast.cluster.ClusterService - [172.17.0.5]:5701 [openfire] [3.5.1] This node does not have a connection to Member [172.17.0.8]:5701

2016.06.09 20:11:40 WARN  [cached4]: com.hazelcast.cluster.ClusterService - [172.17.0.5]:5701 [openfire] [3.5.1] This node does not have a connection to Member [172.17.0.8]:5701

2016.06.09 20:11:44 INFO  [hz.openfire.InspectInvocationsThread]: com.hazelcast.spi.OperationService - [172.17.0.5]:5701 [openfire] [3.5.1] Handled 0 invocation timeouts and 1 backupTimeouts

2016.06.09 20:11:45 WARN  [cached1]: com.hazelcast.cluster.ClusterService - [172.17.0.5]:5701 [openfire] [3.5.1] This node does not have a connection to Member [172.17.0.8]:5701

2016.06.09 20:11:50 WARN  [cached1]: com.hazelcast.cluster.ClusterService - [172.17.0.5]:5701 [openfire] [3.5.1] This node does not have a connection to Member [172.17.0.8]:5701

2016.06.09 20:11:55 WARN  [cached1]: com.hazelcast.cluster.ClusterService - [172.17.0.5]:5701 [openfire] [3.5.1] This node does not have a connection to Member [172.17.0.8]:5701

2016.06.09 20:11:57 ERROR [socket_c2s-thread-2]: org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory - Failed to execute cluster task within 30 seconds

java.util.concurrent.TimeoutException: Call Invocation{ serviceName='hz:impl:executorService', op=com.hazelcast.executor.impl.operations.MemberCallableTaskOperation{serviceName='null', partitionId=-1, callId=0, invocationTime=1465503117330, waitTimeout=-1, callTimeout=30000}, partitionId=-1, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=64, callTimeout=30000, target=Address[172.17.0.8]:5701, backupsExpected=0, backupsCompleted=0} encountered a timeout

        at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponse(InvocationFuture.java:366)

        at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponseOrThrowException(InvocationFuture.java:334)

        at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.get(InvocationFuture.java:225)

        at com.hazelcast.util.executor.DelegatingFuture.get(DelegatingFuture.java:71)

        at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory.doSynchronousClusterTask(ClusteredCacheFactory.java:374)

        at org.jivesoftware.util.cache.CacheFactory.doSynchronousClusterTask(CacheFactory.java:596)

        at org.jivesoftware.openfire.plugin.session.RemoteSession.doSynchronousClusterTask(RemoteSession.java:193)

        at org.jivesoftware.openfire.plugin.session.RemoteClientSession.incrementConflictCount(RemoteClientSession.java:157)
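As far as I can tell, the "Failed to execute cluster task within 30 seconds" error comes from the Openfire Hazelcast plugin's ClusteredCacheFactory itself: the surviving node is still trying to run a synchronous cluster task against the member it believes is present and gives up after the plugin's 30-second task timeout. If I am reading the plugin documentation correctly, that limit is an Openfire system property along these lines (please correct me if the name is wrong):

    hazelcast.max.execution.seconds = 30

Raising it would presumably only hide the symptom, though, since the underlying delay is the time Hazelcast takes to evict the dead member.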

