Channel: VMware Communities : All Content - vFabric GemFire [ARCHIVED]

Locator is forced out of the distributed system by a member


Hi, I have found a major problem in my application. For an unknown reason the locator log shows "Suspect notification for member" messages, and after a while the locator is forced out of the distributed system. This is a major problem because without the locator I lose connectivity with the server.


The architecture of my application is many clients connected to a server with two nodes; each node runs its own locator.


In the logs, 172.29.0.179 is the IP of one node and 172.29.0.178 is the IP of the other. The log below is from the locator started on 172.29.0.178.


The log lines are:


[info 2013/12/13 16:55:03.907 ART  <main> tid=0x1] Starting distributed system

 

[info 2013/12/13 16:55:04.227 ART  <main> tid=0x1] GemFire P2P Listener started on  tcp:///172.29.0.178:59357

 

[info 2013/12/13 16:55:04.344 ART  <main> tid=0x1] Attempting to join distributed system whose membership coordinator is 172.29.0.179(31524)<v75>:34667 using membership ID datac-r35-07(22965):56765

 

[info 2013/12/13 16:55:06.084 ART  <main> tid=0x1] Entered into membership in group GF66 with ID datac-r35-07(22965:admin)<v86>:56765/59357.

 

[info 2013/12/13 16:55:06.085 ART  <main> tid=0x1] Starting DistributionManager datac-r35-07(22965:admin)<v86>:56765/59357.

 

[info 2013/12/13 16:55:06.086 ART  <main> tid=0x1] Initial (membershipManager) view =  [172.29.0.179(31524:admin)<v75>:34667/44384, 172.29.0.179(31623:admin)<v76>:49692/36785, 172.29.0.179(31733)<v77>:38302/46653, datac-r35-07(22965:admin)<v86>:56765/59357]

 

[info 2013/12/13 16:55:06.086 ART  <main> tid=0x1] DMMembership: Admitting new administration member < 172.29.0.179(31524:admin)<v75>:34667/44384 >.

 

[info 2013/12/13 16:55:06.086 ART  <main> tid=0x1] DMMembership: Admitting new administration member < 172.29.0.179(31623:admin)<v76>:49692/36785 >.

 

[info 2013/12/13 16:55:06.086 ART  <main> tid=0x1] Admitting member <172.29.0.179(31733)<v77>:38302/46653>. Now there are 3 non-admin member(s).

 

[info 2013/12/13 16:55:06.087 ART  <main> tid=0x1] DMMembership: Admitting new administration member < datac-r35-07(22965:admin)<v86>:56765/59357 >.

 

[info 2013/12/13 16:55:06.154 ART  <main> tid=0x1] DistributionManager datac-r35-07(22965:admin)<v86>:56765/59357 started on datac-r35-07.gire.com[55421],datac-r35-08.gire.com[55421]. There were 1 other DMs. others: [172.29.0.179(31733)<v77>:38302/46653]   (admin only)

 

[info 2013/12/13 16:55:06.163 ART  <main> tid=0x1] Locator started on  172.29.0.178[55421]

 

[info 2013/12/13 16:55:06.163 ART  <main> tid=0x1] Starting server location for Distribution Locator on datac-r35-07[55421]

 

[info 2013/12/13 16:55:12.399 ART  <UDP ucast receiver> tid=0x1d] Membership: received new view  [172.29.0.179(31524)<v75>:34667|87] [172.29.0.179(31524)<v75>:34667/44384, 172.29.0.179(31623)<v76>:49692/36785, 172.29.0.179(31733)<v77>:38302/46653, datac-r35-07(22965)<v86>:56765/59357, datac-r35-07(23067)<v87>:41624/44491]

 

[info 2013/12/13 16:55:12.419 ART  <View Message Processor> tid=0x31] DMMembership: Admitting new administration member < datac-r35-07(23067:admin)<v87>:41624/44491 >.

 

[info 2013/12/13 16:55:19.526 ART  <UDP ucast receiver> tid=0x1d] Membership: received new view  [172.29.0.179(31524)<v75>:34667|88] [172.29.0.179(31524)<v75>:34667/44384, 172.29.0.179(31623)<v76>:49692/36785, 172.29.0.179(31733)<v77>:38302/46653, datac-r35-07(22965)<v86>:56765/59357, datac-r35-07(23067)<v87>:41624/44491, datac-r35-07(23184)<v88>:13261/47097]

 

[info 2013/12/13 16:55:19.539 ART  <View Message Processor> tid=0x31] Admitting member <datac-r35-07(23184)<v88>:13261/47097>. Now there are 6 non-admin member(s).

 

[info 2013/12/13 16:59:21.292 ART  <P2P message reader for 172.29.0.179(31733)<v77>:38302/46653 SHARED=true ORDERED=true UID=18> tid=0x3c] Member at 172.29.0.179(31733)<v77>:38302/46653 gracefully left the distributed cache: shutdown message received

 

[info 2013/12/13 16:59:21.542 ART  <UDP ucast receiver> tid=0x1d] Received Suspect notification for member(s) [172.29.0.179(31733)<v77>:38302] from 172.29.0.179(31623)<v76>:49692.

 

[info 2013/12/13 16:59:21.568 ART  <UDP ucast receiver> tid=0x1d] Membership: received new view  [172.29.0.179(31524)<v75>:34667|89] [172.29.0.179(31524)<v75>:34667/44384, 172.29.0.179(31623)<v76>:49692/36785, datac-r35-07(22965)<v86>:56765/59357, datac-r35-07(23067)<v87>:41624/44491, datac-r35-07(23184)<v88>:13261/47097]

 

[info 2013/12/13 16:59:25.309 ART  <UDP ucast receiver> tid=0x1d] Received Suspect notification for member(s) [172.29.0.179(31623)<v76>:49692] from 172.29.0.179(31524)<v75>:34667.

 

[info 2013/12/13 16:59:25.325 ART  <UDP ucast receiver> tid=0x1d] Membership: received new view  [172.29.0.179(31524)<v75>:34667|90] [172.29.0.179(31524)<v75>:34667/44384, datac-r35-07(22965)<v86>:56765/59357, datac-r35-07(23067)<v87>:41624/44491, datac-r35-07(23184)<v88>:13261/47097]

 

[info 2013/12/13 16:59:26.542 ART  <VERIFY_SUSPECT.TimerThread> tid=0x3e] No suspect verification response received from 172.29.0.179(31733)<v77>:38302 in 5000 milliseconds: I believe it is dead.

 

[info 2013/12/13 16:59:28.590 ART  <UDP ucast receiver> tid=0x1d] Membership: received new view  [172.29.0.179(31524)<v75>:34667|91] [datac-r35-07(22965)<v86>:56765/59357, datac-r35-07(23067)<v87>:41624/44491, datac-r35-07(23184)<v88>:13261/47097]

 

[info 2013/12/13 16:59:31.543 ART  <VERIFY_SUSPECT.TimerThread> tid=0x3e] No suspect verification response received from 172.29.0.179(31623)<v76>:49692 in 6234 milliseconds: I believe it is dead.

 

[info 2013/12/13 16:59:37.778 ART  <VERIFY_SUSPECT.TimerThread> tid=0x3e] No suspect verification response received from 172.29.0.179(31524)<v75>:34667 in 9156 milliseconds: I believe it is dead.

 

[info 2013/12/13 16:59:49.949 ART  <Timer-4> tid=0x1a] Could not connect to distribution locator  datac-r35-08<v0>:55421: java.net.ConnectException: Connection refused

 

[info 2013/12/13 17:00:47.071 ART  <Timer-4> tid=0x1a] Could not connect to distribution locator  datac-r35-08<v0>:55421: java.net.ConnectException: Connection refused

 

[info 2013/12/13 17:01:44.195 ART  <Timer-4> tid=0x1a] Could not connect to distribution locator  datac-r35-08<v0>:55421: java.net.ConnectException: Connection refused

 

[info 2013/12/13 17:02:41.317 ART  <Timer-4> tid=0x1a] Could not connect to distribution locator  datac-r35-08<v0>:55421: java.net.ConnectException: Connection refused

 

[info 2013/12/13 17:03:38.440 ART  <Timer-4> tid=0x1a] Could not connect to distribution locator  datac-r35-08<v0>:55421: java.net.ConnectException: Connection refused

 

[info 2013/12/13 17:04:02.356 ART  <ViewHandler> tid=0x4f] Membership: sending new view [[datac-r35-07(22965)<v86>:56765|92] [datac-r35-07(22965)<v86>:56765/59357, datac-r35-07(23067)<v87>:41624/44491, datac-r35-07(23184)<v88>:13261/47097, 172.29.0.179(27638)<v92>:39650/55857]] (4 mbrs)

 

 

[info 2013/12/13 17:04:02.370 ART  <UDP Incoming Message Handler> tid=0x1c] Membership: received new view  [datac-r35-07(22965)<v86>:56765|92] [datac-r35-07(22965)<v86>:56765/59357, datac-r35-07(23067)<v87>:41624/44491, datac-r35-07(23184)<v88>:13261/47097, 172.29.0.179(27638)<v92>:39650/55857]

 

[info 2013/12/13 17:04:02.391 ART  <View Message Processor> tid=0x31] DMMembership: Admitting new administration member < 172.29.0.179(27638:admin)<v92>:39650/55857 >.

 

[info 2013/12/13 17:04:08.348 ART  <ViewHandler> tid=0x4f] Membership: sending new view [[datac-r35-07(22965)<v86>:56765|93] [datac-r35-07(22965)<v86>:56765/59357, datac-r35-07(23067)<v87>:41624/44491, datac-r35-07(23184)<v88>:13261/47097, 172.29.0.179(27638)<v92>:39650/55857, 172.29.0.179(27739)<v93>:1619/43425]] (5 mbrs)

 

 

[info 2013/12/13 17:04:08.362 ART  <UDP Incoming Message Handler> tid=0x1c] Membership: received new view  [datac-r35-07(22965)<v86>:56765|93] [datac-r35-07(22965)<v86>:56765/59357, datac-r35-07(23067)<v87>:41624/44491, datac-r35-07(23184)<v88>:13261/47097, 172.29.0.179(27638)<v92>:39650/55857, 172.29.0.179(27739)<v93>:1619/43425]

 

[info 2013/12/13 17:04:08.381 ART  <View Message Processor> tid=0x31] DMMembership: Admitting new administration member < 172.29.0.179(27739:admin)<v93>:1619/43425 >.

 

[info 2013/12/13 17:04:15.542 ART  <ViewHandler> tid=0x4f] Membership: sending new view [[datac-r35-07(22965)<v86>:56765|94] [datac-r35-07(22965)<v86>:56765/59357, datac-r35-07(23067)<v87>:41624/44491, datac-r35-07(23184)<v88>:13261/47097, 172.29.0.179(27638)<v92>:39650/55857, 172.29.0.179(27739)<v93>:1619/43425, 172.29.0.179(27857)<v94>:36916/47253]] (6 mbrs)

 

 

[info 2013/12/13 17:04:15.558 ART  <UDP Incoming Message Handler> tid=0x1c] Membership: received new view  [datac-r35-07(22965)<v86>:56765|94] [datac-r35-07(22965)<v86>:56765/59357, datac-r35-07(23067)<v87>:41624/44491, datac-r35-07(23184)<v88>:13261/47097, 172.29.0.179(27638)<v92>:39650/55857, 172.29.0.179(27739)<v93>:1619/43425, 172.29.0.179(27857)<v94>:36916/47253]

 

[info 2013/12/13 17:04:15.577 ART  <View Message Processor> tid=0x31] Admitting member <172.29.0.179(27857)<v94>:36916/47253>. Now there are 6 non-admin member(s).

 

[info 2013/12/14 02:23:53.283 ART  <Timer-4> tid=0x1a] Could not connect to distribution locator  datac-r35-07<v0>:55421: java.net.SocketException: Socket closed

 

[info 2013/12/14 09:08:11.416 ART  <UDP ucast receiver> tid=0x1d] Received Suspect notification for member(s) [datac-r35-07(22965)<v86>:56765] from 172.29.0.179(27857)<v94>:36916.

 

[info 2013/12/14 09:08:11.598 ART  <UDP ucast receiver> tid=0x1d] Membership: received new view  [172.29.0.179(27638)<v92>:39650|105] [172.29.0.179(27638)<v92>:39650/55857, datac-r35-07(23067)<v87>:41624/44491, datac-r35-07(23184)<v88>:13261/47097, 172.29.0.179(27739)<v93>:1619/43425, 172.29.0.179(27857)<v94>:36916/47253] crashed mbrs: [datac-r35-07(22965)<v86>:56765/59357]

 

[severe 2013/12/14 09:08:11.608 ART  <CloserThread> tid=0x12c] Membership service failure: Channel closed: com.gemstone.gemfire.ForcedDisconnectException: This member has been forced out of the distributed system by 172.29.0.179(27638)<v92>:39650.  Please consult GemFire logs to find the reason. (GMS shun)

 

[info 2013/12/14 09:08:11.609 ART  <CloserThread> tid=0x12c] Stopping Distribution Locator on datac-r35-07[55421]

 

[info 2013/12/14 09:08:11.622 ART  <CloserThread> tid=0x12c] Disconnecting distributed system for Distribution Locator on datac-r35-07[55421]

 

[info 2013/12/14 09:08:11.623 ART  <CloserThread> tid=0x12c] Shutting down DistributionManager datac-r35-07(22965:admin)<v86>:56765/59357.

 

[info 2013/12/14 09:08:11.623 ART <main> tid=0x1] Locator stopped
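
To make the suspect events easier to spot in a long log like the one above, they can be filtered out with a small script. This is only a minimal sketch in Python, assuming the log lines keep exactly the format shown here; the names `SUSPECT_RE` and `find_suspect_events` are made up for illustration and are not part of GemFire.

```python
import re

# Assumed line format, taken from the log above:
# [info <date> <time> <tz>  <thread name> tid=0x..] Received Suspect
# notification for member(s) [<suspect>] from <reporter>.
SUSPECT_RE = re.compile(
    r"^\[(?P<level>\w+) "
    r"(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \w+\s+"
    r"<.*?> tid=0x[0-9a-f]+\] "
    r"Received Suspect notification for member\(s\) "
    r"\[(?P<suspects>.*?)\] from (?P<reporter>.+?)\.$"
)


def find_suspect_events(lines):
    """Return (timestamp, suspected member(s), reporting member) tuples."""
    events = []
    for line in lines:
        match = SUSPECT_RE.match(line.strip())
        if match:
            events.append(
                (match.group("ts"), match.group("suspects"), match.group("reporter"))
            )
    return events


if __name__ == "__main__":
    import sys

    # Usage: python suspects.py locator.log   (file name is hypothetical)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for ts, suspects, reporter in find_suspect_events(log_file):
            print(f"{ts}  suspect={suspects}  reported by={reporter}")
```

Run against the log above, this should list who suspected whom and when, which shows that the last suspect notification at 09:08:11 is the one against the locator itself.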

 

What does "Received Suspect notification" mean?

Why did this happen?

 

 

Thanks,

 

Juan

