Re: Site Freezing and Odd Behavior

Posted: Sat Oct 08, 2022 11:18 am
by escorpius
Kent Briggs wrote: Sat Oct 08, 2022 10:08 am I was expecting the next stop to be inside one of the GKO lines. But it stopped right in between two debug output lines, which makes no sense. I'm running out of things to try. Is there a consistent pattern as to when your crashes occur such as up time, time of day, tournaments played, etc?
I thought that output would be unsatisfying. One problem I've seen trying to debug with logging to files is that the OS is not guaranteed to flush the I/O on a crash, which can lead to misleading results. I usually try to avoid this issue by explicitly flushing the output on each log entry -- perhaps you're already doing this?
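
For illustration, flushing on every entry might look like the sketch below. It is written in Python purely as an example; the server's real logging code is not shown in this thread, and `append_log_line` is a hypothetical helper name. `flush()` empties the process's own buffer, which is what matters if only the process dies; `os.fsync()` additionally asks the OS to commit the data to disk, which matters if the whole machine goes down.

```python
import os


def append_log_line(path: str, line: str) -> None:
    """Append one log line and push it to disk before returning.

    Hypothetical helper for illustration only; not the server's actual logger.
    """
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
        f.flush()             # empty the userspace buffer into the OS
        os.fsync(f.fileno())  # ask the OS to commit the data to the disk
```

The trade-off is speed: syncing on every line is slow, which is the usual reason loggers batch their writes instead.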

If this output is accurate and it is actually stopping between two lines, then it would indicate the issue is elsewhere and the PM_Timer freezing or dying is a side effect.

As for pattern, no, we've seen no clear pattern other than we've never seen it crash without players at the table. Number of players, time of day, specific events, specific players, etc. -- no clear pattern.

Re: Site Freezing and Odd Behavior

Posted: Sat Oct 08, 2022 12:19 pm
by BlackKnite69
Is there any way for you and escorpius to look at things together to figure this out? We have nearly 650 members. People have all but quit playing. We struggle to get a single hold em game now. Soon it won't matter because we are a dead man walking if we do not get this fixed very, very soon. I am dealing with very upset players every day and have no answers for them.

I am not sure if it has been brought up, but there is a potential symptom that pops up if we have players playing when this happens. It is very common for players to lose their connection and be booted from the table in the minutes before a crash. I spoke with 2 players in our small game last night, and they both thought the game was lagging. They each discovered they had timed out. Before they could sign back in, they were removed from the table. Neither was sitting out more than 3 minutes. One player had no other wifi activity going on. The other was streaming a TV show on the same computer and had no lag or interruptions to his streaming. Each player experienced this 3-4 times.

It is common to see an increase in players timing out in the time before a crash.

Re: Site Freezing and Odd Behavior

Posted: Sat Oct 08, 2022 12:29 pm
by Kent Briggs
escorpius wrote: Sat Oct 08, 2022 11:18 am One problem I've seen trying to debug with logging to files is that the OS is not guaranteed to flush the I/O on a crash, which can lead to misleading results. I usually try to avoid this issue by explicitly flushing the output on each log entry -- perhaps you're already doing this?
The error/event log files have their own thread which queues up and dumps every second. Otherwise the file system could not keep up processing single lines, especially on spinning disk drives. But it's only a 1 second queue and your event log seems to be recording for another full minute. In fact, that seems to be a pattern itself.
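
A minimal sketch of that kind of queued writer, in Python for illustration only (the class name `QueuedLogWriter` and its methods are hypothetical, not the server's actual code): callers drop lines into a queue, and a background thread wakes once per interval and writes everything that has accumulated in one go.

```python
import queue
import threading


class QueuedLogWriter:
    """Batch log lines in memory and write them out once per interval.

    Illustrative only; names and structure are assumptions.
    """

    def __init__(self, path: str, interval: float = 1.0) -> None:
        self._path = path
        self._interval = interval
        self._queue: "queue.Queue[str]" = queue.Queue()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def log(self, line: str) -> None:
        # Cheap for the caller: no disk access happens here.
        self._queue.put(line)

    def _drain(self) -> None:
        lines = []
        while True:
            try:
                lines.append(self._queue.get_nowait())
            except queue.Empty:
                break
        if lines:
            with open(self._path, "a", encoding="utf-8") as f:
                f.write("\n".join(lines) + "\n")

    def _run(self) -> None:
        # wait() returns False on timeout, True once close() sets the event.
        while not self._stop.wait(self._interval):
            self._drain()
        self._drain()  # final write on a clean shutdown

    def close(self) -> None:
        self._stop.set()
        self._thread.join()
```

With this design, a hard crash can lose at most about one interval's worth of queued lines, so the last line on disk may be up to a second older than the real stopping point.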
If this output is accurate and it is actually stopping between two lines, then it would indicate the issue is elsewhere and the PM_Timer freezing or dying is a side effect.
I'm thinking that too except that it's always happening in the same spot. I occasionally hear about crashes from other sites but nothing like what you are experiencing. My own demo site (admittedly low traffic) has been online for 41 days, since the last update.

Re: Site Freezing and Odd Behavior

Posted: Sat Oct 08, 2022 12:41 pm
by Kent Briggs
BlackKnite69 wrote: Sat Oct 08, 2022 12:19 pm Is there any way for you and escorpius to look at things together to figure this out?
That's what we've been doing.
We have nearly 650 members. People have all but quit playing. We struggle to get a single hold em game now. Soon it won't matter because we are a dead man walking if we do not get this fixed very, very soon. I am dealing with very upset players every day and have no answers for them.
This is your first post since late August so I had no idea. You can't assume your site and his have related issues.
It is common to see an increase in players timing out in the time before a crash.
That could be from multiple things like too many connections or loss of internet connectivity.

Re: Site Freezing and Odd Behavior

Posted: Sat Oct 08, 2022 1:52 pm
by escorpius
Kent Briggs wrote: Sat Oct 08, 2022 12:29 pm The error/event log files have their own thread which queues up and dumps every second. Otherwise the file system could not keep up processing single lines, especially on spinning disk drives. But it's only a 1 second queue and your event log seems to be recording for another full minute. In fact, that seems to be a pattern itself.
Understandable. So, it remains possible that we are not seeing the actual crash point. As for the "pattern", we have a watchdog service that restarts the server if it does not see a "Traffic" line within a threshold past the one-minute mark, all to avoid significant downtime.
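
Roughly, the watchdog check works like the sketch below. This is a simplified Python illustration, not our production service: it assumes EventLog lines of the form `timestamp|category|message` as in the excerpts posted in this thread, the function names are made up for the example, the 90-second threshold is an arbitrary placeholder, and the restart action itself is left out.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def last_traffic_time(lines: Iterable[str]) -> Optional[datetime]:
    """Return the timestamp of the most recent System|Traffic line, if any."""
    latest = None
    for line in lines:
        parts = line.rstrip("\n").split("|", 2)
        if len(parts) == 3 and parts[1] == "System" and parts[2].startswith("Traffic"):
            latest = datetime.strptime(parts[0], TIME_FORMAT)
    return latest


def is_stalled(lines: Iterable[str], now: datetime,
               threshold: timedelta = timedelta(seconds=90)) -> bool:
    """True if no Traffic line has been logged within the threshold."""
    latest = last_traffic_time(lines)
    return latest is None or now - latest > threshold
```

Because the restart only fires after the threshold expires, the log will always appear to run on for roughly a minute after the real stopping point.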

FYI, BlackKnite69 is the site manager for the site to which I provide operational support. FWIW, I have not seen timeouts prevalent in the logs prior to crashes -- as I think can be confirmed in the snippets I've provided.
I'm thinking that too except that it's always happening in the same spot. I occasionally hear about crashes from other sites but nothing like what you are experiencing. My own demo site (admittedly low traffic) has been online for 41 days, since the last update.
Are you aware of any that are using the APIs extensively? It also remains very odd to me that we saw none of this behavior with v6. We made no changes to our backend or frontend services when migrating to v7, but saw the crashes begin immediately thereafter. Not likely a coincidence.

Re: Site Freezing and Odd Behavior

Posted: Sat Oct 08, 2022 2:04 pm
by Kent Briggs
escorpius wrote: Sat Oct 08, 2022 1:52 pm FYI, BlackKnite69 is the site manager for the site to which I provide operational support.
Ok, that wasn't made clear to me. I thought these were two different sites.
Are you aware of any that are using the APIs extensively?
I think one was and is running the first 7.10 beta but hasn't reported back to me about any crashes since.

I forgot to ask: are you still seeing duplicate logins in the client?

Re: Site Freezing and Odd Behavior

Posted: Sat Oct 08, 2022 2:12 pm
by Kent Briggs
I have an idea for a new beta so you can disable the KillInactiveSessions function completely at will. I'll be in touch when it's ready.

Re: Site Freezing and Odd Behavior

Posted: Mon Oct 10, 2022 4:18 pm
by Kent Briggs
Kent Briggs wrote: Sat Oct 08, 2022 2:12 pm I have an idea for a new beta so you can disable the KillInactiveSessions function completely at will.
Actually I didn't do that but might have found the real issue (fingers crossed), related to critical sections. Beta 5 link sent to your email.

Re: Site Freezing and Odd Behavior

Posted: Fri Oct 21, 2022 2:47 am
by escorpius
So, mixed news. The changes have definitely hit something of the problem as we have crashed only two times in 8 days and counting. We crashed on the first day of running the latest Beta. Then, we went almost a week with no crash before finally crashing on Wednesday night. Since then, we have been up with no crash, 1.25 days and counting.

This is a notable improvement. Unfortunately, in addition to the less frequent crashes, we are now also seeing a relatively high volume of players getting disconnected, dropped from hands, etc. The Hand History will simply say they "timed out" and the EventLog shows what appears to be disconnect / logout sequences followed by new connections / logins when the player comes back to the table.

For example, player "ts" timed out during a hand started at 22:48 and the following are the entries from the EventLog:

2022-10-20 22:47:39|System|Traffic - Seconds: 59, Bytes in: 6297, Bytes out: 312548, Total: 318845, Threads: 94, CPU: 0.0%, Memory: 35224 kb
2022-10-20 22:47:41|Logout|R logged out session 000000DB, PC 0693CBDF
2022-10-20 22:47:41|Login|R logged into session 000000DD from IP ***.***.146.3, PC 0693CBDF
2022-10-20 22:47:50|API|AccountsGet from ***.***.63.28
2022-10-20 22:47:51|API|AccountsGet from ***.***.63.28
2022-10-20 22:48:27|House|Ring -1 balance 1290.33 (Rake Black AA Cracked 1/1 Jackpot Table)
2022-10-20 22:48:27|House|Rake +1 balance 261558.88 (Black AA Cracked 1/1 Jackpot Table)
2022-10-20 22:48:38|System|Traffic - Seconds: 59, Bytes in: 2249, Bytes out: 247407, Total: 249656, Threads: 92, CPU: 0.0%, Memory: 35192 kb
2022-10-20 22:49:37|System|Traffic - Seconds: 59, Bytes in: 1733, Bytes out: 152253, Total: 153986, Threads: 92, CPU: 0.0%, Memory: 35192 kb
2022-10-20 22:49:51|House|Ring -0.50 balance 1289.83 (Rake Black AA Cracked 1/1 Jackpot Table)
2022-10-20 22:49:51|House|Rake +0.50 balance 261559.38 (Black AA Cracked 1/1 Jackpot Table)
2022-10-20 22:49:58|Connect|ts disconnects session 000000CC, PC B98BC0DE
2022-10-20 22:50:17|Connect|R disconnects session 000000DD, PC 0693CBDF
2022-10-20 22:50:36|System|Traffic - Seconds: 59, Bytes in: 1923, Bytes out: 205567, Total: 207490, Threads: 89, CPU: 0.0%, Memory: 35136 kb
2022-10-20 22:50:49|API|AccountsSessionKey from ***.***.252.115 (ts)
2022-10-20 22:50:51|Connect|Connection 000000DE accepted from IP ***.***.205.29, PC B98BC0DE
2022-10-20 22:50:53|Logout|ts logged out session 000000CC, PC B98BC0DE
2022-10-20 22:50:53|Login|ts logged into session 000000DE from IP ***.***.205.29, PC B98BC0DE
2022-10-20 22:51:06|API|AccountsGet from ***.***.63.28
2022-10-20 22:51:22|House|Ring -1.70 balance 1288.13 (Rake Black AA Cracked 1/1 Jackpot Table)
2022-10-20 22:51:22|House|Rake +1.70 balance 261561.08 (Black AA Cracked 1/1 Jackpot Table)
2022-10-20 22:51:36|System|Traffic - Seconds: 59, Bytes in: 4011, Bytes out: 246749, Total: 250760, Threads: 90, CPU: 0.0%, Memory: 35156 kb
The crashes remain a priority, but the sudden jump in disconnects is now a prickly issue for the players. What might we check / do on our side to mitigate this new issue?

Also, is there any way to truly tell what happened here, e.g., in this specific instance? Meaning, can we distinguish from the logging whether the server is taking action on the connection; whether this is a purposeful client-side disconnect; or, whether this is an unintentional client-side disconnect, e.g., a network issue?
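
In the meantime, one thing we can do on our side is quantify the drops from the EventLog. The Python sketch below is an illustration only: the regular expressions are inferred from the excerpt above rather than from any documented log format, and `reconnect_gaps` is a made-up name. It pairs each "disconnects session" entry with the same player's next login and reports the gap in seconds. This does not reveal the cause of a disconnect; it only measures how long each player was gone.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
# Patterns inferred from the log excerpt; treat them as assumptions.
DISCONNECT = re.compile(r"^(?P<player>.+) disconnects session (?P<session>[0-9A-F]+),")
LOGIN = re.compile(r"^(?P<player>.+) logged into session (?P<session>[0-9A-F]+) from")


def reconnect_gaps(lines: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (player, seconds) for each disconnect followed by a new login."""
    pending: Dict[str, datetime] = {}
    gaps: List[Tuple[str, float]] = []
    for line in lines:
        parts = line.rstrip("\n").split("|", 2)
        if len(parts) != 3:
            continue
        stamp, category, message = parts
        when = datetime.strptime(stamp, TIME_FORMAT)
        if category == "Connect":
            match = DISCONNECT.match(message)
            if match:
                pending[match["player"]] = when
        elif category == "Login":
            match = LOGIN.match(message)
            if match and match["player"] in pending:
                started = pending.pop(match["player"])
                gaps.append((match["player"], (when - started).total_seconds()))
    return gaps
```

Players who disconnect and never log back in within the scanned lines are simply not reported.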

FWIW, We will provide details from any future crash here as well. For now, we are "burning in" the beta a bit to see if there is a new pattern to the crashes, e.g., every 6 days.

Re: Site Freezing and Odd Behavior

Posted: Fri Oct 21, 2022 9:47 am
by Kent Briggs
escorpius wrote: Fri Oct 21, 2022 2:47 am So, mixed news. The changes have definitely hit something of the problem as we have crashed only two times in 8 days and counting. We crashed on the first day of running the latest Beta. Then, we went almost a week with no crash before finally crashing on Wednesday night. Since then, we have been up with no crash, 1.25 days and counting.
What's in the Error Log during these crashes? Do you have the KillInactiveSessions markers recording, and if so, are any crashes preventing the End marker from recording?
This is a notable improvement. Unfortunately, in addition to the less frequent crashes, we are now also seeing a relatively high volume of players getting disconnected, dropped from hands, etc.
Disconnects are generated way up the chain at the O/S level. The software just reports what it sees (unless it's manually killing the connection itself).
Meaning, can we distinguish from the logging whether the server is taking action on the connection; whether this is a purposeful client-side disconnect; or, whether this is an unintentional client-side disconnect, e.g., a network issue?
There's no way for the poker software to distinguish the cause.