Hand History Parser for obtaining Player Stats

eightospade
Posts: 4
Joined: Mon Apr 20, 2020 3:24 pm

Hand History Parser for obtaining Player Stats

Post by eightospade »

Hi Kent and Team,

Is there an existing script or utility that parses hand histories?

I wanted to know if there exists a parsing tool for the hand history text in order to obtain ANY kind of data. I want to be able to obtain stats about myself and opponents, I'm talking HUD stats like VPIP, 3b, RFI, etc.

Any kind of script or program that interacts with the hand history files is of interest to me and could be a handy starting point.

Thanks,
8s
Kent Briggs
Site Admin
Posts: 5878
Joined: Wed Mar 19, 2008 8:47 pm

Re: Hand History Parser for obtaining Player Stats

Post by Kent Briggs »

None that I know of.
StevieG
Posts: 56
Joined: Mon May 04, 2020 12:27 pm

Re: Hand History Parser for obtaining Player Stats

Post by StevieG »

Here is a Python script that I wrote for parsing logs and working out how much each player bought in for (including add-ons) and what they left with.

This was built for ring games only, but can track activity across multiple tables.

There are loops for doing things like emailing players their activity and writing CSV contents to files. That is all about tracking session numbers, and you will want to ditch that most likely.

It does not attempt to capture play stats at the moment, but the code around line 241 is where it runs through hand by hand and looks for player actions. That is where you would want to extend it for stats.

Code:

#!/usr/bin/python
# processLog.py
# Steve Grantz [email protected]
# 2020-04-26
# Usage:
# python processLog.py logfile.txt [logfile2.txt ...]

############################################################################################################
# WHAT THIS DOES
#
# goal of this program is to process Poker Mavens logs to track player activity
# - initial appearance (initial buy-in)
# - addition of chips
# - last known amount of chips
#
# To do this we will take the logs and first break it up into hand by hand, indexed by time
# for a chronology
#
# Then we can loop through each hand, and process for player activity
# the first time we see a player, we add them and their first known chip count
# then in each hand we note additional chips, as well as resolution of the pot
#
# When processing the next hand, everything SHOULD align
# if it does not, throw an error
# otherwise keep processing
#
# look for wins, add ons, and pot contributions
# log cash in and cash out for narrative at end of night
#
# KEY ASSUMPTIONS
#
# assume unique hand number (that is the hand number can NOT repeat across tables)
# assume that the hand number structure NNN-M can be reduced to NNN and that the M local part is not needed
#
#
# CHANGE LOG
# 2020-04-26 v0.1 first version
# 2020-04-28 v0.2 email results
#

import argparse
import csv
import datetime
import getpass
import os
import re
import sys


from os import path
from smtplib import SMTP

# constants
VERSION = "0.2"
CSVTRANS = "gamelog.csv"
CSVBALANCE = "balances.csv"
LOCAL = "local"
INDEX = "index"
TEXT = "text"
FIRST = "first"
LATEST = "latest"
LAST = "last"
IN = "cash in"
OUT = "cash out"
WAITING = "sitting out"
LEFT = "left table"
NOTES = "notes"
TABLE="table"
COUNT="count"
DATETIME="datetime"
NAME="name"
UNIT="unit"
RUNNERS="runners"
REBUYS="rebuys"
EMAIL="email"
WINNERS="winnerShares"

# constants around email options
EMAIL_SUBJ_PREFIX = "Game info from "
FROMADDRESS = '[email protected]'
CCADDRESS = '[email protected]'
SMTPSERVER = ''
SMTPPORT = 26
DEBUGLEVEL = 0


##################################################################################################################
#
# DATA STRUCTURES
#
hands = {}    # the hands dictionary
              # structure
              # KEY - string - hand number
              # LOCAL - string - the "dash" portion of the hand number, may recombine, but so far unique without it
              # DATETIME - datetime - timestamp for the hand
              # TABLE - string - table where the hand happened
              # TEXT  - string - full text of hand, with newlines

players = {}  # the players dictionary
              # structure
              # KEY - string - player name as found in log
              # IN - float - total money in
              # OUT - float - total money out
              # NOTES - string log of activity with newlines
              # sub-dictionary by TABLE ******
              #      KEY - string for the table - will only exist if player was seen at table in logs
              #      FIRST - float - initial buy in for table - not really used much, could be deprecated
              #      IN - float - money in at this table
              #      OUT - float - money out at this table
              #      WAITING - Boolean - whether player is seated but not in play
              #      LEFT - Boolean - player has been at table but is no longer seated
              #      LATEST - float - running tally of player holding at the table - IMPORTANT for checking consistency

tables = {}   # the tables dictionary
              # structure
              # KEY - string - table name as found in log
              # COUNT - integer - number of hands processed for table
              # LATEST - string - hand number for the latest hand processed for this table
              # LAST - datetime - the time stamp of the latest hand processed for this table
              #        LAST and LATEST are used to mark the "end" activity of players standing up
              #        they represent the last seen hand at the table from the processed logs

csvRows = []  # list of lists for the csv transaction content
              # see CSV Header for list of fields
              # CSV - log of activity in CSV format - with newlines
csvHeader =  ["Time",
               "Table",
               "Hand Number",
               "Player",
               "Action",
               "Amount In",
               "Amount Out"
               ]

csvBalances = []  # list of lists for the csv balance content
csvBalanceHeader =  ["Date",
                     "Disposition",
                     "Player",
                     "Amount",
                     "Note"   # each balance row also carries a note as its fifth field
                     ]

# resolvedScreenNames dictionary by Screen Name, has info needed for processing
#               Structure
#               KEY - screen name
#               NAME - short name used in player ledger
#               EMAIL - email address for the player for sending player notes for session
resolvedScreenNames = {
               }


# end of data structures
#
#######################################################################################################################




lineCount = 0
sessionDate = datetime.datetime.now().strftime("%m/%d/%Y")

# get and parse command line arguments
# then process some key ones straight away
# namely, if roster option is used, dump the player roster and go
# if email option is activated, check for presence of password command line argument
# if not there prompt for it
parser = argparse.ArgumentParser(description='Process Poker Maven log files and deliver transaction info and player balances.')
parser.add_argument('-c','--csv', action="store_true",dest="doCsv",default=False,help="Output CSV content.")
parser.add_argument('-e','--email', action="store_true",dest="doEmail",default=False,help="Email player results.")
parser.add_argument('-p','--password', action="store",dest="password",
                    help=("Password for email account (" + FROMADDRESS + ")"))
parser.add_argument('-q','--quiet', action="store_true",dest="quiet",default=False,help="Run in quiet mode with minimal output.")
parser.add_argument('-r','--roster', action="store_true",dest="roster",default=False,
                    help="Show roster of players known to the script and exit.")
parser.add_argument('file', type=argparse.FileType('r'), nargs='*',help="plain text files of Poker Mavens hand histories to process.")
args = parser.parse_args()

if (args.roster):
    if (args.doCsv):
        print(" Screen Name,Nickname,EMail")
    else:
        print("Roster of Players: " + str(len(resolvedScreenNames)))
        print("")
    for player in sorted(resolvedScreenNames.keys(), key=str.casefold):
        if (args.doCsv):
            text = (player + "," + resolvedScreenNames[player][NAME] + ",")
            if (EMAIL in resolvedScreenNames[player]):
                text = text + resolvedScreenNames[player][EMAIL]
        else:
            text = (player + " (" + resolvedScreenNames[player][NAME] + ")")
            if (EMAIL in resolvedScreenNames[player]):
                text = text + " - " + resolvedScreenNames[player][EMAIL]
        print (text)
    sys.exit(0)

emailPassword = ''
if(args.doEmail):
    if (args.password is None):
        emailPassword = getpass.getpass("Enter the password for the email account (" + FROMADDRESS +"): ")
    else:
        emailPassword = args.password



lastHandTime = datetime.datetime.now()

numArg = len(args.file)
if (numArg == 0):
    print("Must provide a name of a log file to process.")
else:
    # process each file listed on the command line
    # first loop through is just to parse and get each hand separated, and get basic hand
    # info into the hands dictionary
    # basic hand info is hand number, local hand number, hand time, and table
    # everything else goes into TEXT
    for f in args.file:
        line = f.readline()
        while (len(line) != 0):
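            # raw string so the \d escapes reach the regex engine untouched
            matches = re.search(r"Hand #(\d*)-(\d*) - (.*)$",line)
            if (matches != None):
                handNumber = matches.group(1)
                handTime = datetime.datetime.strptime(matches.group(3),"%Y-%m-%d %H:%M:%S")
                # LOCAL is the "dash" portion of the hand number, i.e. the second group
                hands[handNumber] = {LOCAL: matches.group(2),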
                                   DATETIME: handTime,
                                   TEXT: ''}
                line = f.readline()
                while (not (line.strip() == '')):
                    table = re.search("Table: (.*)$",line)
                    if (table != None):
                        tableName = table.group(1)
                        if (not tableName in tables):
                            tables[tableName] = {COUNT: 0, LATEST: ""}
                        hands[handNumber][TABLE] = tableName
                    hands[handNumber][TEXT] = hands[handNumber][TEXT] + line
                    line = f.readline()
            else:
                line = f.readline()
        f.close()

    handNumber = ""
    handTime = datetime.datetime.now()

    # now that we have all hands from all the files,
    # use the timestamps of the imported hands to process them in chronological order
    # this is the place for processing the text of each hand and look for player actions
    for handNumber in sorted(hands.keys(), key=lambda hand: hands[hand][DATETIME] ):
        # print(handNumber) #DEBUG
        handTime = hands[handNumber][DATETIME]
        table = hands[handNumber][TABLE]
        tables[table][COUNT] += 1
        tables[table][LATEST] = handNumber
        tables[table][LAST] = handTime
        lastHandTime = handTime
        # print(handTime) # DEBUG

        for line in hands[handNumber][TEXT].splitlines():
            # the text match to look for a seated player and see their chip amount
            seat = re.search(r"Seat \d+: (\w+) \(([\d.]+)\)",line)
            if (seat != None):
                player = seat.group(1)
                stack = float(seat.group(2))

                # print("Player found " + seat.group(1) + " with chip count " + seat.group(2))
                if (not player in players):
                    players[player] = {IN: stack, OUT: 0, NOTES: ""}
                    players[player][table] = {FIRST: stack, IN: stack, LATEST: stack, OUT: 0, LEFT: False}
                    players[player][NOTES] = ("Player Notes for " + player + os.linesep + str(handTime) +
                                              " table " + table +
                                              " hand (" + handNumber + ") " +
                                              "initial buy in " + str(stack) + os.linesep)
                    csvRows.append([handTime,table,handNumber,player,"initial buy in",stack,""])

                elif (not table in players[player]):
                    players[player][IN] += stack
                    players[player][table] = {FIRST: stack, IN: stack, LATEST: stack, OUT: 0, LEFT: False}
                    players[player][NOTES] = players[player][NOTES] + (str(handTime) +
                                              " table " + table +
                                              " hand (" + handNumber + ") " +
                                              "initial buy in " + str(stack) + os.linesep)
                    csvRows.append([handTime,table,handNumber,player,"initial buy in",stack,""])

                else:
                    # check for consistent state of chips from last hand
                    # this is where we find corner cases and so on
                    # found split pot issue, side pot issue by virtue of having this consistency check
                    # NOTE - if player was waiting the stack may have changed,
                    #        so adjust accordingly and log it
                    if (players[player][table][LATEST] != stack):
                        if (players[player][table][WAITING] or players[player][table][LEFT]):
                            if (stack > players[player][table][LATEST]):
                                adjustment = stack - players[player][table][LATEST]
                                players[player][table][LATEST] = stack
                                players[player][table][IN] += adjustment
                                players[player][IN] += adjustment
                                action = "player returned with " if (players[player][table][LEFT]) else "while waiting added on by "
                                players[player][NOTES] = (players[player][NOTES] + str(handTime) + " table " + table +
                                                          " hand (" + handNumber + ") " + action + str(adjustment) + os.linesep)
                                csvRows.append([handTime,table,handNumber,player,"add on while waiting",adjustment,""])
                            else:
                                adjustment = players[player][table][LATEST] - stack
                                players[player][table][LATEST] = stack
                                players[player][table][OUT] += adjustment
                                players[player][OUT] += adjustment
                                players[player][NOTES] = (players[player][NOTES] + str(handTime) + " " + table + " hand (" + handNumber + ") " +
                                                          "while waiting reduced by " + str(adjustment) + os.linesep)
                                csvRows.append([handTime,table,handNumber,player,"reduction while waiting","",adjustment])
                        else:
                            print("Inconsistent state for " + player + " in table " + table + " hand " + handNumber + " has " + str(stack) +
                                  " expected " + str(players[player][table][LATEST]))

                # player is active at this table, so mark the LEFT attribute for the table as False
                players[player][table][LEFT] = False

                # change state on sitting or waiting
                if (re.search(r'sitting',line) or re.search(r'waiting',line)):
                    players[player][table][WAITING] = True
                else:
                    players[player][table][WAITING] = False

            # the text to match for an add on
            addOn = re.search(r"(\w+) adds ([\d.]+) chip",line)
            if (addOn != None):
                player = addOn.group(1)
                additional = float(addOn.group(2))
                players[player][IN] += additional
                players[player][table][IN] += additional
                players[player][table][LATEST] += additional
                players[player][NOTES] = (players[player][NOTES] + str(handTime) + " table " + table +  " hand (" + handNumber + ") " +
                                          "added on " + str(additional) + os.linesep)
                csvRows.append([handTime,table,handNumber,player,"add on",additional,""])


            # the text to check for a win
            winner = re.search(r"(\w+) (wins|splits).*Pot *\d? *\(([\d.]+)\)",line)
            if (winner != None):
                player = winner.group(1)
                win = float(winner.group(3))
                players[player][table][LATEST] += win

            # find contributions to the pot
            # this is a series of contributions of the form "PlayerName: Amount" separated by commas
            # needed for updating the LATEST amount on this table for each player, for consistency check next hand
            pot = re.search(r"Rake.*Pot.*Players \((.*)\)", line)
            if (pot != None):
                potString = pot.group(1)
                for contribution in potString.split(","):
                    (player,amount) = contribution.split(":")
                    player = player.strip()
                    players[player][table][LATEST] -= float(amount)

        # end of for loop, loop through active players and see if anyone has left the table -
        # if so, register a cash out and also mark the player as having LEFT the table
        for player in players.keys():
            # \d+ so that two-digit seat numbers (e.g. Seat 10) are matched as well
            seatSearch = r"Seat \d+: " + re.escape(player)
            if (not re.search(seatSearch, hands[handNumber][TEXT])):
                if (table in players[player] and not players[player][table][LEFT]):
                    amount = players[player][table][LATEST]
                    players[player][OUT] += amount
                    players[player][table][OUT] += amount
                    players[player][table][LATEST] = 0
                    players[player][table][WAITING] = True
                    players[player][NOTES] = (players[player][NOTES] + str(handTime) + " table " + table + " hand (" + handNumber + ") " +
                                          "stood up with " + str(amount) + os.linesep)
                    csvRows.append([handTime,table,handNumber,player,"stood up with","",amount])
                    players[player][table][LEFT] = True



# SUMMARIZE
# note how many unique players
# note how many hands processed for each table
# then for each table, and each player, find out who was still listed as not left and mark them
# as left and what they stood up with

print("Players: " + str(len(players)))
for table in tables:
    print("Table " + table + ": Processed hands: " + str(tables[table][COUNT]))

    for player in players.keys():
        # done processing the hands, so get players up from the table
        if (table in players[player] and not players[player][table][LEFT]):
            amount = players[player][table][LATEST]
            players[player][OUT] += amount
            players[player][table][OUT] += amount
            players[player][table][LATEST] = 0
            players[player][table][LEFT] = True
            players[player][NOTES] = (players[player][NOTES] + str(tables[table][LAST]) + " table " + table +
                                      " hand (" + tables[table][LATEST] + ") " +
                              "ended table with " + str(amount) + os.linesep)
            csvRows.append([tables[table][LAST],table,tables[table][LATEST],player,"ended table with","",amount])

netBalance = 0

# separator
print("")

if (lastHandTime is not None):
    sessionDate = lastHandTime.strftime("%m/%d/%Y")

note = 'Python calculation of session'
for player in players.keys():
    # final tally
    cashIn = players[player][IN]
    cashOut = players[player][OUT]
    disposition=''
    diff = 0
    alias = player
    if (player in resolvedScreenNames):
        alias = resolvedScreenNames[player][NAME]
    players[player][NOTES] = (players[player][NOTES] + "Total IN " + str(cashIn) + os.linesep)
    players[player][NOTES] = (players[player][NOTES] + "Total OUT " + str(cashOut) + os.linesep)
    if (cashIn == cashOut):
        players[player][NOTES] = (players[player][NOTES] +  player + ' breaks even.' + os.linesep)
        disposition = "due"
    elif (cashIn > cashOut):
        diff = cashIn - cashOut
        netBalance += diff
        players[player][NOTES] = (players[player][NOTES] +  player + ' owes ' +str(diff) + os.linesep)
        disposition = "owes"
    elif (cashIn < cashOut):
        diff = cashOut - cashIn
        netBalance -= diff
        players[player][NOTES] = (players[player][NOTES] +  player + ' is due ' +str(diff) + os.linesep)
        disposition = "due"

    csvBalances.append([sessionDate,disposition,alias,diff,note])



    if(not args.quiet):
        print(players[player][NOTES])
        print("")

print("Net balance: " + str(netBalance))

if (args.doCsv):
    # Output CSV file of transactions
    with open(CSVTRANS, 'w', newline='') as csvfile:
        logwriter = csv.writer(csvfile, quoting=csv.QUOTE_MINIMAL)
        logwriter.writerow(csvHeader)
        for row in csvRows:
            logwriter.writerow(row)

        print("CSV content written to " + CSVTRANS)

    # Output CSV file of balances
    with open(CSVBALANCE, 'w', newline='') as csvfile:
        logwriter = csv.writer(csvfile, quoting=csv.QUOTE_MINIMAL)
        logwriter.writerow(csvBalanceHeader)
        for row in csvBalances:
            logwriter.writerow(row)

        print("CSV balance content written to " + CSVBALANCE)

if (args.doEmail):
    smtp = SMTP()

    smtp.set_debuglevel(DEBUGLEVEL)
    smtp.connect(SMTPSERVER, SMTPPORT)
    smtp.login(FROMADDRESS, emailPassword)
    #TODO: error handling for a failed login to SMTP server

    date = datetime.datetime.now().strftime("%a, %d %b %Y %T %z (%Z)")
    emailCount = 0

    for player in players:
        subj = EMAIL_SUBJ_PREFIX + sessionDate
        #if (player == "StevieG"):
        if (player in resolvedScreenNames and EMAIL in resolvedScreenNames[player]):
            emailCount += 1
            recipients = [CCADDRESS]
            to_addr = resolvedScreenNames[player][EMAIL]
            recipients.append(to_addr)

            subj = subj + " for " + player 
            message_text = players[player][NOTES]

            msg = ("From: %s\nTo: %s\nCC: %s\nSubject: %s\nDate: %s\n\n%s"
                   % (FROMADDRESS, to_addr, CCADDRESS, subj, date, message_text))

            smtp.sendmail(FROMADDRESS, recipients, msg.encode("utf-8"))
    smtp.quit()
    print("Email messages sent: " + str(emailCount))
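For the HUD-style stats asked about at the top of the thread, here is a minimal sketch of how the hand-by-hand loop could be extended to count VPIP (hands in which a player voluntarily puts money in preflop, so blinds alone do not count). It is not part of the script above. The function name vpip_by_player is hypothetical, and the action line formats ("** Hole Cards **", "** Flop **", "Name calls 10", "Name raises to 30") are assumptions about the log layout that should be checked against a real hand history before relying on the numbers. The seat pattern is the same one the script uses.

```python
import re
from collections import defaultdict

# Seat pattern reused from the script above.
SEAT_RE = re.compile(r"Seat \d+: (\w+) \(([\d.]+)\)")
# ASSUMPTION: preflop actions read "Name calls 10", "Name bets 20",
# "Name raises to 40". Verify against your own logs.
ACTION_RE = re.compile(r"^(\w+) (calls|bets|raises)\b")


def vpip_by_player(hand_texts):
    """Return {player: (hands_dealt, hands_with_voluntary_money_preflop)}."""
    dealt = defaultdict(int)
    voluntary = defaultdict(int)
    for text in hand_texts:
        preflop = False
        acted = set()
        for line in text.splitlines():
            seat = SEAT_RE.search(line)
            if seat and "sitting" not in line and "waiting" not in line:
                dealt[seat.group(1)] += 1
            elif line.startswith("** Hole Cards **"):  # ASSUMED street marker
                preflop = True
            elif line.startswith("** "):  # any later street marker ends preflop
                preflop = False
            elif preflop:
                action = ACTION_RE.match(line)
                if action:
                    acted.add(action.group(1))
        for player in acted:
            voluntary[player] += 1
    return {p: (dealt[p], voluntary[p]) for p in dealt}


# Made-up hand text in the assumed format, for illustration only.
SAMPLE = """Seat 1: Alice (1000)
Seat 2: Bob (1000)
Seat 3: Carol (1000)
Bob posts small blind 5
Carol posts big blind 10
** Hole Cards **
Alice raises to 30
Bob folds
Carol calls 20
** Flop ** [2h 7d Kc]
Carol checks
Alice bets 40
Carol folds
"""

stats = vpip_by_player([SAMPLE])
for name, (hands_dealt, hands_vpip) in sorted(stats.items()):
    print(name, hands_dealt, hands_vpip)
```

Inside the script, the same function could be fed the stored hand text, for example vpip_by_player(h[TEXT] for h in hands.values()). Dividing the second number by the first gives the VPIP percentage per player.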
naked_eskimo
Posts: 123
Joined: Wed Jan 07, 2015 3:51 pm

Re: Hand History Parser for obtaining Player Stats

Post by naked_eskimo »

Would you be able to elaborate a bit on how to use this? I'm interested, but can't seem to get it going.

I installed python3 on my windows 2019 server. I then created processlogs.py and copied your script into it. Then in the same folder I copied over a hand history logfile and tried the usage syntax and the below happens:

C:\misc\python>processlogs.py hh2020.txt
Traceback (most recent call last):
File "C:\misc\python\processLogs.py", line 234, in <module>
line = f.readline()
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 4918: character maps to <undefined>


I'm sure I am just doing something really wrong.
StevieG
Posts: 56
Joined: Mon May 04, 2020 12:27 pm

Re: Hand History Parser for obtaining Player Stats

Post by StevieG »

Yes, I will be happy to try to help here.

The python script is expecting to get plain old ASCII text, and it appears that the file you saved may have Unicode that is unexpected.

May I ask how you saved the hand history file?

Try to copy the text and save it out using something like Notepad as a plain .txt file, and see if that changes the outcome.
zxzx10r
Posts: 51
Joined: Thu Dec 17, 2015 12:19 am

Re: Hand History Parser for obtaining Player Stats

Post by zxzx10r »

eightospade wrote:Hi Kent and Team,

Is there an existing script or utility that parses hand histories?

I wanted to know if there exists a parsing tool for the hand history text in order to obtain ANY kind of data. I want to be able to obtain stats about myself and opponents, I'm talking HUD stats like VPIP, 3b, RFI, etc.

Any kind of script or program that interacts with the hand history files is of interest to me and could be a handy starting point.

Thanks,
8s
I could write one for you if you wish. Let me know if you are interested and we can work out a price.
naked_eskimo
Posts: 123
Joined: Wed Jan 07, 2015 3:51 pm

Re: Hand History Parser for obtaining Player Stats

Post by naked_eskimo »

StevieG wrote:Yes, I will be happy to try to help here.

The python script is expecting to get plain old ASCII text, and it appears that the file you saved may have Unicode that is unexpected.

May I ask how you saved the hand history file?

Try to copy the text and save it out using something like Notepad as a plain .txt file, and see if that changes the outcome.
I just copied one of the hand history txt files from the data folder into the python folder.
StevieG
Posts: 56
Joined: Mon May 04, 2020 12:27 pm

Re: Hand History Parser for obtaining Player Stats

Post by StevieG »

naked_eskimo wrote: i just copied one of the hand history txt files from the data folder into the python folder.
There definitely appears to be an encoding issue.

Let's try this - open the file with WordPad, then use "Save As..." and select "Unicode text File" from the list of dropdowns for the file format.

After that, try running the new text file through the script.
naked_eskimo
Posts: 123
Joined: Wed Jan 07, 2015 3:51 pm

Re: Hand History Parser for obtaining Player Stats

Post by naked_eskimo »

That seemed to get further:

C:\misc\python>processLogs.py EventLog2020-04-13.txt
Players: 0

Net balance: 0


Not sure what the output should look like.
StevieG
Posts: 56
Joined: Mon May 04, 2020 12:27 pm

Re: Hand History Parser for obtaining Player Stats

Post by StevieG »

A ha!

OK, cool. Kinda.

Here is what I learned.

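#1, saving from WordPad as Unicode text actually writes the file as UTF-16, which we don't want. All the bytes were read that time, but the script could not recognize any hands in the result, hence "Players: 0". So do not do that.

#2, the Python script is opening the files as CP-1252 (which they are NOT). That is what Python falls back to on that Windows setup when no encoding is given, so we need to correct that.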
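Both symptoms can be reproduced in a few lines. This is a minimal sketch, assuming the hand history files are UTF-8 as the fix in this post implies. The sample text and the player name are made up; any character whose UTF-8 bytes include 0x8F would trigger the same error as in the traceback above.

```python
import re

# Made-up sample; "Ï" encodes to the bytes C3 8F in UTF-8.
sample = "Hand #1234-1 - 2020-04-13 20:00:00\nSeat 1: Ïvan (1000)\n"

# Symptom 1: UTF-8 bytes read as CP-1252.
# Byte 0x8F has no mapping in Python's cp1252 codec, so decoding fails.
utf8_bytes = sample.encode("utf-8")
try:
    utf8_bytes.decode("cp1252")
    cp1252_error = None
except UnicodeDecodeError as exc:
    cp1252_error = exc

# Symptom 2: UTF-16 bytes read as CP-1252.
# Decoding succeeds, but NUL characters sit between the letters,
# so the hand header pattern never matches and no hands are found.
utf16_as_cp1252 = sample.encode("utf-16").decode("cp1252")
header = re.compile(r"Hand #(\d*)-(\d*) - (.*)$", re.MULTILINE)
matches_when_garbled = header.search(utf16_as_cp1252)

# Decoding the UTF-8 bytes as UTF-8 gives the text the script expects.
matches_when_utf8 = header.search(utf8_bytes.decode("utf-8"))

print(cp1252_error)
print(matches_when_garbled)
print(matches_when_utf8.group(1))
```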

In the script find the lines that read

Code:

    for f in args.file:
        line = f.readline()
I think this is line 215 but maybe not

you want to replace these two lines as follows (the spacing is important in Python) :

Code:

    for fh in args.file:
        f = open(fh.name, mode='r', encoding='utf-8')
        line = f.readline()
then run the script against your original file (do NOT save it as Unicode text from Wordpad)

See if that makes a difference.
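An alternative that avoids reopening each file is to give the encoding to argparse when the 'file' argument is declared. This is a sketch of that variation, not something tested against real Poker Mavens logs: it assumes the files are UTF-8, and the temporary file and player name below are made up to stand in for a hand history.

```python
import argparse
import os
import tempfile

parser = argparse.ArgumentParser()
# Same 'file' argument as in the script, with the encoding stated up front
# so the platform default (CP-1252 in the traceback above) is never used.
parser.add_argument('file', type=argparse.FileType('r', encoding='utf-8'), nargs='*')

# Stand-in for a hand history file: UTF-8 text with a non-ASCII character.
with tempfile.NamedTemporaryFile('w', suffix='.txt', encoding='utf-8',
                                 delete=False) as tmp:
    tmp.write("Seat 1: Ïvan (1000)\n")

args = parser.parse_args([tmp.name])
lines = []
for f in args.file:
    lines.extend(f.readlines())
    f.close()
os.remove(tmp.name)

print(lines)
```

With that change the original "for f in args.file:" loop can stay as it was, since each file handle already decodes as UTF-8.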