Obtaining total harvesting balances for Symbol nodes

I thought I would write a quick blog post today about how to retrieve the balances of all harvesting accounts across all Symbol API nodes. I wrote a very basic Perl script to do this shortly after Symbol launched but it was painfully slow so last week I decided to revisit the problem now that I know a lot more about Symbol than I did in 2021. I’m just going to go through the basics of the Python code which is presented in a Jupyter Notebook available on GitHub.

In order to run the code you will need to be able to log into an API node and access mongoDB. Because we are accessing data directly from the database it speeds things up significantly.

As usual, first we have to import some packages (you will need to install these using pip if you haven’t already).

from pymongo import MongoClient
import base64
from sshtunnel import SSHTunnelForwarder
from binascii import hexlify, unhexlify
from symbolchain.facade.SymbolFacade import SymbolFacade
from symbolchain.CryptoTypes import PublicKey
import requests
from tqdm import tqdm
from collections import defaultdict

We will use sshtunnel to connect to our node using the node username and password and pymongo for querying the database. First we will specify the login details for the node and set up the connection to mongoDB running in docker.

MONGO_HOST = "***" # Add your node address here

server = SSHTunnelForwarder(
    (MONGO_HOST,22),
    ssh_username='***', # Add your node's login username
    ssh_password='***', # Add your node's login username
    remote_bind_address=('172.20.0.2', 27017)
  )
server.start()

client = MongoClient('localhost', server.local_bind_port)
print(server.local_bind_port)
print("connected")

Next we need to read in a list of nodes to query. These should just be a list of node addresses e.g. symbolblog-testnet.com, xymharvesting.net etc separated by new lines. In this case I am reading a file containing all nodes I gathered from symbolnodes.org which I have saved as “list.txt”.

# Read in list of node addresses

nodes = []  # Initialize an empty list to store nodes
try:
    with open("list.txt", 'r') as file:
        for line in file:
            nodes.append(line.strip())
except FileNotFoundError:
    print(f"File not found: {list.txt}")

OK, so we have a list of nodes, now we want to iterate through this list and find the accounts that are actively harvesting on that node. We can’t do this directly from mongoDB as the node-specific information is not stored in the database. Therefore, we have to query each node individually by looking at the node/unlockedaccount endpoint with a REST query. I have added a 10-second timeout so that the process doesn’t just hang if a node is unreachable and created a dictionary to store the node as the key and a list of account public keys as the value.

# Function to fetch JSON data for unlocked accounts (harvesters on the node)
def fetch_json_data(node, timeout=10):  # timeout of 10 seconds
    url = f"http://{node}:3000/node/unlockedaccount"
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
        data = response.json()
        return data
    except requests.Timeout:
        print(f"Request to {url} timed out.")
        return None
    except requests.RequestException as e:
        print(f"Request to {url} failed: {e}")
        return None

# Create a dictionary to store nodes and harvesters
node_details = {}

# Loop through each node, fetch JSON data, and add it to the dictionary
for node in tqdm(nodes, desc="Fetching JSON Data"):  # tqdm wrapper for progress bar
    json_data = fetch_json_data(node)
    if json_data and 'unlockedAccount' in json_data:
        node_details[node] = json_data['unlockedAccount']

Now we have a data structure node_details containing a mapping of harvesting account public keys to node names. The next issue we have is that we need to convert the public keys to addresses. This can be done using the Symbol Python SDK. We also need to find the “main” accounts associated with each of the harvesters as the results from the previous step return the linked accounts which will have a zero balance.

In the code below I look for transactions of type 16716. These are account link transactions, I am looking for those cases where the linkAction is 1 meaning that the account is being linked rather than removing the link. Data returned from mongoDB is in binary format and again will represent the public key rather than the address. Here I decode the binary public key data and use the SDK to give me the address of both the main (balance containing) and linked addresses. I store the mapping of linked addresses to main addresses in the dictionary mapping.

Finally, I create a new dictionary that will store the main balances of the accounts harvesting on each node by looking up the mapping of linked to main accounts in mapping.

facade = SymbolFacade('mainnet')

db = client['catapult']
collection = db.transactions
mapping = defaultdict(dict)

out = collection.find({'transaction.type': 16716})
for x in out:
    if x['transaction']['linkAction'] == 1:
        main = str(facade.network.public_key_to_address(PublicKey((hexlify(x['transaction']['signerPublicKey']).decode('utf8')))))    
        link = str(facade.network.public_key_to_address(PublicKey((hexlify(x['transaction']['linkedPublicKey']).decode('utf8')))))
        mapping[link] = main

address_details = defaultdict(list)
for node, public_keys in node_details.items():
    for pk in public_keys:
        address = str(facade.network.public_key_to_address(PublicKey(pk)))
        if address in mapping:  # Check if the address is in mapping
            address_details[node].append(mapping[address])
        else:
            address_details[node].append(address)

So now we have a dictionary address_details that contains the main addresses of all harvesters on each node. Our next issue is that we need to obtain the balances of these accounts. Now, rather than query mongoDB for the balances of each of the accounts in address_details I just retrieve the balance of every address known to the network. This decodes the binary mongoDB address and fetches the amount of symbol.xym (mosaic ID 7777031834025731064) that they own. I create a dictionary balances to store a mapping of addresses to XYM balance.

db = client['catapult']
collection = db.accounts
facade = SymbolFacade('mainnet')

out = collection.find()

balances = {}

for x in out:
    xym = 0
    address = str(base64.b32encode(x['account']['address']).decode('utf8')[0:39])
    mosaics = x['account']['mosaics']
    for mos in mosaics:
        if (mos['id'] == 7777031834025731064):
            balances[address] = mos['amount']/1000000
            xym = 1
    if xym == 0:
        balances[address] = 0

So far so good. We have address_details mapping main accounts to a node and balances containing the balance of every account on the network. Now it is a simple case of iterating through our nodes and each of their harvesting addresses and adding the balances to a new dictionary balance_details.

balance_details = defaultdict(list)

for node, addresses in address_details.items():
    for address in addresses:
        try:
            balance_details[node].append(balances[address])
        except:
            balance_details[node].append(0)

The keys to our dictionary balance_details are the node addresses and the value is a list of balances for all main accounts harvesting on that node. Next, I want to output some stats such as the number of harvesters on each node, the node main account balance, and the sum of all XYM harvesting on that node, and output this to a tsv file.

with open('nodes_balances.tsv', 'w') as file:
    file.write("Node\tHarvesters\tMain\tTotal\n")
    
    # Iterate through the dictionary and write the required information
    for node, balances in balance_details.items():
        list_size = len(balances)
        main = balances[0] if balances else 'N/A'  # Handle empty list case
        total = sum(balances)
        file.write(f"{node}\t{list_size}\t{main}\t{total}\n")

The key in the dictionary is always the node address so we can print this to our tab-separated output. We then can find the number of harvesters on the node by looking at the list size of the value (a list) assigned to that key. The main harvesting account is always the first entry in the list and therefore we can access the balance of the node account at position zero. Finally, we can sum all balances in the list to obtain the total harvesting balance of the node.

Wrap-up

So this is a simple enough script but by using mongoDB and the Python SDK I achieved a huge speedup versus my original Perl code. We are talking about a 60-fold speedup and runs in a few minutes so I am glad that I did spend a little time rewriting the code more efficiently. He is the node list and results and the full Jupyter Notebook can be found here.

Avatar photo
NineLives
admin@symbolblog.com

I'm a Symbol and NEM enthusiast and run this blog to try to grow awareness of the platform in the English-speaking world. If you have any Symbol news you would like me to report on or you have an article that you would like to publish then please let me know!

No Comments

Post A Comment

17 + thirteen =