My journey with MongoDB 2: The Big Bang query

I wasn’t going to write another NEM / Symbol Advent Calendar article this year but I saw a single empty spot in calendar number two and thought I would come back for a second round. I am a glutton for punishment, or maybe I should say I like to challenge myself and learn new things 😁

Introduction

In this article, I will (hopefully) write about a little project I started a few days ago. I thought it would be straightforward but like with everything blockchain, it was more challenging than I thought. I am actually being very optimistic here and have started writing this before I have code that functions correctly 😬 The idea was to create a historical record of the Symbol blockchain from the genesis block so that you could query an address at any given block height and it would return the account balance. I thought that would be simple, just replay all transactions since the first block, store them in some kind of data structure and then retrieve the balance at a particular block height. Actually, after writing some code to do it I realised it wasn’t quite as easy as I first thought!

As in my previous article, I will be relying on MongoDB and Python and will try to explain my chaotic logic as I go through the code (the entire Jupyter Notebook is linked at the end of the post). At the time of writing this introduction, I still haven’t implemented my nemeses NEMeses that are linked harvesting accounts, namespace rental, hash locks, secret locks and secret proofs but fingers crossed I will get there before this article is due to go out! 😬

Note of caution: It will take around 30 minutes to run all of the code in the Jupyter Notebook and recreate historical balances so you’ll need to be patient! Also, I have a fast internet connection with a decent connection speed to my node so I guess it might take longer if your connection is slower.

Note of frustration: Half an hour may not seem too long but when you are trying to debug something that happens in 1 in ~10,000 accounts it feels like an eternity! 😂

Step 1 – the setup

As usual, I am going to import some Python modules before I get started.

from pymongo import MongoClient
import base64
from sshtunnel import SSHTunnelForwarder
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from binascii import hexlify, unhexlify
import base64
from symbolchain.facade.SymbolFacade import SymbolFacade
from symbolchain.CryptoTypes import PublicKey
from collections import defaultdict

Then we need to set up the MongoDB connection to the node.

MONGO_HOST = "symbolblog-testnet.com"

server = SSHTunnelForwarder(
    (MONGO_HOST,22),
    ssh_username='**********',
    ssh_password='**********',
    remote_bind_address=('172.20.0.2', 27017)
  )
server.start()

client = MongoClient('localhost', server.local_bind_port)
print(server.local_bind_port)
print("connected")

Step 2 – those pesky linked accounts

Most Symbol accounts have a linked address that signs when they harvest a block. This means that when iterating through all blocks to obtain the signing account (harvester) we need to link the address back to the main account to add the rewards. I started off just getting the current linked account from the accounts collection in Mongo but then realised that of course users can relink their accounts to a new address at any point in time. This means we need to have all historical account links in order to assign a block reward to the correct main account.

In order to do this I queried the transactions collection in the catapult database for transaction type 16716 which represents all account link transactions. I initially just stored a simple mapping of the linked address to the main account as a dictionary but… as it turns out some linked accounts were associated with several main accounts over the course of history which is almost certainly bad practice and definitely a pain in my ass – this took forever to find but it did improve my results in Step 6! 😁

db = client['catapult']
collection = db.transactions
facade = SymbolFacade('mainnet')
mapping = defaultdict(dict)

out = collection.find({'transaction.type': 16716})

for x in out:
    if x['transaction']['linkAction'] == 1:
        height = x['meta']['height']
        main = str(facade.network.public_key_to_address(PublicKey((hexlify(x['transaction']['signerPublicKey']).decode('utf8')))))    
        link = str(facade.network.public_key_to_address(PublicKey((hexlify(x['transaction']['linkedPublicKey']).decode('utf8')))))
        # Ergh - some link accounts are used by more than one main account at different blocks :(
        # Need to include height :(
        # Shame on these accounts! :D
        mapping[link][height]=main

Step 3: Inflation

As you will know, in Symbol the block rewards reduce over time so in order to see how much a harvester earned for making a block we need to refer to the inflation config properties of Symbol’s mainnet. I have just downloaded this file from GitHub and reformatted it slightly so it is a simple two-column tab-separated file showing the block number and reward amount in XYM. Also, make sure that the first column starts at 1 not 2 (or tell the code later on to ignore block 1).

df_reward = pd.read_csv("block_rewards.tsv", sep='\t')

Step 4: Rewards and fees

Right, so now we can start peering into the past and iterating through all blocks since Symbol’s genesis and calculating fees and harvesting rewards for all accounts. As usual, I am going to store the data in a pandas dataframe for later use. Every record in the blocks collection contains the total fees paid, the harvester and the beneficiary address. As I pointed out in Step 2, the harvester address is usually a linked account, not the user’s main address. Therefore we need to look up the signerPublicKey (harvesting account address) in our dictionary linked to the main accounts we made earlier. Late addition: it seems that some accounts are using their linked address in their node beneficiary config too so added extra logic to see whether the beneficiary account is linked 😫 Later addition: some linked accounts are linked to more than one main account over time, added extra logic to loop through the account links and assign the correct main account for that block height 🤬 You guys are testing me in so many weird and wonderful ways. 😂

Next, we look up the reward at the current block height in our loop from our df_reward dataframe and add this value to our total fees. Just a reminder that values are stored in microXYM and therefore have to be divided by 1,000,000 to give us the value in XYM.

OK, so now we have the total reward for a given block and next we have to work out the amount that is distributed to harvester (70%), node beneficiary (25%) and the network sink address (5%). Once we have these values we add the block height and reward for each to a dictionary where the key is the account address (one for each of the beneficiaries). Note: there are two harvest sink addresses to deal with, NBUTOBVT5JQDCV6UEPCPFHWWOAOPOCLA5AY5FLI and the newer NAVORTEX3IPBAUWQBBI3I3BDIOS4AVHPZLCFC7Y I believe that these switched over during the Cyprus fork at block 689,761.

facade = SymbolFacade('mainnet')

db = client['catapult']
collection = db.blocks
blocks = collection.find()

historic_fees = defaultdict(dict)

harvest_sinkV1 = "NBUTOBVT5JQDCV6UEPCPFHWWOAOPOCLA5AY5FLI"
sink = "NAVORTEX3IPBAUWQBBI3I3BDIOS4AVHPZLCFC7Y"
fork_height = 689761

f = open('block_data.tsv', 'w', encoding="utf-8")
f.write("Block\tTimestamp\tTotalFee\tFeeMultiplier\tHarvester\tBeneficiary\tReward\n")

i = 0
j = 0

for x in blocks:
    height = int(x['block']['height'])
    multiplier = x['block']['feeMultiplier']
    fee = x['meta']['totalFee']/1000000
    timestamp = x['block']['timestamp']
    h = str(facade.network.public_key_to_address(PublicKey((hexlify(x['block']['signerPublicKey']).decode('utf8')))))

    # This is a hassle.. There are linked accounts associated with multiple main accounts 🙄
    if not mapping[h]: 
        harvester = h
    else: 
        for keys in mapping[h]:
            if keys <= height:
                harv = mapping[h][keys]
        harvester = harv
                
    beneficiary = str(base64.b32encode((bytes(x['block']['beneficiaryAddress']))).decode('utf8')[0:39])
    
    # Eek - some node fee beneficiaries are sending funds to their linked address :/
    # Probably should use the same logic as above to avoid nasty surprises
    
    if not mapping[beneficiary]:
        beneficiary = beneficiary
    else:
        for keys in mapping[beneficiary]:
            if keys <= height:
                b = mapping[beneficiary][keys]
        beneficiary = b

    reward = (df_reward.iloc[df_reward[df_reward['Height'].le(height)].index[-1]]['Reward'])/1000000
    f.write("{0}\t{1}\t{2}\t{3}\t{4}\t{5}\t{6}\n".format(height, timestamp, fee, multiplier, harvester, beneficiary, reward))
    
    beneficiary_reward = (reward+fee) * 0.25
    harvester_reward = (reward+fee) * 0.70
    sink_reward = (reward+fee) * 0.05

    # Harvesting and benefificary income
    historic_fees[harvester][height]=harvester_reward
    
    if height >= fork_height:
        try:
            historic_fees[sink][height] += sink_reward
        except:
            historic_fees[sink][height] = sink_reward       
    else:
        try:
            historic_fees[harvest_sinkV1][height] += sink_reward
        except:
            historic_fees[harvest_sinkV1][height] = sink_reward
    
    try:
        historic_fees[beneficiary][height] += beneficiary_reward
    except:
        historic_fees[beneficiary][height] = beneficiary_reward

    if (j == 100000):
        print("{0} blocks processed".format(i))
        j = 0
    i+=1
    j+=1
    
f.close()

Again, we can read this in to a pandas dataframe:

df_block_data = pd.read_csv("block_data.tsv", sep='\t')

And for later use with namespace rental fees make a mapping of block height to fee multiplier

multiplier_mapping = dict(zip(df_block_data.Block, df_block_data.FeeMultiplier))

Before moving on to step 5. I just keep a count of the final block processed so we don’t exceed this in the next query:

end = int(df_block_data['Block'].iloc[-1])

And lastly, get the balance of all accounts from the accounts collection which I will use to check if everything has worked correctly in Step 6.

db = client['catapult']
collection = db.accounts
facade = SymbolFacade('mainnet')

f = open('xym_addresses.tsv', 'w', encoding="utf-8")
f.write("Address\tBalance\n")

out = collection.find()

for x in out:
    xym = 0
    address = str(base64.b32encode(x['account']['address']).decode('utf8')[0:39])
    mosaics = x['account']['mosaics']
    for mos in mosaics:
        if (mos['id'] == 7777031834025731064):
            f.write('{0}\t{1}\n'.format(address, mos['amount']/1000000))
            xym = 1
    if xym == 0:
        f.write('{0}\t{1}\n'.format(address, 0))

f.close()
df_balance = pd.read_csv("xym_addresses.tsv", sep='\t')

Transaction fees

As you may know, Symbol transaction fees are calculated based on the transaction size and the block feeMultiplier which we obtained above. Just to enable fast lookup I have made a new dictionary where block height is the key and the feeMultiplier is the value. Using this information we will be able to calculate the fee paid to send each of the transactions made on the network in Step 5.

multiplier_mapping = dict(zip(df_block_data.Block, df_block_data.FeeMultiplier))

Namespace and mosaic rental fees

Another complication which we will meet in Step 5 is that namespace and mosaic rental fees are calculated dynamically. In order to calculate the correct value we need to find the median feeMultiplier over the previous 60 blocks and multiply this value by the duration of the namespace registration period in blocks, or in the case of mosaics multiply the value by 0.5. Here I am just making a list of feeMultiplier values across all blocks for later use.

block_fee_mult = df_block_data['FeeMultiplier'].tolist()

Treasury reissuance

After the Cyprus hard fork, all of the old Symbol central funds were consolidated into one address NCHEST3QRQS4JZGOO64TH7NFJ2A63YA7TPM5PXI and new sink addresses were set up to collect harvest and rental fees. Funds were reallocated by revoking and reissuing the mosaics at block 689,761.

Address aliases 😫

Another Symbol feature is that you can set up an account alias which allows others to send transactions to your account just using the linked namespace. For example, I have the namespace symbolblog linked to the account NCAY26LEBPOXM7NPCNV4HL4EH5WM6UJ5UUN4UGA. This however causes further headaches in Step 5 as we need to resolve the alias into an address. Here is an example of a transaction where 1 XYM is sent to the account alias symbolblog and as you can see the recipientAddress is padded with trailing zeros. Converting this to a Symbol address gives the address NHXVJOI33ETOJRIAAAAAAAAAAAAAAAAAAAAAAAA and not NCAY26LEBPOXM7NPCNV4HL4EH5WM6UJ5UUN4UGA.

{
   "meta":{
      "height":"885312",
      "hash":"720F49BFA6061F003A3C80F76DE4C0FD18A99F42815E8C5CD8E0FBA122616F23",
      "merkleComponentHash":"720F49BFA6061F003A3C80F76DE4C0FD18A99F42815E8C5CD8E0FBA122616F23",
      "index":0,
      "timestamp":"26666761725",
      "feeMultiplier":10
   },
   "transaction":{
      "size":207,
      "signature":"8A7242E269615DE40951A4531C3251D774B9782626B9B3011BE04B8CCD217FCE32859A209D585BFC60584A175A040D6D13B2380B7D571C8D9046D683C42BAC0B",
      "signerPublicKey":"9A4FEFD73331ABC604F4C71360D91B0A112ED13AE2E7EE152A7A22B9BF01FC02",
      "version":1,
      "network":104,
      "type":16724,
      "maxFee":"2070",
      "deadline":"26673936709",
      "recipientAddress":"69EF54B91BD926E4C5000000000000000000000000000000",
      "message":"0054657374696E67206E616D657370616365206163636F756E74206C696E6B",
      "mosaics":[
         {
            "id":"6BED913FA20223F8",
            "amount":"1000000"
         }
      ]
   },
   "id":"61E6DD8891BDF55E6E6E5E11"
}

So, another hurdle and another section of code needed to capture all of the aliases from the addressResolutionStatements collection:

# Get account aliases!

db = client['catapult']
collection = db.addressResolutionStatements
facade = SymbolFacade('mainnet')
aliases = defaultdict(dict)

out = collection.find()

for x in out:
    height = x['statement']['height']
    unresolved = base64.b32encode((bytes(x['statement']['unresolved']))).decode('utf8')[0:39]
    alias_address = base64.b32encode((bytes(x['statement']['unresolved']))).decode('utf8')[0:39]
    e = 0
    for entry in x['statement']['resolutionEntries']:
        resolved_address = base64.b32encode((bytes(x['statement']['resolutionEntries'][e]['resolved']))).decode('utf8')[0:39]
        aliases[alias_address][height]=resolved_address
        e+=1

We will use the lookup table aliases in step 5 to resolve any alias addresses transacting at a given block height.

Step 5: the nightmare begins 😭

Here we are going to get down to business and process all transactions since the genesis block. I will warn you now that my code is messy and at this moment in time it doesn’t work 😂 By the time this is published (if it is published) then it will all be fixed so don’t worry! Back to modifying my code… 🙏

OK, so Symbol has 25(!) different transaction types and each has its own particular format and set of values within MongoDB. This makes grabbing the correct data from each a bit complicated, especially when not all of the relevant data is stored within the transactions collection. On the plus side, for most transaction types, we will only want to find the transaction size in order to calculate the fee paid by the account initiating the transaction since no mosaic transfers occur. Because of this, I can group the logic for 17 different transactions together and then implement a specific set of checks and calculations for the other 8 (hash lock, aggregate bonded, secret lock, secret proof, transfer, mosaic definition, namespace registration and the special case – mosaic revocation). At the Cyprus fork block, previous NxL funds were revoked and reissued and therefore we have to process mosaic revocation transactions to account for this special case. There are also pre-fork sink addresses for mosaic and namespace registration fees, these were superseded by the newer NCVORTEX4XD5IQASZQEHDWUXT33XBOTBMKFDCLI address in block 689,761. I have learned a lot more about Symbol’s history during the course of writing this code and, just when I thought I had everything covered, another special case appeared 😁

I am going to split my code up into chunks explaining how the logic for each works. The full code to recreate everything in this article is available at the end of the post as a Jupyter Notebook.

Getting started

Before we go any further I need to specify the sink addresses for mosaic and namespace registration fees and the height of the Cyprus fork when the new combined sink address became active. I also give the balance of the genesis account NASYMBOLLK6FSL7GSEMQEAWN7VW55ZSZU25TBOA before the distribution of funds in block 1.

We are connecting to the transactions collection in MongoDB and will iterate through all 8+ million transactions. I am storing all transactions in a dictionary historic this will store balance change events across all accounts and the key to the dictionary will be account address.

import statistics

facade = SymbolFacade('mainnet')

db = client['catapult']
collection = db.transactions
out = collection.find()

historic = defaultdict(dict)

i = 0
j = 0
height = 0
slocks = 0
hash_locks = defaultdict(dict)
secret_locks = defaultdict(dict)

sink = "NCVORTEX4XD5IQASZQEHDWUXT33XBOTBMKFDCLI"
mosaic_sinkV1 = "NC733XE7DF46Q7QYLIIZBBSCJN2BEEP5FQ6PAYA"
namespace_sinkV1 = "NBDTBUD6R32ZYJWDEWLJM4YMOX3OOILHGDUMTSA"
fork_height = 689761

historic["NASYMBOLLK6FSL7GSEMQEAWN7VW55ZSZU25TBOA"][1] = 7842928625

Next, I need to set up a function that will be required to calculate rental fees later in the code. It takes the block height as input and will calculate the median feeMultiplier over the previous 60 blocks. These data were stored previously through querying the blocks collection (Step 4). Note: zero values are replaced with a fee multiplier of 100.

# Get average FeeMultiplier from last 60 blocks

def feeMultiplier(block_height):
    block_height-=1
    block_height_start = block_height-60
    fee_multiplier = block_fee_mult[block_height_start:block_height]
    fee_multiplier = [100 if item == 0 else item for item in fee_multiplier]
    average = statistics.median(fee_multiplier)
    return(average)

Now it is time to loop through all transactions. As you will see I have a list called same_logic (I need more descriptive variable names). These each correspond to a transaction type in the Mongo transactions table and they all share the same property, they do not involve any funds transfer other than the transaction fee so essentially we don’t care about these.

We want to store the block height as well as information on the initiator of the transaction sender which we derive from the public key of the signer, along with the transaction type and the fee multiplier for the block height. You will see that I specify that we break out of the loop if we reach the last entry in the blocks table that we retrieved in Step 4.

If there is a size field for the transaction (not present in inner aggregates) then I calculate the fee paid and debit this amount from the sender. If the transaction is in same_logic then I am done with this transaction. If not I process it any further.

same_logic = [16716, 16972, 16705, 16707, 16963, 16708, 16964, 17220, 16973, 16725, 16974, 17230, 16720, 16976, 17232, 16977, 16721]

for x in out:
    fee = 0
    height = int(x['meta']['height'])
    
    if (height >= end):
        break
    
    sender = str(facade.network.public_key_to_address(PublicKey((hexlify(x['transaction']['signerPublicKey']).decode('utf8')))))    
    tx_type = int(x['transaction']['type'])
    multiplier = multiplier_mapping[height]    
    
    try:
        size = x['transaction']['size']
        fee = (size * multiplier)/1000000
        try:
            historic[sender][height] += -abs(fee)
        except:
            historic[sender][height] = -abs(fee)
    except:
        pass
    
    if tx_type in same_logic:
        pass

Transfer transactions

These are perhaps the simplest of transactions. There is a sender and a recipient along with a mosaic ID and amount. Here I just find any transactions that involve the XYM mosaic, remove funds from the sender and add them to the recipient.

As usual, there is a caveat here, if a transaction is sent to an account alias then the address needs to be resolved back into the recipient’s address. I have achieved this is a bit of a hacky way just by checking for the presence of a string of “A”s in the address. Any accounts matching this pattern will then be checked in the aliases dictionary we defined in Step 4 to confirm whether the suspected alias is present at that block height and then retrieve the resolved address.

elif tx_type == 16724: #TRANSFER
    recipient = str(base64.b32encode(x['transaction']['recipientAddress']).decode('utf8')[0:39])    

    if "AAAAAAAAAAAAAAAAA" in recipient:
    # Address is probably an alias
    # Get resolution

        try:
            recipient = aliases[recipient][height]
        except:
            recipient = recipient

    mosaics = x['transaction']['mosaics']        
    for mos in mosaics:
        if ((mos['id'] == 7777031834025731064) | (mos['id'] == -1780160202445377554)):
            amount = mos['amount']/1000000
            # Add to recipient
            try: 
                historic[recipient][height] += amount
            except:
                historic[recipient][height] = amount

            # Remove from sender
            try:
                historic[sender][height] += -abs(amount)
            except:
                historic[sender][height] = -abs(amount)

Hashlocks

Hashlock transactions are required when sending an aggregate bonded transaction and act as a deposit which is refunded once the aggregate bonded transaction is signed by all parties and confirmed on the network. Sending a hash lock transaction “locks” 10 XYM in the initiator’s account which is then refunded if the aggregate bonded transaction completes before the transaction deadline. We can store the hash, expiry block and sender information:

    elif tx_type == 16712: #HASH_LOCK        
        try:
            duration = x['transaction']['duration']
            lock_end = height + duration
            lock_hash = hexlify(x['transaction']['hash']).decode('utf8')
            hash_locks[lock_hash]['lock_end'] = lock_end
            hash_locks[lock_hash]['sender'] = sender
            # Pay 10 XYM deposit
            try:
                historic[sender][height] += -abs(10) 
            except:
                historic[sender][height] = -abs(10)
        except:
            pass   

We also need to check all aggregate bonded transactions for the lock hashes stored in lock_hash. If it is found, we refund 10 XYM to the account that initiated the hash lock transaction. We then delete the hash lock from the hash_locks dictionary as we know that it has been used.

    elif tx_type == 16961: # Aggregate bonded transaction
        hash_id = hexlify(x['meta']['hash']).decode('utf8')
        if hash_id in hash_locks:
            initiator = hash_locks[hash_id]['sender']
            del hash_locks[hash_id]
            try: 
                historic[initiator][height] += 10
            except:
                historic[initiator][height] = 10

After processing all transactions, those entries left in the hash_locks dictionary will either be current or will have expired. We can loop through this dictionary and if the hash lock has expired then we can credit the funds to the account that harvested the block in which the hash lock expired.

for expired in hash_locks:
    block = hash_locks[expired]['lock_end']
   
    if block <= end:
    # The harvester of the block where the hash lock tx failed gets the 10 XYM
        harv_address = df_block_data.loc[df_block_data['Block'] == block, 'Harvester'].iloc[0]

        if not mapping[harv_address]:
            harvester = harv_address
        else:
            harvester = mapping[harv_address]
        
        try: 
            historic_fees[harvester][block] += 10
        except:
            historic_fees[harvester][block] = 10    
   

Secret locks and secret proofs

Symbol secret lock transactions allow users to send funds to another account but for the funds to be locked to both the sender and the recipient until a correct proof is announced using a secret proof transaction. We can model this by removing funds from the sender on receipt of the secret lock and crediting the funds to the recipient when the proof transaction is received.

Lock

Add secret lock information to the secret_locks dictionary and remove the amount locked from the sender’s account.

    elif tx_type == 16722: #SECRET_LOCK 
        
        if x['transaction']['mosaicId'] == 7777031834025731064:
            amount = x['transaction']['amount']/1000000
            recipient = str(base64.b32encode(x['transaction']['recipientAddress']).decode('utf8')[0:39])    
            secret = hexlify(x['transaction']['secret']).decode('utf8')
            secret_duration = x['transaction']['duration']
            expiry = secret_duration + height
            secret_locks[slocks]['recipient'] = recipient
            secret_locks[slocks]['sender'] = sender
            secret_locks[slocks]['amount'] = amount
            secret_locks[slocks]['secret'] = secret
            secret_locks[slocks]['expiry'] = expiry
            secret_locks[slocks]['lock_height'] = height
            slocks+=1
            try:
                historic[sender][height] += -abs(amount)
            except:
                historic[sender][height] = -abs(amount)  

Proof

When the proof is received we credit the recipient’s account and delete the transaction from our secret_locks dictionary.

    elif tx_type == 16978: #SECRET_PROOF
        
        recipient = str(base64.b32encode(x['transaction']['recipientAddress']).decode('utf8')[0:39])  
        secret = hexlify(x['transaction']['secret']).decode('utf8')
        lock_id = ""
        for locks in secret_locks:
            lsecret = secret_locks[locks]['secret']
            lrecipient = secret_locks[locks]['recipient']
            lsender = secret_locks[locks]['sender']
            amount = secret_locks[locks]['amount']

            if ((lsecret == secret) & (lrecipient == recipient)):
                lock_id = locks
                try:
                    historic[lrecipient][height] += amount 
                except:
                    historic[lrecipient][height] = amount
        try:
            del secret_locks[lock_id]
        except:
            pass

Crediting expired secret lock funds

Each secret lock transaction has a duration of blocks after which it will expire. If the lock expires in block height + duration then we need to credit those funds back to the sender of the secret lock at the expiry block. Therefore, after all of the transactions have been processed any entries remaining in our dictionary secret_locks will have either expired or are still valid but have not yet been unlocked. Iterating through this dictionary allows us to check if the lock is still current and if not, to credit the sender at the time of expiry.

for l in secret_locks:
    sender = secret_locks[l]['sender']
    amount = secret_locks[l]['amount']
    expiry = secret_locks[l]['expiry']
    lock_height = secret_locks[l]['lock_height']
    
    if expiry < end:
        try: 
            historic[sender][lock_height] += amount
        except:
            historic[sender][lock_height] = amount         

Mosaic definition

All mosaics created on the NEM blockchain were ported over to Symbol in the genesis block and no fee was associated with their creation (hence I am excluding mosaic definition transactions at block height 1). From then on a registration fee was required when creating a new mosaic. This fee is equivalent to 0.5 XYM multiplied by the median fee multiplier in the previous 60 blocks. As explained above, all mosaic creation fees generated are credited to the mosaic sink address, of which there are two, one pre- and one post-fork. Here we just need to send the fees to the correct sink address and deduct the rental fee from the sender.

    elif tx_type == 16717: #MOSAIC_DEFINITION
        if height > 1: 
            average_fee_mulitplier = feeMultiplier(height)
            rental_fee = average_fee_mulitplier * 0.5
            try:
                if height >= fork_height:
                    historic[sink][height] += rental_fee
                else:
                    historic[mosaic_sinkV1][height] += rental_fee
            except:
                if height >= fork_height:
                    historic[sink][height] = rental_fee
                else:
                    historic[mosaic_sinkV1][height] = rental_fee
                    
            try:
                historic[sender][height] += -abs(rental_fee)
            except:
                historic[sender][height] = -abs(rental_fee)

Namespace registration

Much the same as the mosaic registration above, fees are associated with creating a new namespace. Again those that are moved over from NEM at block height 1 are free but for all later transactions, we need to calculate the appropriate fee. Again there are two sink addresses, one pre- and one post-fork and we have to calculate the dynamic fee multiplier.

In this case, the cost of registering a new namespace is dependent on the length of registration. If you have ever registered a domain name you will know that you pay by the length of the registration period (i.e. two years costs double the price of one year) and that at the end of the registration period the domain will expire if it isn’t renewed. The exact same principle applies in Symbol and rather than being charged by the year, you pay per block. The formula for calculating the fee is straightforward:

(average_fee_mulitplier * 2 * duration)/1000000

And you can register child namespaces at a reduced rate of:

0.1 XYM * average_fee_mulitplier

elif tx_type == 16718: #NAMESPACE_REGISTRATION
                
        if height > 1:        
            try:
                duration = x['transaction']['duration'] 
                average_fee_mulitplier = feeMultiplier(height)
                rental_fee = (average_fee_mulitplier * 2 * duration)/1000000
                try:
                    historic[sender][height] += -abs(rental_fee)
                except:
                    historic[sender][height] = -abs(rental_fee)
                try:
                    if height >= fork_height:
                        historic[sink][height] += rental_fee
                    else:
                        historic[namespace_sinkV1][height] += rental_fee
                except:
                    if height >= fork_height:
                        historic[sink][height] = rental_fee
                    else:
                        historic[namespace_sinkV1][height] = rental_fee       
                
            except:

                average_fee_mulitplier = feeMultiplier(height)
                rental_fee = 0.1 * average_fee_mulitplier
                try:
                    historic[sender][height] += -abs(rental_fee)
                except:
                    historic[sender][height] = -abs(rental_fee)
                try:
                    if height >= fork_height:
                        historic[sink][height] += rental_fee
                    else:
                        historic[namespace_sinkV1][height] += rental_fee
                except:
                    if height >= fork_height:
                        historic[sink][height] = rental_fee
                    else:
                        historic[namespace_sinkV1][height] = rental_fee

Once the fee has been calculated we subtract the relevant amount from the sender and credit the funds to the relevant sink address.

Mosaic revocation

This is an odd one. Usually, XYM (mosaic 6BED913FA20223F8) are not revocable, this makes sense as your XYM are your XYM and you don’t want someone taking back your funds. That’s one of the great things about crypto, your money is in your control. Well, for one block only at the Cyprus fork, the XYM mosaic was made revocable in order to recover funds from NxL accounts and redistribute them to the new treasury NCHEST3QRQS4JZGOO64TH7NFJ2A63YA7TPM5PXI and sink addresses. Because of this one off we need to add a check in the code showing the movement of those funds back to the issuing account before they are sent to the new addresses.

    elif tx_type == 17229:
        # Special case - during the Cyprus fork XYM mosaics were revoked from NxL accounts
        # And redistributed to the new treasury and sink addresses
        
        if height == 689761:
            mosaic = x['transaction']['mosaicId']      
            acc = str(base64.b32encode(x['transaction']['sourceAddress']).decode('utf8')[0:39])
            if (mosaic == 7777031834025731064):
                amount = x['transaction']['amount']/1000000
                # Add to recipient (the requester)
                try: 
                    historic[sender][height] += amount
                except:
                    historic[sender][height] = amount
                
                # Remove from sender
                try:
                    historic[acc][height] += -abs(amount)
                except:
                    historic[acc][height] = -abs(amount)

Step 6: validation

So, how do we even know that we did everything correctly? Well, fortunately, this is the easy part! 🥳 😁 We can just retrieve all current account balances from the accounts collection, and then compare these to our historical balance dictionary at the current block height. If we did everything correctly then the two should match and we can be confident that all of the previous block heights will also be correct. Note: there will be a small delay between getting the historical balances and getting the current balances so it may not be 100% accurate but as long as the two heights are not too different then it should be a damn good approximation!

In order to compare my computed balances with the actual current balances pulled directly from MongoDB and stored in df_balance (see Step 4) I am going to sum balances and fees for every account up to the current block (or the latest block that I retrieved in Steps 4 and 5).

predicted = {}
block = end
for account in historic:
    balance = 0
    balance_fees = 0
    
    for txs in historic[account]:
        if txs <= block:
            balance += float(historic[account][txs])
    predicted[account]=balance
    
    for txs in historic_fees[account]:
        if txs <= block:
            balance_fees += float(historic_fees[account][txs])
    predicted[account]+=balance_fees

Next add these values to a dataframe:

predicted_balances = pd.DataFrame([predicted])
predicted_balances = predicted_balances.transpose().reset_index()
predicted_balances.columns = ["Address", "Balance_predicted"]

And then merge with our actual account balances retrieved from the accounts collection in MongoDB:

new = pd.merge(df_balance, predicted_balances, on='Address')

Now we can plot all actual account balances against those that I computed from the genesis block:

import matplotlib.pyplot as plt
fig, ax = plt.subplots()

plt.scatter(x=new['Balance'], y=new['Balance_predicted'], c='DarkBlue', s=1)
ax.set_yscale('log')
ax.set_xscale('log')
ax.set_xlim([0, 10000000000])
ax.set_ylim([0, 10000000000])
plt.xlabel("Balance (XYM)")
plt.ylabel("Predicted Balance (XYM)")
plt.savefig('testing_historical_updated.png', dpi=800)
plt.show()

And if everything worked correctly we will see a perfect diagonal in our scatter plot meaning that the historical account balance changes have been successfully calculated from the genesis block 🤞 Drumroll… 🥁😬 Hmm… 🤦‍♂️ Nearly… 😭

So, I am out of time and I didn’t quite make it 😕. What I will say is that I almost succeeded in my task – a Pearson correlation coefficient of 0.999999999999959 ain’t bad 😂 There are a few outliers, mostly at the very low end of the scale (< 0.0001 XYM) which will be down to rounding errors – what’s a few tens of thousandths of XYM between friends?

But there are also a handful of accounts where there is a larger difference. Still not huge but there must be something that I still haven’t accounted for in my logic and maybe some of these differences are due to transactions occurring between grabbing the current account balances and calculating the balances from block 1. It is bugging me though, I was aiming for perfection here 😂

The histogram below shows the deviation between the calculated current balances and those calculated directly from transactions in MongoDB. The y-axis is on a log10 scale so you can see that the overwhelming majority of account balances > 100,000 are correct and that the spread of error is small. In fact, 137858/137915 (99.96%) are within 0.01 XYM of the actual amount. There are a total of 19 accounts that are out by more than 10 XYM and this upsets me 😁 Note: this only shows accounts that have held XYM mosaics at some point in time. Others accounts that have only ever held non-XYM mosaics are excluded.

Step 7: travelling back in time

Right, so now, after all that(!) we end up with two dictionaries: historic which stores all transactions of every account since the genesis block and historic_fees which stores all harvesting, fee and expired hash lock funds which have been disbursed to beneficiaries. Therefore, getting an account balance is simple just provide an address and a block height and it will return the balance of that account at that height.

Simple queries

Here’s some sample code to obtain the balance and beneficiary income for the Symbol Blog address at block 1,000,000.

account = "NCAY26LEBPOXM7NPCNV4HL4EH5WM6UJ5UUN4UGA"
block = 1000000 
balance = 0
balance_fees = 0
balance_hash = 0


for txs in historic[account]:
    if txs <= block:
        balance += historic[account][txs]
for txs in historic_fees[account]:
    if txs <= block:
        balance_fees += historic_fees[account][txs]
        
        
print("Cumulative harvesting income at block {0}: {1}".format(block, format(balance_fees, '.6f')))
print("Total balance at block {0} is {1} XYM".format(block, format(balance+balance_fees, '.6f')))

Running this shows that I had a balance of 10,475.71 XYM and had not yet harvested a block:

Cumulative harvesting income at block 1000000: 0.000000
Total balance at block 1000000 is 10475.707727 XYM

Running it at the current block height shows (correctly!) that I have a balance of 37,646.38 XYM and have earned more than 1,000 XYM through harvesting:

Cumulative harvesting income at block 1852396: 1057.486291
Total balance at block 1852396 is 37646.382491 XYM

Historical plots

Now that we have historical account balances we can start to do some visualisations. I covered a lot of plots in my previous article and I am really short on time to make the deadline for the article. Because of this (and the fact that this article is already so long!) I am not going to spend a lot of time on plots but I will show you briefly in the Jupyter Notebook, how to plot balance over time. The Binance XYM wallet is as good an account as any so let’s take NAJNXZGEEHD56G2KVXEKOECTMIGKMIZN6YXDSHI as our example. You can spruce it up, convert the block height to dates and plot other accounts if you like 😊

Play along at home

I know that the code is a mess but I am going to blame it on my tight schedule to get this published rather than my laziness and lack of Python knowledge 😁 If you want to try this at home then you can download my notebook:

Jupyter Notebook – “The Big Bang query

The wrap up

So I hope that by now I managed to get everything working and the article has been published, meaning someone is actually reading this! 😁 If not then I will return to it post-Christmas and put it out on the blog. Edit: I am publishing it, even though it is not quite perfect! And here’s a challenge:

If you can spot the issue and fix my code, make that diagonal fit and get all accounts within 0.01 XYM of the actual value then I will send you a 500 XYM reward – just ping me on Twitter @blog_symbol or @ninelivessymbol 😁

One rule – you are not allowed to add additional queries to MongoDB, it has to be calculated from the blocks and transactions collections. This might be a red herring but I think that the discrepancies are due to the hash lock section as the majority of failures are out by +/- 10 XYM but I went completely code-blind and didn’t manage to fix it in time. Of course, it could be due to you guys making some exotic transactions that I was not aware were possible – this is also quite likely 😁

It should be relatively easy to add new functionality, for example, to support other mosaics or even historical metadata and namespaces without too much of an issue. Really everything needs to be stored nicely in a database which will regularly and automatically update from the previous latest block. Even better, to have some kind of web interface built around it to allow you to query the data without having to set up your own node. Maybe this is something that will come in the future 🤔

Thanks again for reading to the end and thanks to Anthony for the discussions on hash locks 😊

Have a wonderful Christmas and New Year! 🎄🍻

Avatar photo
NineLives
admin@symbolblog.com

I'm a Symbol and NEM enthusiast and run this blog to try to grow awareness of the platform in the English-speaking world. If you have any Symbol news you would like me to report on or you have an article that you would like to publish then please let me know!

No Comments

Sorry, the comment form is closed at this time.