23 Dec My journey with MongoDB 2: The Big Bang query
I wasn’t going to write another NEM / Symbol Advent Calendar article this year but I saw a single empty spot in calendar number two and thought I would come back for a second round. I am a glutton for punishment, or maybe I should say I like to challenge myself and learn new things 😁
Introduction
In this article, I will (hopefully) write about a little project I started a few days ago. I thought it would be straightforward but like with everything blockchain, it was more challenging than I thought. I am actually being very optimistic here and have started writing this before I have code that functions correctly 😬 The idea was to create a historical record of the Symbol blockchain from the genesis block so that you could query an address at any given block height and it would return the account balance. I thought that would be simple, just replay all transactions since the first block, store them in some kind of data structure and then retrieve the balance at a particular block height. Actually, after writing some code to do it I realised it wasn’t quite as easy as I first thought!
As in my previous article, I will be relying on MongoDB and Python and will try to explain my chaotic logic as I go through the code (the entire Jupyter Notebook is linked at the end of the post). At the time of writing this introduction, I still haven’t implemented my nemeses NEMeses that are linked harvesting accounts, namespace rental, hash locks, secret locks and secret proofs but fingers crossed I will get there before this article is due to go out! 😬
Note of caution: It will take around 30 minutes to run all of the code in the Jupyter Notebook and recreate historical balances so you’ll need to be patient! Also, I have a fast internet connection with a decent connection speed to my node so I guess it might take longer if your connection is slower.
Note of frustration: Half an hour may not seem too long but when you are trying to debug something that happens in 1 in ~10,000 accounts it feels like an eternity! 😂
Step 1 – the setup
As usual, I am going to import some Python modules before I get started.
from pymongo import MongoClient
import base64
from sshtunnel import SSHTunnelForwarder
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from binascii import hexlify, unhexlify
import base64
from symbolchain.facade.SymbolFacade import SymbolFacade
from symbolchain.CryptoTypes import PublicKey
from collections import defaultdict
Then we need to set up the MongoDB connection to the node.
MONGO_HOST = "symbolblog-testnet.com"
server = SSHTunnelForwarder(
(MONGO_HOST,22),
ssh_username='**********',
ssh_password='**********',
remote_bind_address=('172.20.0.2', 27017)
)
server.start()
client = MongoClient('localhost', server.local_bind_port)
print(server.local_bind_port)
print("connected")
Step 2 – those pesky linked accounts
Most Symbol accounts have a linked address that signs when they harvest a block. This means that when iterating through all blocks to obtain the signing account (harvester) we need to link the address back to the main account to add the rewards. I started off just getting the current linked account from the accounts
collection in Mongo but then realised that of course users can relink their accounts to a new address at any point in time. This means we need to have all historical account links in order to assign a block reward to the correct main account.
In order to do this I queried the transactions
collection in the catapult
database for transaction type 16716
which represents all account link transactions. I initially just stored a simple mapping of the linked address to the main account as a dictionary but… as it turns out some linked accounts were associated with several main accounts over the course of history which is almost certainly bad practice and definitely a pain in my ass – this took forever to find but it did improve my results in Step 6! 😁
db = client['catapult']
collection = db.transactions
facade = SymbolFacade('mainnet')
mapping = defaultdict(dict)
out = collection.find({'transaction.type': 16716})
for x in out:
if x['transaction']['linkAction'] == 1:
height = x['meta']['height']
main = str(facade.network.public_key_to_address(PublicKey((hexlify(x['transaction']['signerPublicKey']).decode('utf8')))))
link = str(facade.network.public_key_to_address(PublicKey((hexlify(x['transaction']['linkedPublicKey']).decode('utf8')))))
# Ergh - some link accounts are used by more than one main account at different blocks :(
# Need to include height :(
# Shame on these accounts! :D
mapping[link][height]=main
Step 3: Inflation
As you will know, in Symbol the block rewards reduce over time so in order to see how much a harvester earned for making a block we need to refer to the inflation config properties of Symbol’s mainnet. I have just downloaded this file from GitHub and reformatted it slightly so it is a simple two-column tab-separated file showing the block number and reward amount in XYM. Also, make sure that the first column starts at 1 not 2 (or tell the code later on to ignore block 1).
df_reward = pd.read_csv("block_rewards.tsv", sep='\t')
Step 4: Rewards and fees
Right, so now we can start peering into the past and iterating through all blocks since Symbol’s genesis and calculating fees and harvesting rewards for all accounts. As usual, I am going to store the data in a pandas dataframe for later use. Every record in the blocks
collection contains the total fees paid, the harvester and the beneficiary address. As I pointed out in Step 2, the harvester address is usually a linked account, not the user’s main address. Therefore we need to look up the signerPublicKey
(harvesting account address) in our dictionary linked to the main accounts we made earlier. Late addition: it seems that some accounts are using their linked address in their node beneficiary config too so added extra logic to see whether the beneficiary account is linked 😫 Later addition: some linked accounts are linked to more than one main account over time, added extra logic to loop through the account links and assign the correct main account for that block height 🤬 You guys are testing me in so many weird and wonderful ways. 😂
Next, we look up the reward at the current block height in our loop from our df_reward
dataframe and add this value to our total fees. Just a reminder that values are stored in microXYM and therefore have to be divided by 1,000,000 to give us the value in XYM.
OK, so now we have the total reward for a given block and next we have to work out the amount that is distributed to harvester (70%), node beneficiary (25%) and the network sink address (5%). Once we have these values we add the block height and reward for each to a dictionary where the key is the account address (one for each of the beneficiaries). Note: there are two harvest sink addresses to deal with, NBUTOBVT5JQDCV6UEPCPFHWWOAOPOCLA5AY5FLI
and the newer NAVORTEX3IPBAUWQBBI3I3BDIOS4AVHPZLCFC7Y
I believe that these switched over during the Cyprus fork at block 689,761.
facade = SymbolFacade('mainnet')
db = client['catapult']
collection = db.blocks
blocks = collection.find()
historic_fees = defaultdict(dict)
harvest_sinkV1 = "NBUTOBVT5JQDCV6UEPCPFHWWOAOPOCLA5AY5FLI"
sink = "NAVORTEX3IPBAUWQBBI3I3BDIOS4AVHPZLCFC7Y"
fork_height = 689761
f = open('block_data.tsv', 'w', encoding="utf-8")
f.write("Block\tTimestamp\tTotalFee\tFeeMultiplier\tHarvester\tBeneficiary\tReward\n")
i = 0
j = 0
for x in blocks:
height = int(x['block']['height'])
multiplier = x['block']['feeMultiplier']
fee = x['meta']['totalFee']/1000000
timestamp = x['block']['timestamp']
h = str(facade.network.public_key_to_address(PublicKey((hexlify(x['block']['signerPublicKey']).decode('utf8')))))
# This is a hassle.. There are linked accounts associated with multiple main accounts 🙄
if not mapping[h]:
harvester = h
else:
for keys in mapping[h]:
if keys <= height:
harv = mapping[h][keys]
harvester = harv
beneficiary = str(base64.b32encode((bytes(x['block']['beneficiaryAddress']))).decode('utf8')[0:39])
# Eek - some node fee beneficiaries are sending funds to their linked address :/
# Probably should use the same logic as above to avoid nasty surprises
if not mapping[beneficiary]:
beneficiary = beneficiary
else:
for keys in mapping[beneficiary]:
if keys <= height:
b = mapping[beneficiary][keys]
beneficiary = b
reward = (df_reward.iloc[df_reward[df_reward['Height'].le(height)].index[-1]]['Reward'])/1000000
f.write("{0}\t{1}\t{2}\t{3}\t{4}\t{5}\t{6}\n".format(height, timestamp, fee, multiplier, harvester, beneficiary, reward))
beneficiary_reward = (reward+fee) * 0.25
harvester_reward = (reward+fee) * 0.70
sink_reward = (reward+fee) * 0.05
# Harvesting and benefificary income
historic_fees[harvester][height]=harvester_reward
if height >= fork_height:
try:
historic_fees[sink][height] += sink_reward
except:
historic_fees[sink][height] = sink_reward
else:
try:
historic_fees[harvest_sinkV1][height] += sink_reward
except:
historic_fees[harvest_sinkV1][height] = sink_reward
try:
historic_fees[beneficiary][height] += beneficiary_reward
except:
historic_fees[beneficiary][height] = beneficiary_reward
if (j == 100000):
print("{0} blocks processed".format(i))
j = 0
i+=1
j+=1
f.close()
Again, we can read this in to a pandas dataframe:
df_block_data = pd.read_csv("block_data.tsv", sep='\t')
And for later use with namespace rental fees make a mapping of block height to fee multiplier
multiplier_mapping = dict(zip(df_block_data.Block, df_block_data.FeeMultiplier))
Before moving on to step 5. I just keep a count of the final block processed so we don’t exceed this in the next query:
end = int(df_block_data['Block'].iloc[-1])
And lastly, get the balance of all accounts from the accounts
collection which I will use to check if everything has worked correctly in Step 6.
db = client['catapult']
collection = db.accounts
facade = SymbolFacade('mainnet')
f = open('xym_addresses.tsv', 'w', encoding="utf-8")
f.write("Address\tBalance\n")
out = collection.find()
for x in out:
xym = 0
address = str(base64.b32encode(x['account']['address']).decode('utf8')[0:39])
mosaics = x['account']['mosaics']
for mos in mosaics:
if (mos['id'] == 7777031834025731064):
f.write('{0}\t{1}\n'.format(address, mos['amount']/1000000))
xym = 1
if xym == 0:
f.write('{0}\t{1}\n'.format(address, 0))
f.close()
df_balance = pd.read_csv("xym_addresses.tsv", sep='\t')
Transaction fees
As you may know, Symbol transaction fees are calculated based on the transaction size and the block feeMultiplier
which we obtained above. Just to enable fast lookup I have made a new dictionary where block height is the key and the feeMultiplier
is the value. Using this information we will be able to calculate the fee paid to send each of the transactions made on the network in Step 5.
multiplier_mapping = dict(zip(df_block_data.Block, df_block_data.FeeMultiplier))
Namespace and mosaic rental fees
Another complication which we will meet in Step 5 is that namespace and mosaic rental fees are calculated dynamically. In order to calculate the correct value we need to find the median feeMultiplier
over the previous 60 blocks and multiply this value by the duration of the namespace registration period in blocks, or in the case of mosaics multiply the value by 0.5. Here I am just making a list of feeMultiplier
values across all blocks for later use.
block_fee_mult = df_block_data['FeeMultiplier'].tolist()
Treasury reissuance
After the Cyprus hard fork, all of the old Symbol central funds were consolidated into one address NCHEST3QRQS4JZGOO64TH7NFJ2A63YA7TPM5PXI
and new sink addresses were set up to collect harvest and rental fees. Funds were reallocated by revoking and reissuing the mosaics at block 689,761.
Address aliases 😫
Another Symbol feature is that you can set up an account alias which allows others to send transactions to your account just using the linked namespace. For example, I have the namespace symbolblog linked to the account NCAY26LEBPOXM7NPCNV4HL4EH5WM6UJ5UUN4UGA
. This however causes further headaches in Step 5 as we need to resolve the alias into an address. Here is an example of a transaction where 1 XYM is sent to the account alias symbolblog and as you can see the recipientAddress
is padded with trailing zeros. Converting this to a Symbol address gives the address NHXVJOI33ETOJRIAAAAAAAAAAAAAAAAAAAAAAAA
and not NCAY26LEBPOXM7NPCNV4HL4EH5WM6UJ5UUN4UGA
.
{
"meta":{
"height":"885312",
"hash":"720F49BFA6061F003A3C80F76DE4C0FD18A99F42815E8C5CD8E0FBA122616F23",
"merkleComponentHash":"720F49BFA6061F003A3C80F76DE4C0FD18A99F42815E8C5CD8E0FBA122616F23",
"index":0,
"timestamp":"26666761725",
"feeMultiplier":10
},
"transaction":{
"size":207,
"signature":"8A7242E269615DE40951A4531C3251D774B9782626B9B3011BE04B8CCD217FCE32859A209D585BFC60584A175A040D6D13B2380B7D571C8D9046D683C42BAC0B",
"signerPublicKey":"9A4FEFD73331ABC604F4C71360D91B0A112ED13AE2E7EE152A7A22B9BF01FC02",
"version":1,
"network":104,
"type":16724,
"maxFee":"2070",
"deadline":"26673936709",
"recipientAddress":"69EF54B91BD926E4C5000000000000000000000000000000",
"message":"0054657374696E67206E616D657370616365206163636F756E74206C696E6B",
"mosaics":[
{
"id":"6BED913FA20223F8",
"amount":"1000000"
}
]
},
"id":"61E6DD8891BDF55E6E6E5E11"
}
So, another hurdle and another section of code needed to capture all of the aliases from the addressResolutionStatements
collection:
# Get account aliases!
db = client['catapult']
collection = db.addressResolutionStatements
facade = SymbolFacade('mainnet')
aliases = defaultdict(dict)
out = collection.find()
for x in out:
height = x['statement']['height']
unresolved = base64.b32encode((bytes(x['statement']['unresolved']))).decode('utf8')[0:39]
alias_address = base64.b32encode((bytes(x['statement']['unresolved']))).decode('utf8')[0:39]
e = 0
for entry in x['statement']['resolutionEntries']:
resolved_address = base64.b32encode((bytes(x['statement']['resolutionEntries'][e]['resolved']))).decode('utf8')[0:39]
aliases[alias_address][height]=resolved_address
e+=1
We will use the lookup table aliases
in step 5 to resolve any alias addresses transacting at a given block height.
Step 5: the nightmare begins 😭
Here we are going to get down to business and process all transactions since the genesis block. I will warn you now that my code is messy and at this moment in time it doesn’t work 😂 By the time this is published (if it is published) then it will all be fixed so don’t worry! Back to modifying my code… 🙏
OK, so Symbol has 25(!) different transaction types and each has its own particular format and set of values within MongoDB. This makes grabbing the correct data from each a bit complicated, especially when not all of the relevant data is stored within the transactions
collection. On the plus side, for most transaction types, we will only want to find the transaction size in order to calculate the fee paid by the account initiating the transaction since no mosaic transfers occur. Because of this, I can group the logic for 17 different transactions together and then implement a specific set of checks and calculations for the other 8 (hash lock, aggregate bonded, secret lock, secret proof, transfer, mosaic definition, namespace registration and the special case – mosaic revocation). At the Cyprus fork block, previous NxL funds were revoked and reissued and therefore we have to process mosaic revocation transactions to account for this special case. There are also pre-fork sink addresses for mosaic and namespace registration fees, these were superseded by the newer NCVORTEX4XD5IQASZQEHDWUXT33XBOTBMKFDCLI
address in block 689,761. I have learned a lot more about Symbol’s history during the course of writing this code and, just when I thought I had everything covered, another special case appeared 😁
I am going to split my code up into chunks explaining how the logic for each works. The full code to recreate everything in this article is available at the end of the post as a Jupyter Notebook.
Getting started
Before we go any further I need to specify the sink addresses for mosaic and namespace registration fees and the height of the Cyprus fork when the new combined sink address became active. I also give the balance of the genesis account NASYMBOLLK6FSL7GSEMQEAWN7VW55ZSZU25TBOA
before the distribution of funds in block 1.
We are connecting to the transactions
collection in MongoDB and will iterate through all 8+ million transactions. I am storing all transactions in a dictionary historic
this will store balance change events across all accounts and the key to the dictionary will be account address.
import statistics
facade = SymbolFacade('mainnet')
db = client['catapult']
collection = db.transactions
out = collection.find()
historic = defaultdict(dict)
i = 0
j = 0
height = 0
slocks = 0
hash_locks = defaultdict(dict)
secret_locks = defaultdict(dict)
sink = "NCVORTEX4XD5IQASZQEHDWUXT33XBOTBMKFDCLI"
mosaic_sinkV1 = "NC733XE7DF46Q7QYLIIZBBSCJN2BEEP5FQ6PAYA"
namespace_sinkV1 = "NBDTBUD6R32ZYJWDEWLJM4YMOX3OOILHGDUMTSA"
fork_height = 689761
historic["NASYMBOLLK6FSL7GSEMQEAWN7VW55ZSZU25TBOA"][1] = 7842928625
Next, I need to set up a function that will be required to calculate rental fees later in the code. It takes the block height as input and will calculate the median feeMultiplier over the previous 60 blocks. These data were stored previously through querying the blocks
collection (Step 4). Note: zero values are replaced with a fee multiplier of 100.
# Get average FeeMultiplier from last 60 blocks
def feeMultiplier(block_height):
block_height-=1
block_height_start = block_height-60
fee_multiplier = block_fee_mult[block_height_start:block_height]
fee_multiplier = [100 if item == 0 else item for item in fee_multiplier]
average = statistics.median(fee_multiplier)
return(average)
Now it is time to loop through all transactions. As you will see I have a list called same_logic
(I need more descriptive variable names). These each correspond to a transaction type in the Mongo transactions table and they all share the same property, they do not involve any funds transfer other than the transaction fee so essentially we don’t care about these.
We want to store the block height as well as information on the initiator of the transaction sender
which we derive from the public key of the signer, along with the transaction type and the fee multiplier for the block height. You will see that I specify that we break out of the loop if we reach the last entry in the blocks table that we retrieved in Step 4.
If there is a size
field for the transaction (not present in inner aggregates) then I calculate the fee paid and debit this amount from the sender. If the transaction is in same_logic
then I am done with this transaction. If not I process it any further.
same_logic = [16716, 16972, 16705, 16707, 16963, 16708, 16964, 17220, 16973, 16725, 16974, 17230, 16720, 16976, 17232, 16977, 16721]
for x in out:
fee = 0
height = int(x['meta']['height'])
if (height >= end):
break
sender = str(facade.network.public_key_to_address(PublicKey((hexlify(x['transaction']['signerPublicKey']).decode('utf8')))))
tx_type = int(x['transaction']['type'])
multiplier = multiplier_mapping[height]
try:
size = x['transaction']['size']
fee = (size * multiplier)/1000000
try:
historic[sender][height] += -abs(fee)
except:
historic[sender][height] = -abs(fee)
except:
pass
if tx_type in same_logic:
pass
Transfer transactions
These are perhaps the simplest of transactions. There is a sender and a recipient along with a mosaic ID and amount. Here I just find any transactions that involve the XYM mosaic, remove funds from the sender and add them to the recipient.
As usual, there is a caveat here, if a transaction is sent to an account alias then the address needs to be resolved back into the recipient’s address. I have achieved this is a bit of a hacky way just by checking for the presence of a string of “A”s in the address. Any accounts matching this pattern will then be checked in the aliases
dictionary we defined in Step 4 to confirm whether the suspected alias is present at that block height and then retrieve the resolved address.
elif tx_type == 16724: #TRANSFER
recipient = str(base64.b32encode(x['transaction']['recipientAddress']).decode('utf8')[0:39])
if "AAAAAAAAAAAAAAAAA" in recipient:
# Address is probably an alias
# Get resolution
try:
recipient = aliases[recipient][height]
except:
recipient = recipient
mosaics = x['transaction']['mosaics']
for mos in mosaics:
if ((mos['id'] == 7777031834025731064) | (mos['id'] == -1780160202445377554)):
amount = mos['amount']/1000000
# Add to recipient
try:
historic[recipient][height] += amount
except:
historic[recipient][height] = amount
# Remove from sender
try:
historic[sender][height] += -abs(amount)
except:
historic[sender][height] = -abs(amount)
Hashlocks
Hashlock transactions are required when sending an aggregate bonded transaction and act as a deposit which is refunded once the aggregate bonded transaction is signed by all parties and confirmed on the network. Sending a hash lock transaction “locks” 10 XYM in the initiator’s account which is then refunded if the aggregate bonded transaction completes before the transaction deadline. We can store the hash, expiry block and sender information:
elif tx_type == 16712: #HASH_LOCK
try:
duration = x['transaction']['duration']
lock_end = height + duration
lock_hash = hexlify(x['transaction']['hash']).decode('utf8')
hash_locks[lock_hash]['lock_end'] = lock_end
hash_locks[lock_hash]['sender'] = sender
# Pay 10 XYM deposit
try:
historic[sender][height] += -abs(10)
except:
historic[sender][height] = -abs(10)
except:
pass
We also need to check all aggregate bonded transactions for the lock hashes stored in lock_hash
. If it is found, we refund 10 XYM to the account that initiated the hash lock transaction. We then delete the hash lock from the hash_locks
dictionary as we know that it has been used.
elif tx_type == 16961: # Aggregate bonded transaction
hash_id = hexlify(x['meta']['hash']).decode('utf8')
if hash_id in hash_locks:
initiator = hash_locks[hash_id]['sender']
del hash_locks[hash_id]
try:
historic[initiator][height] += 10
except:
historic[initiator][height] = 10
After processing all transactions, those entries left in the hash_locks
dictionary will either be current or will have expired. We can loop through this dictionary and if the hash lock has expired then we can credit the funds to the account that harvested the block in which the hash lock expired.
for expired in hash_locks:
block = hash_locks[expired]['lock_end']
if block <= end:
# The harvester of the block where the hash lock tx failed gets the 10 XYM
harv_address = df_block_data.loc[df_block_data['Block'] == block, 'Harvester'].iloc[0]
if not mapping[harv_address]:
harvester = harv_address
else:
harvester = mapping[harv_address]
try:
historic_fees[harvester][block] += 10
except:
historic_fees[harvester][block] = 10
Secret locks and secret proofs
Symbol secret lock transactions allow users to send funds to another account but for the funds to be locked to both the sender and the recipient until a correct proof is announced using a secret proof transaction. We can model this by removing funds from the sender on receipt of the secret lock and crediting the funds to the recipient when the proof transaction is received.
Lock
Add secret lock information to the secret_locks
dictionary and remove the amount locked from the sender’s account.
elif tx_type == 16722: #SECRET_LOCK
if x['transaction']['mosaicId'] == 7777031834025731064:
amount = x['transaction']['amount']/1000000
recipient = str(base64.b32encode(x['transaction']['recipientAddress']).decode('utf8')[0:39])
secret = hexlify(x['transaction']['secret']).decode('utf8')
secret_duration = x['transaction']['duration']
expiry = secret_duration + height
secret_locks[slocks]['recipient'] = recipient
secret_locks[slocks]['sender'] = sender
secret_locks[slocks]['amount'] = amount
secret_locks[slocks]['secret'] = secret
secret_locks[slocks]['expiry'] = expiry
secret_locks[slocks]['lock_height'] = height
slocks+=1
try:
historic[sender][height] += -abs(amount)
except:
historic[sender][height] = -abs(amount)
Proof
When the proof is received we credit the recipient’s account and delete the transaction from our secret_locks
dictionary.
elif tx_type == 16978: #SECRET_PROOF
recipient = str(base64.b32encode(x['transaction']['recipientAddress']).decode('utf8')[0:39])
secret = hexlify(x['transaction']['secret']).decode('utf8')
lock_id = ""
for locks in secret_locks:
lsecret = secret_locks[locks]['secret']
lrecipient = secret_locks[locks]['recipient']
lsender = secret_locks[locks]['sender']
amount = secret_locks[locks]['amount']
if ((lsecret == secret) & (lrecipient == recipient)):
lock_id = locks
try:
historic[lrecipient][height] += amount
except:
historic[lrecipient][height] = amount
try:
del secret_locks[lock_id]
except:
pass
Crediting expired secret lock funds
Each secret lock transaction has a duration of blocks after which it will expire. If the lock expires in block height
+ duration
then we need to credit those funds back to the sender of the secret lock at the expiry block. Therefore, after all of the transactions have been processed any entries remaining in our dictionary secret_locks
will have either expired or are still valid but have not yet been unlocked. Iterating through this dictionary allows us to check if the lock is still current and if not, to credit the sender at the time of expiry.
for l in secret_locks:
sender = secret_locks[l]['sender']
amount = secret_locks[l]['amount']
expiry = secret_locks[l]['expiry']
lock_height = secret_locks[l]['lock_height']
if expiry < end:
try:
historic[sender][lock_height] += amount
except:
historic[sender][lock_height] = amount
Mosaic definition
All mosaics created on the NEM blockchain were ported over to Symbol in the genesis block and no fee was associated with their creation (hence I am excluding mosaic definition transactions at block height 1). From then on a registration fee was required when creating a new mosaic. This fee is equivalent to 0.5 XYM multiplied by the median fee multiplier in the previous 60 blocks. As explained above, all mosaic creation fees generated are credited to the mosaic sink address, of which there are two, one pre- and one post-fork. Here we just need to send the fees to the correct sink address and deduct the rental fee from the sender.
elif tx_type == 16717: #MOSAIC_DEFINITION
if height > 1:
average_fee_mulitplier = feeMultiplier(height)
rental_fee = average_fee_mulitplier * 0.5
try:
if height >= fork_height:
historic[sink][height] += rental_fee
else:
historic[mosaic_sinkV1][height] += rental_fee
except:
if height >= fork_height:
historic[sink][height] = rental_fee
else:
historic[mosaic_sinkV1][height] = rental_fee
try:
historic[sender][height] += -abs(rental_fee)
except:
historic[sender][height] = -abs(rental_fee)
Namespace registration
Much the same as the mosaic registration above, fees are associated with creating a new namespace. Again those that are moved over from NEM at block height 1 are free but for all later transactions, we need to calculate the appropriate fee. Again there are two sink addresses, one pre- and one post-fork and we have to calculate the dynamic fee multiplier.
In this case, the cost of registering a new namespace is dependent on the length of registration. If you have ever registered a domain name you will know that you pay by the length of the registration period (i.e. two years costs double the price of one year) and that at the end of the registration period the domain will expire if it isn’t renewed. The exact same principle applies in Symbol and rather than being charged by the year, you pay per block. The formula for calculating the fee is straightforward:
(average_fee_mulitplier * 2 * duration)/1000000
And you can register child namespaces at a reduced rate of:
0.1 XYM * average_fee_mulitplier
elif tx_type == 16718: #NAMESPACE_REGISTRATION
if height > 1:
try:
duration = x['transaction']['duration']
average_fee_mulitplier = feeMultiplier(height)
rental_fee = (average_fee_mulitplier * 2 * duration)/1000000
try:
historic[sender][height] += -abs(rental_fee)
except:
historic[sender][height] = -abs(rental_fee)
try:
if height >= fork_height:
historic[sink][height] += rental_fee
else:
historic[namespace_sinkV1][height] += rental_fee
except:
if height >= fork_height:
historic[sink][height] = rental_fee
else:
historic[namespace_sinkV1][height] = rental_fee
except:
average_fee_mulitplier = feeMultiplier(height)
rental_fee = 0.1 * average_fee_mulitplier
try:
historic[sender][height] += -abs(rental_fee)
except:
historic[sender][height] = -abs(rental_fee)
try:
if height >= fork_height:
historic[sink][height] += rental_fee
else:
historic[namespace_sinkV1][height] += rental_fee
except:
if height >= fork_height:
historic[sink][height] = rental_fee
else:
historic[namespace_sinkV1][height] = rental_fee
Once the fee has been calculated we subtract the relevant amount from the sender and credit the funds to the relevant sink address.
Mosaic revocation
This is an odd one. Usually, XYM (mosaic 6BED913FA20223F8
) are not revocable, this makes sense as your XYM are your XYM and you don’t want someone taking back your funds. That’s one of the great things about crypto, your money is in your control. Well, for one block only at the Cyprus fork, the XYM mosaic was made revocable in order to recover funds from NxL accounts and redistribute them to the new treasury NCHEST3QRQS4JZGOO64TH7NFJ2A63YA7TPM5PXI
and sink addresses. Because of this one off we need to add a check in the code showing the movement of those funds back to the issuing account before they are sent to the new addresses.
elif tx_type == 17229:
# Special case - during the Cyprus fork XYM mosaics were revoked from NxL accounts
# And redistributed to the new treasury and sink addresses
if height == 689761:
mosaic = x['transaction']['mosaicId']
acc = str(base64.b32encode(x['transaction']['sourceAddress']).decode('utf8')[0:39])
if (mosaic == 7777031834025731064):
amount = x['transaction']['amount']/1000000
# Add to recipient (the requester)
try:
historic[sender][height] += amount
except:
historic[sender][height] = amount
# Remove from sender
try:
historic[acc][height] += -abs(amount)
except:
historic[acc][height] = -abs(amount)
Step 6: validation
So, how do we even know that we did everything correctly? Well, fortunately, this is the easy part! 🥳 😁 We can just retrieve all current account balances from the accounts
collection, and then compare these to our historical balance dictionary at the current block height. If we did everything correctly then the two should match and we can be confident that all of the previous block heights will also be correct. Note: there will be a small delay between getting the historical balances and getting the current balances so it may not be 100% accurate but as long as the two heights are not too different then it should be a damn good approximation!
In order to compare my computed balances with the actual current balances pulled directly from MongoDB and stored in df_balance
(see Step 4) I am going to sum balances and fees for every account up to the current block (or the latest block that I retrieved in Steps 4 and 5).
predicted = {}
block = end
for account in historic:
balance = 0
balance_fees = 0
for txs in historic[account]:
if txs <= block:
balance += float(historic[account][txs])
predicted[account]=balance
for txs in historic_fees[account]:
if txs <= block:
balance_fees += float(historic_fees[account][txs])
predicted[account]+=balance_fees
Next add these values to a dataframe:
predicted_balances = pd.DataFrame([predicted])
predicted_balances = predicted_balances.transpose().reset_index()
predicted_balances.columns = ["Address", "Balance_predicted"]
And then merge with our actual account balances retrieved from the accounts
collection in MongoDB:
new = pd.merge(df_balance, predicted_balances, on='Address')
Now we can plot all actual account balances against those that I computed from the genesis block:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
plt.scatter(x=new['Balance'], y=new['Balance_predicted'], c='DarkBlue', s=1)
ax.set_yscale('log')
ax.set_xscale('log')
ax.set_xlim([0, 10000000000])
ax.set_ylim([0, 10000000000])
plt.xlabel("Balance (XYM)")
plt.ylabel("Predicted Balance (XYM)")
plt.savefig('testing_historical_updated.png', dpi=800)
plt.show()
And if everything worked correctly we will see a perfect diagonal in our scatter plot meaning that the historical account balance changes have been successfully calculated from the genesis block 🤞 Drumroll… 🥁😬 Hmm… 🤦♂️ Nearly… 😭
So, I am out of time and I didn’t quite make it 😕. What I will say is that I almost succeeded in my task – a Pearson correlation coefficient of 0.999999999999959 ain’t bad 😂 There are a few outliers, mostly at the very low end of the scale (< 0.0001 XYM) which will be down to rounding errors – what’s a few tens of thousandths of XYM between friends?
But there are also a handful of accounts where there is a larger difference. Still not huge but there must be something that I still haven’t accounted for in my logic and maybe some of these differences are due to transactions occurring between grabbing the current account balances and calculating the balances from block 1. It is bugging me though, I was aiming for perfection here 😂
The histogram below shows the deviation between the calculated current balances and those calculated directly from transactions in MongoDB. The y-axis is on a log10 scale so you can see that the overwhelming majority of account balances > 100,000 are correct and that the spread of error is small. In fact, 137858/137915 (99.96%) are within 0.01 XYM of the actual amount. There are a total of 19 accounts that are out by more than 10 XYM and this upsets me 😁 Note: this only shows accounts that have held XYM mosaics at some point in time. Others accounts that have only ever held non-XYM mosaics are excluded.
Step 7: travelling back in time
Right, so now, after all that(!) we end up with two dictionaries: historic
which stores all transactions of every account since the genesis block and historic_fees
which stores all harvesting, fee and expired hash lock funds which have been disbursed to beneficiaries. Therefore, getting an account balance is simple just provide an address and a block height and it will return the balance of that account at that height.
Simple queries
Here’s some sample code to obtain the balance and beneficiary income for the Symbol Blog address at block 1,000,000.
account = "NCAY26LEBPOXM7NPCNV4HL4EH5WM6UJ5UUN4UGA"
block = 1000000
balance = 0
balance_fees = 0
balance_hash = 0
for txs in historic[account]:
if txs <= block:
balance += historic[account][txs]
for txs in historic_fees[account]:
if txs <= block:
balance_fees += historic_fees[account][txs]
print("Cumulative harvesting income at block {0}: {1}".format(block, format(balance_fees, '.6f')))
print("Total balance at block {0} is {1} XYM".format(block, format(balance+balance_fees, '.6f')))
Running this shows that I had a balance of 10,475.71 XYM
and had not yet harvested a block:
Cumulative harvesting income at block 1000000: 0.000000
Total balance at block 1000000 is 10475.707727 XYM
Running it at the current block height shows (correctly!) that I have a balance of 37,646.38
XYM
and have earned more than 1,000 XYM through harvesting:
Cumulative harvesting income at block 1852396: 1057.486291
Total balance at block 1852396 is 37646.382491 XYM
Historical plots
Now that we have historical account balances we can start to do some visualisations. I covered a lot of plots in my previous article and I am really short on time to make the deadline for the article. Because of this (and the fact that this article is already so long!) I am not going to spend a lot of time on plots but I will show you briefly in the Jupyter Notebook, how to plot balance over time. The Binance XYM wallet is as good an account as any so let’s take NAJNXZGEEHD56G2KVXEKOECTMIGKMIZN6YXDSHI
as our example. You can spruce it up, convert the block height to dates and plot other accounts if you like 😊
Play along at home
I know that the code is a mess but I am going to blame it on my tight schedule to get this published rather than my laziness and lack of Python knowledge 😁 If you want to try this at home then you can download my notebook:
Jupyter Notebook – “The Big Bang query“
The wrap up
So I hope that by now I managed to get everything working and the article has been published, meaning someone is actually reading this! 😁 If not then I will return to it post-Christmas and put it out on the blog. Edit: I am publishing it, even though it is not quite perfect! And here’s a challenge:
If you can spot the issue and fix my code, make that diagonal fit and get all accounts within 0.01 XYM of the actual value then I will send you a 500 XYM reward – just ping me on Twitter @blog_symbol or @ninelivessymbol 😁
One rule – you are not allowed to add additional queries to MongoDB, it has to be calculated from the blocks
and transactions
collections. This might be a red herring but I think that the discrepancies are due to the hash lock section as the majority of failures are out by +/- 10 XYM but I went completely code-blind and didn’t manage to fix it in time. Of course, it could be due to you guys making some exotic transactions that I was not aware were possible – this is also quite likely 😁
It should be relatively easy to add new functionality, for example, to support other mosaics or even historical metadata and namespaces without too much of an issue. Really everything needs to be stored nicely in a database which will regularly and automatically update from the previous latest block. Even better, to have some kind of web interface built around it to allow you to query the data without having to set up your own node. Maybe this is something that will come in the future 🤔
Thanks again for reading to the end and thanks to Anthony for the discussions on hash locks 😊
Have a wonderful Christmas and New Year! 🎄🍻
I’m a Symbol and NEM enthusiast and run this blog to try to grow awareness of the platform in the English-speaking world. If you have any Symbol news you would like me to report on or you have an article that you would like to publish then please let me know!
Sorry, the comment form is closed at this time.