i'm finally done with my little project to write a better binary encoder for events
it has one distinctive functional feature: tags that contain hex data are recognised and encoded as compact binary. that's a fairly hefty saving in storage size for any event with e (event), p (pubkey) or a (naddr) tags, just over half as much data as the hex string form, with very little cost in processing, something like the sketch below
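a minimal sketch of the idea (the helper names here are mine, not the actual code): a 1-byte marker tells the decoder whether raw bytes or a plain string follows

```go
package encoder

import (
	"encoding/binary"
	"encoding/hex"
)

// isHex reports whether s is a non-empty, even-length string of lowercase
// hex digits, e.g. a 64-char event id or pubkey in an e/p tag.
func isHex(s string) bool {
	if len(s) == 0 || len(s)%2 != 0 {
		return false
	}
	for i := 0; i < len(s); i++ {
		c := s[i]
		if !('0' <= c && c <= '9' || 'a' <= c && c <= 'f') {
			return false
		}
	}
	return true
}

// appendTagValue writes a tag value as marker + varint length + payload,
// halving the payload size whenever the value is hex.
func appendTagValue(buf []byte, v string) []byte {
	if isHex(v) {
		raw, _ := hex.DecodeString(v) // cannot fail once isHex passed
		buf = append(buf, 1) // marker: binary payload
		buf = binary.AppendUvarint(buf, uint64(len(raw)))
		return append(buf, raw...)
	}
	buf = append(buf, 0) // marker: plain string payload
	buf = binary.AppendUvarint(buf, uint64(len(v)))
	return append(buf, v...)
}
```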
it's late evening, so now that i have it working, testing and benchmarking will be a task for tomorrow
thanks to nostr:npub180cvv07tjdrrgpa0j7j7tmnyl2yr6yr7l8j4s3evf6u64th6gkwsyjh6w6 for alerting me to the fact that my use of the Gob encoder incurred a big cost in decoding. this decoder will be fast, maybe not *quite* as fast as the simple one he wrote, but it likely saves more than 30 bytes on every event, often 60, without any compression
it also does not copy memory for string fields, simply "unsafe" snipping them out of the buffer, meaning that once the event has been re-coded to JSON the garbage collector will be able to free the whole buffer. as it stands, memory utilisation is typically around 50MB, a massive improvement over what it was before i found the resource leak bugs
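the zero-copy trick in miniature, assuming Go 1.20+ for unsafe.String; the returned string points into the read buffer rather than a fresh allocation, so it's only valid while the buffer is alive and unmodified

```go
package encoder

import "unsafe"

// unsafeString views b as a string without copying. once nothing
// references strings produced this way (e.g. after the event has been
// re-coded to JSON), the whole buffer becomes collectable at once.
func unsafeString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(&b[0], len(b))
}
```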
ah yes, it also uses varints everywhere for field lengths, for tags and for content fields. this means that if a field is under 128 characters, only 1 byte is used; varint encoding basically makes the 8th bit of each byte a continuation indicator, so a 2 byte prefix encodes lengths up to 16kb, and given the 500kb size limitation the prefix will never be more than 3 bytes long (3 bytes would permit up to 2Mb)
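a quick check of those prefix sizes with the standard library's LEB128-style varint, which is what the scheme described amounts to

```go
package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	buf := make([]byte, binary.MaxVarintLen64)
	for _, n := range []uint64{127, 16_383, 500_000} {
		fmt.Printf("length %d -> %d byte prefix\n", n, binary.PutUvarint(buf, n))
	}
	// length 127 -> 1 byte prefix
	// length 16383 -> 2 byte prefix
	// length 500000 -> 3 byte prefix
}
```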
i have filed an issue about the question of how many tags should be permitted in events, and i honestly don't see how it would make any sense for an event to have more than 256 tags, or for a tag to have more than 256 fields, so the binary encoder uses just one byte to signify the number of tags and the number of fields in each tag, and i have suggested this be specified in the protocol to keep a sane cap on this field...
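what those single-byte count prefixes look like in practice, reusing the hypothetical appendTagValue from the sketch above

```go
// appendTags writes the tag list with single-byte counts, which caps both
// the number of tags and the fields per tag at 255.
func appendTags(buf []byte, tags [][]string) []byte {
	buf = append(buf, byte(len(tags))) // number of tags
	for _, tag := range tags {
		buf = append(buf, byte(len(tag))) // fields in this tag
		for _, field := range tag {
			buf = appendTagValue(buf, field)
		}
	}
	return buf
}
```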
i guess if it ever proves to be a problem later it won't be hard to change the tag and field count prefixes to varints, but i am highly skeptical that limit will ever be breached
tomorrow this gets deployed and benchmarked. i'm very keen to do the comparison; i will use an external repo so i can easily pull in fiatjaf's code and compare them side by side
what i wrote is stringently modular: each field has its own reader and writer function, and there is a read buffer and a write buffer. in almost all cases the encoder copies directly from the source into the destination with no intermediate memory, and when decoding, strings are not copied at all, except for those shorter binary encoded tags. i doubt it can be made much faster except by tweaking things with unsafe copy-free techniques beyond what i have already written, bespoke hexadecimal encoding that drops safety checks thanks to the fixed format, and suchlike. a rough sketch of the reader shape follows
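here's that rough sketch: a reader type with one method per field kind, shown with just the zero-copy string reader, built on the unsafeString helper from earlier (again, names are mine, not the real code)

```go
package encoder

import (
	"encoding/binary"
	"errors"
)

// Reader walks a binary-encoded event, handing out views into buf.
type Reader struct {
	buf []byte
	pos int
}

// readString reads a varint length prefix, then returns a string viewing
// the payload in place, advancing the read position past it.
func (r *Reader) readString() (string, error) {
	n, w := binary.Uvarint(r.buf[r.pos:])
	if w <= 0 {
		return "", errors.New("bad varint")
	}
	end := r.pos + w + int(n)
	if end > len(r.buf) {
		return "", errors.New("short buffer")
	}
	s := unsafeString(r.buf[r.pos+w : end])
	r.pos = end
	return s, nil
}
```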
i've not actually done much in the way of benchmarking my binary encoding work in the past, so it should be interesting
anyhow, that's me, GN y'all, the wild mleku now is disappeared
#deardiary #devstr #gn