This pubkey-index compression scheme I cooked up has turned out quite complicated. I noticed that I had written a whole fresh canonical event codec on top of a new JSON parsing API I built a little while back, as part of my beginning work on implementing an #NWC SDK for #golang.

That codec expects the pubkey field of the event to be in hex or it won't parse, so I can't substitute base 10 in there or it will mangle my index value. I have to do it with hex, and the only way that is possible with an unsigned 64-bit integer in the Go stdlib is strconv.ParseUint/FormatUint; the conversions done by the fmt package, for instance, don't like that extra bit of difference between int64 and uint64 (tested with math.MaxUint64).

Haha, I was already going to use these functions, just with base 10, but then I realised they can take base 16, and then the hex decoder will produce a compactly encoded binary value from the result. The maximum length of the index field in the canonical encoding is then 16 characters, still a lot better than 64, and the first 4 billion npubs in the database only need at most 8 characters. Because of how these functions work, the output can be an odd number of hex characters, which breaks the hex codec when it turns them into the raw bytes expected in the pubkey field, so I also have to put a zero in front of odd-length strings. This is OK though, whatever has to be done, I guess. I don't want to redo the codec, and this repurposes its encoding to work with shorter hex strings, which are valid hex but in this compact indexed encoding scheme represent a monotonic serial number instead of raw bytes.

Argh... all because of expecting hex in the pubkey field. But then, hex is better than decimal anyway.
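Roughly, the encode side looks like this (a minimal sketch; the helper name encodeIndex is mine, not the actual codec's API):

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strconv"
)

// encodeIndex renders a serial index as an even-length hex string so the
// canonical codec's hex decoder can turn it back into raw bytes.
func encodeIndex(i uint64) string {
	s := strconv.FormatUint(i, 16)
	if len(s)%2 == 1 {
		// FormatUint can emit an odd number of hex digits; left-pad with a
		// zero so hex.DecodeString accepts it.
		s = "0" + s
	}
	return s
}

func main() {
	fmt.Println(encodeIndex(255))                  // "ff"   (2 chars instead of 64)
	fmt.Println(encodeIndex(256))                  // "0100" (padded from "100")
	fmt.Println(encodeIndex(uint64(4294967295)))   // "ffffffff" (first ~4 billion npubs fit in 8 chars)

	raw, _ := hex.DecodeString(encodeIndex(256))
	fmt.Printf("%v\n", raw) // [1 0], the compact binary form the codec produces
}
```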

Extra fun stuff in this too: the pubkey field is unpacked as binary, whereas the tag values are strings, so they have to be handled differently. One will be up to 8 bytes that must be left-padded with zero bytes and then converted to a uint64, while the tag value fields still have to be decoded as hex ASCII. They only have to be handled differently in the decode, not in the encode, because the event struct I defined keeps the pubkeys in memory as raw binary, which makes scanning them for a match twice as fast (half as many bytes to iterate).

I did at one point try to make the pubkeys in p tags get flattened to binary in the encoding, but I gave up on that whole binary database encoding; it was easier to just use the relatively compact canonical format and add compression to the database options (the db is typically around half the size of the raw event JSON in wire encoding format). Yeah, that problem with the binary encoding and its performance versus data size... lol. My JSON encoder was faster than generated protobuf, faster than easyjson, and in total the database is probably still faster even with the compression done over top. I think it compresses new data only after it's written to the log, when it compacts the key and value logs, not when it writes, so the performance hit is not so much on the write side (unless you pile a shit-ton of data into it at once). The decompression on the read side for compacted log blocks does impact reading out the value, but the keys are not compressed, afaik (that would not make sense), so it still finds the records just as fast; it just has to perform the zstd decompression on the data.

This code will add an extra step and more memory handling, but it will achieve a far greater compression rate for follow and mute list events, compressing pubkeys to at most 1/4 of their size, which is impossible for such high-entropy data with ordinary compression. The data means nothing without the provided substitutions, though, which are stored in the keys that hold the pubkey/serial index. This scheme depends on pubkeys appearing more than once across people's follow lists, which they will, for the most part; even just two follows on one npub is a 25% data size saving (2 × 64 hex characters become 64 for the stored pubkey plus at most 2 × 16 for the indexes, i.e. 96 instead of 128).
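The decode side ends up looking roughly like this (a sketch; the helper names indexFromPubkeyBytes/indexFromTagHex are mine, and big-endian byte order for the raw form is an assumption):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strconv"
)

// indexFromPubkeyBytes recovers the serial index from the pubkey field, which
// the codec has already hex-decoded into raw bytes (at most 8 of them under
// this scheme). The bytes are left-padded with zero bytes to a full 8 and
// read big-endian.
func indexFromPubkeyBytes(b []byte) (uint64, error) {
	if len(b) > 8 {
		return 0, fmt.Errorf("not a compact index: %d bytes", len(b))
	}
	var buf [8]byte
	copy(buf[8-len(b):], b)
	return binary.BigEndian.Uint64(buf[:]), nil
}

// indexFromTagHex recovers the serial index from a tag value, which is still
// a hex-ASCII string rather than raw bytes.
func indexFromTagHex(s string) (uint64, error) {
	return strconv.ParseUint(s, 16, 64)
}

func main() {
	fromPubkey, _ := indexFromPubkeyBytes([]byte{0x01, 0x00}) // binary pubkey field
	fromTag, _ := indexFromTagHex("0100")                     // hex-ascii tag value
	fmt.Println(fromPubkey, fromTag)                          // 256 256
}
```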
