
brave AI says: "For example, a 32-bit integer 0x12345678 would be stored as 12 34 56 78 in a big-endian system and as 78 56 34 12 in a little-endian system."

in the same way, abc should come before def in a lexicographic sort. this means the left-most byte of a number's binary form has to be the most significant one if the number is going to sort in ascending order when its bytes are compared the same way the letters of words are. this is big endian. big endian is the same order we write arabic numerals in, with the most significant digit on the left. 001 is before 002, and so on.

i'm a little surprised to be learning this only now, to be honest. it's quite salient information about how to use numbers in a key-value store. LSM key-value stores have "seek" functions that treat the entire key as a... big endian number. you give it a prefix, which means the LEFT (most significant) bytes decide the sort order, which means that to sort in ascending order you must use big endian. otherwise it would be back to front, and "ascending" order would not follow the numeric values at all.

not only that, if you look at the example above, little endian is not just the whole number written backwards: the bytes are reversed, but the digits inside each byte (each 8 bits) are still most-significant-first, so the bytes have to be unpacked in reverse order to turn the number into text. at least, if you did it that way. more often this operation is done by dividing the number by 10 over and over: each remainder (modulus) is the next digit, least significant first, so the digits get placed in reverse order, and the quotient is divided again until it becomes zero. in practice this is usually done in a bigger base with a lookup table (base 100 is the common one). i did one with a 10000-entry lookup table and it was faster, almost twice as fast as the standard system integer-to-text.
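to make the sort-order point concrete, here is a minimal python sketch (not code from my database). it packs a few integers as fixed-width 4-byte keys in both byte orders and sorts the raw bytes the way a key-value store compares keys. the helper names `be_key` and `le_key` and the sample values are made up for illustration.

```python
import struct

# sample values chosen so that byte order visibly matters
values = [1, 2, 255, 256, 65536, 0x12345678]

def be_key(n: int) -> bytes:
    """4-byte big-endian key: most significant byte first."""
    return struct.pack(">I", n)

def le_key(n: int) -> bytes:
    """4-byte little-endian key: least significant byte first."""
    return struct.pack("<I", n)

# python compares bytes objects lexicographically, byte by byte from the left,
# which is the same comparison a sorted key-value store applies to its keys
be_sorted = sorted(values, key=be_key)
le_sorted = sorted(values, key=le_key)

print(be_key(0x12345678).hex(" "))  # 12 34 56 78
print(le_key(0x12345678).hex(" "))  # 78 56 34 12
print(be_sorted == sorted(values))  # True: byte order matches numeric order
print(le_sorted == sorted(values))  # False: 65536 and 256 sort before 1
```

only the big-endian keys come out in numeric order. with the little-endian keys the least significant byte is compared first, so 256 (00 01 00 00) lands before 1 (01 00 00 00).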
also, just to note, a byte-wise comparison can short-circuit when deciding which of two numbers is less, going forwards from byte zero, because the first byte that differs tells you which number is bigger or smaller. eg 001 is less than 010, and you can determine that by reading the digits from left to right. i'm sure this is a bug i made for myself in my previous version of the database.

endianness is the rule about what order the bytes of a "word" go in when you store it or send it in binary over a network or serial device, or for that matter a bus on a motherboard. a word is usually your processor register size, which these days is 64 bits. so little endian is unsuited not only to lexicographical sorting, it's also unsuited to stream decoding. stream decoding has been the main strategy i've been using with my codecs in recent weeks. my variable integer encoding is back to front for this, so the values can't be lexicographically sorted. they can be read forward but they don't sort.

this all came up because i am working on the "created_at" index now and i had set it to use the varints, but that's not suitable for the task. it is fine for the serials to be compactly encoded like this, but almost all the fields have to be fixed length for sorting. the only exception is the word indexes, because the length prefix is integral to the query. but i'll get to that later i guess. lol. i've probably messed that one up too. oh well.

the varints are fine for the indexes, even in the full-index which has the prefix and then the varint, because that's what you are searching for and that's its prefix. but it's probably not optimal for its sort order. so i probably need to revise the full-index (it has id, pubkey, kind and created_at in one key) so it sorts properly and can be seeked with a bisection scan. i could keep the serials that come at the end as varints, but in this case i think they should also be stored as full 64-bit numbers.
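a rough python sketch of why the varints break the created_at index and what the fixed-width version looks like. everything here is an assumption for illustration: the varint is a generic LEB128-style one (7 bits per byte, low-order group first), which may not match my actual encoding, and the `index_key` layout and the `b"ca"` prefix are hypothetical.

```python
import bisect
import struct

def uvarint(n: int) -> bytes:
    """LEB128-style varint (assumed format): 7 bits per byte, low group first."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # continuation bit set, more bytes follow
        n >>= 7
    out.append(n)
    return bytes(out)

def index_key(prefix: bytes, created_at: int, serial: int) -> bytes:
    """hypothetical key layout: prefix, then two fixed 8-byte big-endian fields."""
    return prefix + struct.pack(">Q", created_at) + struct.pack(">Q", serial)

timestamps = [127, 128, 300, 1_700_000_000]

# sorting by the encoded bytes, as the store would
varint_order = sorted(timestamps, key=uvarint)
fixed_order = sorted(timestamps, key=lambda t: index_key(b"ca", t, 0))

print(varint_order == sorted(timestamps))  # False: varint bytes don't sort numerically
print(fixed_order == sorted(timestamps))   # True: fixed-width big endian does

# because the fixed-width keys are in numeric order, a "seek" is just a bisection
keys = sorted(index_key(b"ca", t, i) for i, t in enumerate(timestamps))
start = bisect.bisect_left(keys, index_key(b"ca", 200, 0))
print(start)  # index of the first key with created_at >= 200
```

the trade-off is size: every created_at and serial costs 8 bytes instead of a few, but a range query becomes one seek plus a forward scan.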
