#realy #devstr #progressreport
so, now that i am up to the stage of actually writing a search function, i realise that the fulltext indexes need a lot more data in them
the fulltext index will have all this:
[ 15 ][ word ][ 32 bytes eventid.T ][ 32 bytes pubkey ][ 8 bytes timestamp.T ][ 2 bytes kind ][ 4 bytes sequence number of word in text ][ 8 bytes Serial ]
with this as the result from the word query, i can filter by timestamp, pubkey and kind immediately upon getting the keys, and with the sequence number of the word in the text i can quickly determine sequence similarity: check whether the words appear in query order, then compare by how many of them fall in the same sequence
with all of these extra fields in the keys, almost all of the filtering required for a fulltext search can be done directly on the index, and the amount of data in the index is not so great. the main thing is that the word matches can be found immediately, filtered by their timestamp and kind, grouped by their common presence in an event, and then sorted by their sequence order within the event.
it is a lot of data, but all of these fields are needed for evaluation, and without them being in one index i would need a lot more queries and decoding to find them. it trades some extra size in the database for speed at finding, filtering and sorting matches
gonna need to make a specific data structure for it... well, probably a set of functions that pick out each field and decode it. that would be more efficient, since i can then jump straight to the field that needs to be filtered according to the other, non-fulltext part of the search, eg the kinds, the pubkey, the timestamp range... not exactly sure what order the filtering should run in... probably pubkey, kind, timestamp, in that order, depending on which of those are present in the filter
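to make that concrete, here is a rough sketch of what those field accessors could look like, assuming the layout above with big-endian integers and the word as the only variable length part (so every fixed-width field can be sliced from the end of the key). the names are illustrative, not realy's actual code:

```go
package fulltext

import "encoding/binary"

// fixed-width bytes that follow the variable-length word in every key:
// 32 id + 32 pubkey + 8 timestamp + 2 kind + 4 word sequence + 8 serial
const trailer = 32 + 32 + 8 + 2 + 4 + 8

// Word returns the word portion of the key (after the 1 byte prefix).
func Word(key []byte) []byte { return key[1 : len(key)-trailer] }

func EventID(key []byte) []byte { return key[len(key)-trailer : len(key)-trailer+32] }
func Pubkey(key []byte) []byte  { return key[len(key)-trailer+32 : len(key)-trailer+64] }

// the tail fields sit at fixed offsets from the end, so none of the earlier
// fields need to be decoded to read them
func Timestamp(key []byte) int64 { return int64(binary.BigEndian.Uint64(key[len(key)-22 : len(key)-14])) }
func Kind(key []byte) uint16     { return binary.BigEndian.Uint16(key[len(key)-14 : len(key)-12]) }
func Seq(key []byte) uint32      { return binary.BigEndian.Uint32(key[len(key)-12 : len(key)-8]) }
func Serial(key []byte) uint64   { return binary.BigEndian.Uint64(key[len(key)-8:]) }
```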
this may take longer than i hoped but i honestly didn't really have any idea what would be involved... at day 3 working on this now, and i only have the index writing done, but with this scheme designed i should progress quickly to a working search function
#realy #devstr #progressreport
so, i had really written the full scanning thing wrong, didn't need to be keeping track of the events that were already indexed...
events now get indexed when being saved, and because this disrupts the import function, the import now buffers the upload in a temp file and then lets the http client finish and disconnect; it was stalling due to the extra delay from the time it takes to build the indexes
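roughly what that buffering looks like, as a sketch (the handler and the saveAndIndex function are hypothetical stand-ins, not the actual realy import code):

```go
package main

import (
	"io"
	"net/http"
	"os"
)

// importHandler spools the upload to a temp file so the client can finish and
// disconnect, then does the slow save-and-index work in the background.
func importHandler(w http.ResponseWriter, r *http.Request) {
	tmp, err := os.CreateTemp("", "realy-import-*.jsonl")
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	if _, err = io.Copy(tmp, r.Body); err != nil {
		tmp.Close()
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	tmp.Close()
	w.WriteHeader(http.StatusAccepted) // the client is done here
	go func() {
		defer os.Remove(tmp.Name())
		saveAndIndex(tmp.Name())
	}()
}

// saveAndIndex stands in for decoding the buffered events, saving them and
// writing their fulltext indexes.
func saveAndIndex(path string) {}
```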
further, the rescan admin function now regenerates these indexes as well. so, just gotta eat me some lunch, then i need to revise the delete/replace code to remove the indexes when events are removed, and finally get to the last step of the task: implementing the search
in other news, my double vision/convergence problem at close distance is definitely reducing slowly, i can now actually read small text but it's just a bit doubled if i am looking at text less than about 50cm from my eyes. the glasses help me read easier but i feel like it's just a matter of continuing the potassium/calcium/magnesium and probably continuing to dial back the alcohol consumption to basically none. i think my kidneys are still weak and the alcohol is slowing the recovery. and incidentally i'm getting a lot more work done.
#realy #devstr #progressreport
so, i finally have some kind of indexing happening
it's quite incredible how big these full text indexes are; just going by the size of the database directory, it looks like the index takes more space than the data
i probably need to put some kind of further limit on which words get an inverted index. some of those records must list virtually the whole event store, because nearly every note has the word "the" in it, but how do i do this correctly without ending up maintaining giant lists of prepositions and articles in every language...
well, i'll see i guess
i mean, it's kinda worth it if i can implement fast full text searching of notes, even if the storage space is greatly increased... it just is a lot of space
anyhow
in the process of doing this i ran into a nasty thing in badger where it just kills the process completely, without any feedback. i had to use a go workspace, clone the badger repo into it, and then i was able to insert a panic message and actually get a traceback; eventually i figured out i was using iterators for something that just needed a simple Get call to see if the literal exact key existed.
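for reference, the exact-key existence check that replaces the iterator is something like this (sketched from memory against the badger API; the Get/ErrKeyNotFound pattern is the same across recent major versions):

```go
package store

import (
	"errors"

	badger "github.com/dgraph-io/badger/v4"
)

// keyExists reports whether the literal exact key is present, without opening
// an iterator at all.
func keyExists(db *badger.DB, key []byte) (found bool, err error) {
	err = db.View(func(txn *badger.Txn) error {
		_, e := txn.Get(key)
		if errors.Is(e, badger.ErrKeyNotFound) {
			return nil // absent, but not an error for our purposes
		}
		if e != nil {
			return e
		}
		found = true
		return nil
	})
	return
}
```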
i've now drafted the full text indexer. it uses unicode character classes to split up the words, eliminates the duplicates, and then feeds them to a background thread that updates the index for each word, adding the event's database sequence number to the list of events that contain that word
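in outline, that pipeline looks something like this sketch (word splitting on unicode classes, dedup with a map, a goroutine doing the index writes; the names and the addWordEntry callback are made up):

```go
package fulltext

import (
	"strings"
	"unicode"
)

// tokenize splits content on anything that is not a letter or a number,
// lowercases the words, and removes duplicates.
func tokenize(content string) (words []string) {
	seen := make(map[string]struct{})
	for _, w := range strings.FieldsFunc(content, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsNumber(r)
	}) {
		w = strings.ToLower(w)
		if _, ok := seen[w]; !ok {
			seen[w] = struct{}{}
			words = append(words, w)
		}
	}
	return
}

// indexJob carries what the background indexer needs for one event.
type indexJob struct {
	serial uint64   // database sequence number of the event
	words  []string // deduplicated words from the content
}

// runIndexer consumes jobs and appends the event's serial to each word's list.
func runIndexer(jobs <-chan indexJob, addWordEntry func(word string, serial uint64) error) {
	for job := range jobs {
		for _, w := range job.words {
			_ = addWordEntry(w, job.serial)
		}
	}
}
```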
it's too late in the night for me to add the search function tho, but i will continue this work tomorrow. it's much easier for me to do this than the bunker signer at this point, because i'm new to working with the GUI toolkit i'm using for that, and i had to refresh my memory about how to encrypt keys and all the bits associated with that.
not gonna even test the indexer tonight, i'm poopered, but i will test it tomorrow afternoon after my fiat mine work, to make sure it isn't bombing out with a panic or anything, and then to work on the actual search function
the search function is a bit complicated... first you take the search term, break it down into words, find all the index records for each word, and get their lists of database sequence numbers. from those you assemble a progressive list: first the events that appear in all of the word lists, then the ones matching one word fewer, and one fewer again, until there are no search terms left
then you go through those lists, scan the content field for the locations of the text matches, and find the ones with the longest runs of matches in the same order as the search text. you sort by that, returning the events in descending order of exact matching, and after them all the events that contain several of the terms but not in order
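roughly, the grouping and ranking could look like this sketch (assuming each word lookup yields the serials of the events containing it, and that per-event word positions are available; the run scoring here is a naive version just to show the idea):

```go
package search

// groupByMatchCount buckets event serials by how many of the query words they
// contain, so results can be emitted from "all words" down to "one word".
func groupByMatchCount(perWord map[string][]uint64) map[int][]uint64 {
	counts := make(map[uint64]int)
	for _, serials := range perWord {
		for _, s := range serials {
			counts[s]++
		}
	}
	groups := make(map[int][]uint64)
	for s, n := range counts {
		groups[n] = append(groups[n], s)
	}
	return groups
}

// longestOrderedRun scores one event: positions[i] holds the positions in the
// content where query word i matched; the score is the longest chain of
// strictly increasing positions taken in query order.
func longestOrderedRun(positions [][]int) int {
	best := 0
	var walk func(i, prev, run int)
	walk = func(i, prev, run int) {
		if run > best {
			best = run
		}
		if i == len(positions) {
			return
		}
		for _, p := range positions[i] {
			if p > prev {
				walk(i+1, p, run+1)
			}
		}
		walk(i+1, prev, run) // allow skipping a word that breaks the order
	}
	walk(0, -1, 0)
	return best
}
```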
and then the task will be complete, and #realy will have a full text search capability with sort by relevance (relevance meaning how closely the result event matches the search terms).
making the index is the easy part. finding the matches and ranking them will be quite a long function i expect. the indexer is only 117 lines of code.
i could be wrong, but i think that if #realy has a full text search capability it will be the first nostr relay that has this built into it, instead of being a ghey DVM. the nip has existed for a long time, https://github.com/nostr-protocol/nips/blob/master/50.md
i never implemented it before, though it was always on my mind: getting realy into a fully working nip-01 compliant state was quite a lot of effort because of the shitty code i based it on, which i basically completely rewrote, almost 90% of it i'd estimate.
this will also be good for my efforts to sell my boss on the idea of shifting our data storage systems for chat and forums over to a nostr base, and because i made an HTTP API already, i only have to explain the event format to them, not the whole retarded nip-01 "API" that dare not speak that name (because it's a shit API, in so many ways)
it could well be that in the end, out of all the nostr devs, i'm the one who actually builds nostr into a for-real commercial protocol. building a replacement for slack and gsuite has been on my boss's mind since the beginning, and i think most of what he is doing is trying to get funding for something bigger. if we have already integrated chat and forum tech with nostr events, it's only a small step to include documentation, and i have now also had experience working with international time and schedules, so we can have calendar, messaging, chat, documents... i mean, what more do you need out of a remote collab or just plain business data, comms and scheduling system than those?
#realy #bunk #devstr #progressreport
so, fyne is a bit buggy in places: you sometimes have to force it to refresh when you do things in a sequence it doesn't expect. for example, dragging the window to another screen sometimes resizes the window so the widgets overflow, and you have to resize it to get them to fit the window again
i think i've got an inkling about how to avoid these glitches by adding some more calls to the function that repaints the window, so i will make it behave like a good little GUI app given time.
i've now learned how to make it render text to the set theme (which defaults to whatever your desktop manager says it is, which is nice) and so far this is what it does:
- open a start screen with an nsec input and one relay input, where you paste in your nsec and type in the relay you want to use
- then you click the "next" button and it flips to the main display, which shows the text of the bunker URL nicely wrapped to stay a decent size, plus a button that puts the bunker URL into the clipboard when you click it
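for a rough idea of what that amounts to in fyne (v2-style API, sketched from memory and heavily simplified; the nsec handling, relay connection and bunker URL derivation are stubbed out):

```go
package main

import (
	"fyne.io/fyne/v2"
	"fyne.io/fyne/v2/app"
	"fyne.io/fyne/v2/container"
	"fyne.io/fyne/v2/widget"
)

func main() {
	a := app.New()
	w := a.NewWindow("bunk")

	nsec := widget.NewPasswordEntry()
	nsec.SetPlaceHolder("nsec...")
	relay := widget.NewEntry()
	relay.SetPlaceHolder("wss://relay.example.com")

	next := widget.NewButton("next", func() {
		// hypothetical: derive the bunker URL from the nsec and relay
		bunkerURL := makeBunkerURL(nsec.Text, relay.Text)
		label := widget.NewLabel(bunkerURL)
		label.Wrapping = fyne.TextWrapWord // keep the long URL a decent size
		copyBtn := widget.NewButton("copy", func() {
			w.Clipboard().SetContent(bunkerURL)
		})
		w.SetContent(container.NewVBox(label, copyBtn))
	})

	w.SetContent(container.NewVBox(nsec, relay, next))
	w.ShowAndRun()
}

// makeBunkerURL is a stand-in for the real derivation.
func makeBunkerURL(nsec, relay string) string { return "bunker://...?relay=" + relay }
```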
really basic steps so far, but as i get more familiar with how to work with it and the quirks i need to handle, i think in a few days i will have a working signer bunker app, at least for linux. i think i can actually build apps to push to android and ios too, and probably windows and mac versions as well, which will cover all bases.
idk about properly covering ios and mac platforms but i can test it on windows at least as well
should be usable soon, once i learn all the tricks of how to make it behave itself. fortunately the demo app has a lot of examples, which allowed me to get this far and i'm sure will help me break through to the MVP.
and yes, i'm calling it "bunk" because i have an ironic sense of humor. why else did i call my relay "realy", a typo that i often make... that would be because i know i'm a noob at a lot of things, and when i am, i make fun of myself. proof of non-psycho.
#realy #devstr #progressreport
yesterday i spent a lot of time writing a (novel, i think) variable length integer encoder - it's the reverse of regular varint encodings in that it sets the 8th bit only on the terminal byte, instead of on every byte to indicate "there's more". it's part of a binary encoding for use in the database, written against the #golang io.Writer/io.Reader interfaces so it can be used to stream the binary data over the wire or to disk
i figure the way this varint encoder works it's just a little faster, because it only sets the 8th bit on the last byte, so that's potentially as much as 1/7th of that part of the operation saved - not huge, but ok. in any case a new encoder was required for the streaming read/decode anyway.
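the idea in code, as a minimal sketch (not the actual realy implementation): 7 bits per byte, lowest group first, with the 8th bit set only on the final byte to mark the end of the value.

```go
package varint

import "io"

// Write encodes v as 7-bit groups, least significant first; the 8th bit is set
// only on the last byte, marking the terminal of the value.
func Write(w io.Writer, v uint64) error {
	var buf [10]byte
	n := 0
	for {
		b := byte(v & 0x7f)
		v >>= 7
		if v == 0 {
			buf[n] = b | 0x80 // terminal marker
			n++
			break
		}
		buf[n] = b
		n++
	}
	_, err := w.Write(buf[:n])
	return err
}

// Read decodes a value written by Write, consuming bytes until it sees the one
// with the terminal bit set.
func Read(r io.Reader) (v uint64, err error) {
	var one [1]byte
	var shift uint
	for {
		if _, err = io.ReadFull(r, one[:]); err != nil {
			return 0, err
		}
		b := one[0]
		v |= uint64(b&0x7f) << shift
		if b&0x80 != 0 {
			return v, nil
		}
		shift += 7
	}
}
```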
it's not on by default; you have to set BINARY in the environment to make it do this, and there is no migration handling - if BINARY is false it tries to decode json, and if true, binary. so to change over, you need to export the events, nuke or just delete the database files ( ~/.local/share/realy ), flip it to true, and then import the events back (and if you deleted, you need to re-set your configuration - you really should keep a copy of the configuration json that gets stored in the database, for reasons like this).
it's definitely noticeably faster. i am quite sure the binary encoder will prove to be the fastest nostr database binary encoder, and it's also extremely simple; the results are about a 30% reduction in event storage size, and maybe 1 microsecond to perform the encoding.
#realy #devstr #progressreport
i have got a new trick now for when i make an app that occasionally does very memory/disk intensive operations like importing events, or my fiat mine job, which does N(N-1) operations comparing every item of a collection of JSON to every other (this blew up the memory on a heroku instance to just over 1gb, and their server kills the process and calls it crashed).
there is a manual trigger function in the runtime/debug library called debug.FreeOSMemory(), which forces a garbage collection and returns as much freed memory as possible to the operating system
so i just run that frequently during import and export, and voila, i can disable the swap on my VPS and realy doesn't blow up the memory.
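the trick itself is tiny; something like this, where the interval is arbitrary and the per-event work is a stand-in:

```go
package main

import "runtime/debug"

// importEvents runs the heavy per-event work and periodically asks the runtime
// to hand freed pages back to the OS so the resident size stays flat.
func importEvents(events [][]byte, process func([]byte)) {
	for i, ev := range events {
		process(ev)
		if i%10000 == 0 {
			debug.FreeOSMemory() // force GC + return memory to the OS
		}
	}
}
```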
in slightly unrelated matters, i learned about zigzag matrix traversal, and just now found this:
https://algocademy.com/blog/matrix-traversal-mastering-spiral-diagonal-and-zigzag-patterns/
spiral, diagonal and zigzag are three common traversal patterns that are used to optimize iteration for specific types of operation
in my fiat mine comparison matrix, the optimization is about reducing the number of decode operations on the JSON of the entries at each point in the matrix. the traversal that helps here is one where the path lets you cache some number of pre-decoded values from the elements of the array (which is being compared exhaustively)
spiral wouldn't do the trick, because it only saves you re-decoding one element per circuit
diagonal would be no different from left-right-top-bottom path (scanline pattern)
zigzag is like diagonal except that instead of starting again from the same side, you move to the next unvisited node and then take the diagonal in the reverse direction
with zigzag iteration i will be able to tune the number of decoded elements kept around for each comparison, to reduce the number of times the data has to be decoded again to be compared
i will likely make it a divide and conquer, multi-level zigzag too, depending on the number of entries that need this comparison: cut the matrix into a grid of segments and traverse the segments in a scanline pattern, because that way one of the segments stays hot the whole time the simple left-right-top-bottom iteration passes over it
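a minimal sketch of the zigzag (anti-diagonal) traversal over the comparison matrix, with a naive decoded-value cache; the decode and compare functions are stand-ins, and a real version would cap the cache and evict entries:

```go
package compare

// zigzag visits every (i, j) cell of an n×n matrix along anti-diagonals,
// alternating direction on each diagonal so one of the two indices changes
// slowly and its decoded value stays hot.
func zigzag(n int, visit func(i, j int)) {
	for d := 0; d < 2*n-1; d++ {
		lo := 0
		if d >= n {
			lo = d - n + 1
		}
		hi := d
		if hi > n-1 {
			hi = n - 1
		}
		if d%2 == 0 {
			for i := lo; i <= hi; i++ {
				visit(i, d-i)
			}
		} else {
			for i := hi; i >= lo; i-- {
				visit(i, d-i)
			}
		}
	}
}

// compareAll decodes lazily and memoizes, so each element is decoded far fewer
// times than the N(N-1) comparisons would otherwise require.
func compareAll(raw [][]byte, decode func([]byte) any, cmp func(a, b any)) {
	cache := make(map[int]any)
	get := func(i int) any {
		if v, ok := cache[i]; ok {
			return v
		}
		v := decode(raw[i])
		cache[i] = v
		return v
	}
	zigzag(len(raw), func(i, j int) {
		if i == j {
			return // an element is not compared with itself
		}
		cmp(get(i), get(j))
	})
}
```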
i am aiming to have a comparison algorithm that can efficiently and quickly produce the results of comparing all elements of an array with each other, to 100k elements and beyond, without needing tens of gigabytes of memory to do the job
it will also use a sync.Pool for this purpose, that keeps a limited size cache of the decoded elements, and when the cache reaches that size it will discard some percentage of the least recently used elements
probably i can run it concurrently too: so, zigzag sub-matrixes left to right, and if i can use 4 cores i could split a grid row into 4 sections, zigzag each section in parallel, then move down to the next row, and so on until the computation is complete
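the parallel part is basically just fanning the sections of a grid row out to goroutines and waiting before moving to the next row; a sketch with a WaitGroup, where the per-section work is a placeholder:

```go
package compare

import "sync"

// processGridRow runs one horizontal band of the comparison grid, zigzagging
// its column sections in parallel (capped at workers) before the caller moves
// on to the next band.
func processGridRow(sections []func(), workers int) {
	var wg sync.WaitGroup
	sem := make(chan struct{}, workers) // limit concurrency to the core count
	for _, section := range sections {
		wg.Add(1)
		sem <- struct{}{}
		go func(run func()) {
			defer wg.Done()
			defer func() { <-sem }()
			run() // zigzag this section of the band
		}(section)
	}
	wg.Wait()
}
```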