Nostr View

Well, okay, stop whining. Just built a fantastic #realy installer for normies and it's the relay behind the new version of our homepage. Spent three days working on it. 😂 Do you want me to PR it, to your repo, or keep it in gitrepublic-web?

David @mleku - 3mo

#realy #devstr #progressreport tonight i learned that only bigendian numbers sort lexicographically. *facepalm* see, i wrote the created_at indexes on realy (previous version) as little endian. they don't sort lexicographically.

David @mleku - 3mo

#realy #devstr #progressreport i'm having a bit of a struggle with the massive amount of code i wrote to implement the database on my new total rewrite of realy's database engine. i've managed to implement a fully functional search for kinds now, including a sorted and size limited count of results, but i have a dozen other functions to implement and it's quite daunting. i'm also kinda struggling with envisioning how i can use this to implement a serious business project. i know once i complete the task, it will be a fast and useful relay implementation, augmented with a bunch of extra indexes that can do searches that currently no other relay can provide. but the complexity. aaargh. just need to plug on tho. having the tech built will lead to a place where i can focus on business use cases more than having my brain flooded with complexity. for now, i just need to break down the complexity and get through these one at a time. there is more than a dozen different search methods, and they break down into three main parts, the time windows, the combinations of fields they search on, and the inherent separation of the main metadata fields and the tags, which are a whole set of factors. i'm implementing them as a search-intersection-sort-trim process. gonna dig into it though. on a side topic, i have realised that i need to set clearer limits on my consumption of alcohol. i had previously had a routine of throwing 350ml of vodka into a mix but i've toned that down to 175ml and i feel much better after that much only, like i can maybe even still function the rest of the day, and probably it will greatly reduce the problems from dehydration that i've been getting. i have had to rule wine, ciders and beers out of my routine altogether, and i think at this limit of daily consumption i may actually be able to get work done while feeling a lot more chilled. probably i'd be better off taking 5mg of diazepam than this. but a general pattern is that i've been reacting to do today, what is consequential to yesterday, and overdoing it and getting allergic and kidney stress from those three now forbidden forms of alcohol, will help a lot. when i calculate that 175ml of Absolut and a litre of milk works out to only 6 euros a day, this is a major improvement in my capital burn rate, and if it also improves my total hours possible of work, then it's a total win. it's easy for me to get frustrated at all this because a gram of hash a day works out to the same cost (as per amsterdam hash prices) but with none of the bad consequences. but i can't get a bead on that and haven't got the will to do the networking to chase down the supply, i simply know it's out there, which is a tease. stopping completely is not an option, unfortunately. it takes a week for all the biochemistry to settle and in the meantime i'm useless, depressed and weak. moderation seems to work better. time to finish debugging all these search functions.

David @mleku - 3mo

#realy #devstr #progressreport after cleaning up this nilsimsa hash function i decided that i'm going to add a new index. it's a little like a bayesian distance number, but it will be precalculated when saving events that are defined as text. it would, for example, be useful to detect plagiarism like the yoda speak bot that is currently doing the rounds on the spam infested free relays, but i think it would also have an interesting use in searching for recommendations of follows to look at. computing a whole graph of comparison vectors would be pretty insanely expensive compute but probably it could be useful to augment WoT graph calculations by generating a similarity score between the posts of two different users. it could also discover the breadth of a user's content by a simple XOR of all of their posts' nilsimsa hashes, the more varied their stuff, the more bits would be left not zeroed. this could then be used as part of a recommendation system as well in that the wider the variance of a user's content, the more interesting it is probably. it would pick up a lot also on the vocabulary of the user, more vocabulary would also tend to produce a higher score of all of their posts XORed together. a time series of such values could also be an interesting data set to look at as well, creating a set of interstitial comparisons over time and then evaluating the time series in some interesting ways.

David @mleku - 3mo

#realy #devstr #progressreport exploiting the benefits of my custom streaming varint codec, i have now changed the timestamps in database indexes to be these compact varints. this has resulted in a shrinking of the size of indexes from 64mb down to 59 megabytes, for a stash of events of 204mb as minified wire encoded JSON the current unix timestamps are still under 4 bytes of value, and will remain at this size until 2038, so for now they will only be 4 bytes long, but expand quickly to standard 64 bit signed integer timestamps which covers until 2140. in other notes about development, i have reorganised the documentation of the index type enums so that they are in groups logically ordered, with the metadata keys - pubkey, kind and created_at timestamp in a group of 6 with 3 being the individual, two being kind/created_at pubkey/created_at, and a three way combination of kind/pubkey/created_at these 6 indexes cover all of the primary metadata searches of events themselves, and then i have a series of tag indexes that cover a-tags, which have kind/pubkey/identifier, e and p tags, the p tags can tolerate having hashtags stored in them as appears in some clients broken implementations of follow lists (why they don't use t-tags idk, but it is what it is), standard other tags of single letter (which would include things like the mimetype and other categories), the d-tag, which is the identifier used on addressable (parameterized replaceable), and nonstandard tags also get indexed, so they will actually be searchable, though not via standard hash-letter style filters, it will require an extension of this in the HTTP API lastly there is some cache-management GC type indexes that store first seen, last accessed and access count, which will be usable for calculating whether to prune events out of the store or not, for an archive/cache two layer architecture. i've also been thinking about how to do query forwarding, and for this, http API clients will thus open one subscription using SSE, which will send any out-of-order events or subscription filter matches. this will also mean that standard filter queries will return a relay-generated filter identifier, when query forwarding is implemented at the top of the list of event IDs that are returned, so that when a forwarded query returns the identifier is prefixed to the results that come from the forwarded query, enabling the client to recognise the client that made the query and send those results to the client over the subscription SSE connection. a second distributed relay architecture feature that i am designing that will allow relays to subscribe to other relays latest events (usually just an empty query that continues to forward all new events to the client relay/cache relay) will entail building a subscription queue management system, and an event ack message so that the archive or source relay that is receiving events that get forwarded, tolerate the subscription connection dropping and they will recover the connection and send events that arrived after the subscription dropped and continue. this is an out of band state, that enables the implementation of a full relay distribution strategy last thing that would be required also is auth proxying. for this, clients need to be able to understand that they are authing to a different address than the relay they send a query to, and this would enable a cache type relay to serve a client exclusively to eliminate replication of traffic that the current architecture forces, and causes big problems with huge bandwidth usage currently. with this, you will be able to deploy a cluster of relays, with a few archives, and many caches, and clients will connect to only one of them at a time, and the cache relay will store all query results that they get via the forwarded query protocol, which requires this auth proxy protocol, and additionally, cache relays in a cluster would have subscriptions to each other so that when forwarded queries get new events, as well as returning the result to the client, they would propagate them horizontally, meaning that from that point any other client on the cluster would quickly be able to see these events that did not get published to the specific cache relay in the cluster that triggered the event query forward by the cache. with all of these new strategies in place, both geographical distribution and consortium style collaboration become possible and further decrease the attack surface possible to suppress transit of events on nostr.

David @mleku - 4mo

#realy #devstr #progressreport so i changed my VPS to a new one. i unfortunately paid ahead quite a bit for the one in bulgaria so i've begged them to refund me my balance. no idea if they will, probably not. meh. the thing is, the new one, which seems to be running with the same exact management system (identical UI anyway) which is based in Belgrade now, instead of Sofia, and i've enabled namecheap's PremiumDNS service as well, and am using it also as my DNS for my wireguard tunnel that goes through that VPS now, and i'm not seeing the same problem as before expected behavior, based on what was happening with my local test instance: refresh page post or broadcast a note server sends back CLOSED auth-required client auths client resends event happiness actual behaviour on old VPS all of the above except for some reason on the test realy it would auth instantly, and of course then any event publish works, but on the one running on the vps, it wouldn't auth at first, and randomly would sometimes auth and then start to accept until that socket closed and the merry go round starts again on the new VPS same as test version, no problems, not my bug, fuck you vps.bg

David @mleku - 4mo

last few days #jumble has been getting super buggy and failing to load feeds i'm back to #nostrudel specifically https://next.nostrudel.ninja which actually seems to be an old, better version of nostrudel anyway. getting quite sick of clients pushing bugs to prod man, really my relay has bugs too but my "prod" is a testing deployment, and this morning i just squashed another bug would help if i could get more testers tho, relays bugging doesn't affect the user as much because they are not a single point of failure, and as a bonus, i get bug reports without anyone having to do anything other than just adding wss://realy.mleku.dev to their relay list the benefit to the users who do this is that #realy will be battle tested and finally be able to do what i want it to do i kinda have in mind eventually to create a mechanism whereby users can define lists of other relays where they post so that realy can run a spider that opens up a subscription to other relays to catch events containing their npub and pull them in so they can actually just have one realy in their relay list and they see everything and the client only pulls one copy of events at all times this would solve a lot of the issues of bandwidth burn for people using #amethyst as well not that i expect i will get any cooperation from vitor the great and majestic android client dev

David @mleku - 4mo

hah. i have found a #bug in #realy someone has been sending delete events and somehow the a-tag unmarshal is triggering a nil pointer panic just instrumented the code with some logs to see if i can figure out what is happening, something to do with a-tag unmarshaling, which is the kind:pubkey:dtag thing

David @mleku - 4mo

it was very nice to get a giant zap for coming back but i'm still really salty about nostr and about semisol's using my code without giving me credit i only have a little bit of work to complete the proximity/sequence based full text search on #realy but i'm feeling really cranky about the whole situation and i know i gotta finish it but i gotta cool the fire in the boiler so i can actually do some work at this point gonna finish the damn thing but i'm still super salty about this, despite my gratitude for the help i've been given to keep this work up, y'alls are gonna just have to wear it that i am not being given enough compensation to complete realy beyond finishing this full text search, and as soon as i finish that, realy is in a feature freeze and will only have bugfixes when someone actually uses it and reports bugs to me so i'm gonna finish the fulltext search today, and then make sure it's actually working, and from here on i am 100% on my work to replace the nostr nip-01 protocol with something far simpler and more sane and only using http not this retardo fucking websockets joke that makes auth so freakin complicated nothing has changed in nostr, since i first started bangin on about the problem of auth, and since spending 18 months figuring out how to build a good relay, i learned a lot of things about why and how nostr's design is fucked up this will be fixed and i anticipate i will have the new protocol relay built before the end of next month on a much slimmer, more efficient codebase, and i'm gonna try to get its first commercial application in the next months following that, replacing an onchain forum and matrix based chat i'm gonna have to learn to write at least basic javascript to integrate wasm modules out of my code so front end can just dive straight in and use it also, fortunately that's not gonna be a big surface area to cover

David @mleku - 4mo

just wait until they see it powered by #realy after that full text search on decentralized distributed relays with fast, authed HTTP access and the fastest json codec in the business

David @mleku - 4mo

no, because it's very unclear and to me it just sounds like you are complaining about signers being burdensome on users, and outside of your control as client dev i think i made the point pretty clearly that you DON'T have to kow-tow to the idiotic consensus that you "must use detached signers" for it to be a secure app, the only real concern is that users may become complacent about rogue apps, but equally they could become complacent about signers if more of them existed, so really the problem is moot, eggs, basket, same same imagine how it is as a relay dev when for 6 months of the time i was in development with #realy, i couldn't find a client that actually let me point at my relay and ensure it was even working??? it just seems like a petty complaint to talk about UX of detached signers when you do have the option of controlling that yourself as client dev, not only that, you could bundle your own signer, there is already several forks of nos2x and you could just make your own that has sane policies built into it that fit your needs what is it that i'm missing here?

David @mleku - 4mo

there was an outage of #realy for the last day or so, i changed the wireguard address and the reverse proxy was pointed at the old address, nothing working now fixed, thankfully... i did manage to catch the issue with the firewall yesterday but forgot to check the reverse proxy, thankfully it's now working was wondering yesterday why coracle android wasn't finding my relay... yeah, it was going to /dev/null

David @mleku - 4mo

#realy #devstr #progressreport i have completed the revision of the filter and sort algorithms for the proximity/sequence search strategy for the full text search function it now analyses the groups of indexes per event to find the number and length of sequential results, as well as the distance of these sequences, and then sorts them by ascending order of distance (shortest distance first) and descending order of items in sequence this should work the best within the requirements i have set, which is to make a language agnostic full text search that adequately generates a relevance score based on simple word matches now, back to the fiat mine for a work session, got a few things to sort out, one of them being a retarded API of some anime list website that doesn't have a valid openapi 2 schema and requires me to write a manual http request after hearing about the giant bill on heroku i'm going to also shift the steam playtime data query to using the steam API directly, for the same reasons and lastly, to implement the query to my colleague's full text analysis server to add its scores to the scores generated by the mostly set intersection based comparisons

David @mleku - 4mo

#realy #devstr #progressreport so, i wasn't satisfied at all with the algorithm i designed for sorting results by relevance. it seemed way too haphazard to be effective, so i'm revising it to work a different way: - first, it scans the groups of results matching a single event, and creates a map that contains the distinct matching terms, and a count of how many times each match appears - then it eliminates the results that don't have at least 1 of each term in the search query, because why the fuck should it return anything that doesn't have a complete set of the search terms? this is a baseline IMO, so using the sets and counts of terms above lets me eliminate everything that lacks the complete set - next, it sorts the groups of matches for each event by their sequence in the search terms, so that they are in the same order that they appear in the match - then it repeatedly scans forward through the matched terms in their now sorted order, and gathers every possible group that has a complete set, not necessarily in the same order, but complete - these sets are then evaluated by their distance between first and last of the (possibly several) instances of the complete set, stepping forward from the previous step's starting point, and then generates the minimum distance between first and last of a complete set in the results, and storing this number in each event's result group - the minimum distance of the minimum length set of each then becomes the criteria to sort all of the groups, and the top results will be the ones that come closest to matching the sequence, albeit they can be in a different order, but order is not so important as completeness in the relevance of a result; and if the terms appear in the exact order, their distance will be the minimum of the result set and thus have the most relevance. most full text search algorithms i have seen in action include results that only match one term, but this requires them to have giant dictionaries for the language being searched that contain all of the low relevance terms like "the" and "a" and "an" and "of" and so forth... but i want to make a search that doesn't need this expensive and time consuming list set for every language it can support, instead, it will support all languages, and simply treats each term in the search like a distinct number and searches for the shortest set of matches of these terms as this is the most relevant matches, and there is zero point in keeping any without the complete set, because if the ones that don't match have all of the words that are not distinctive and are very common among all documents, they are useless results and eliminating them saves a lot of time to produce a more efficient result the user can then see if they have put too many minimal relevance terms in their search query to maybe trim out these common words, but the common words can be important - because very often really a pair or even phrase of the search term should appear as a unit, like articles and nouns usually will appear together, and even, in some languages, like nordic and slavic languages, the article and noun combine together, and in other cases, the adjective and noun combine together to become a composite word (eg, bezplatno, schtatite, which mean free as in gratis and the state), but in the english, this term could be more muddled and be a multiplicity, as in "free of charge" and "the state", or "sim pagamento" or "gratis, or "o stados".

David @mleku - 4mo

#realy #devsttr #progressreport i had been observing an irritating thing where it seemed like at first after refreshing web apps, the connection was not authed and sending an event wasn't going through i looked at the sequence of the auth request and noticed that somehow the CLOSED and AUTH messages were being sent in the reverse order, flipped it, and voila, now works realy v1.18.3 is now a fully functioning auth-reqired relay, idk how this one slipped again but i guess regressions happen

David @mleku - 4mo

#realy #devstr #progressreport the full text search is theoretically now implemented it iterates the new fulltext index searching for word matches then it checks that the rest of the filter criteria match, and eliminating candidates that don't have at least one of the criteria, i.e.: - eliminates results outside of since<->until, - that aren't one of the kinds requested, - that aren't published by one of the specified pubkeys, - that don't have the requested language tag, and - that don't have one of the tags in the filter then it groups the result fulltext index entries by same event into a unit and creates a map of them then it converts the random iterating map into a straight array it iterates the array and calculates the distance between the first and last word match in the sequence of the text of the event then it segments the results by number of words that match in each event, grouping the ones with the same number of word matches into groups then it calculates the distance between the first and last word matches in the text (the terms can appear multiple times but are sorted by their appearance, this is just an approximation, since words can appear multiple times) then it gets a list of all the terms in the event by their sequence number in the original search, and with this array it counts how many of those matches are in ascending order matching the search terms with these two metrics, we can calculate relevance, as the higher the number of items in matching sequence the more relevant, and the closer they are together, the more likely they are to match the search text then we sort the groups of same-count of words from the search text by distance AND sequence, meaning the top results have the lowest distance and the highest sequence then with the segments individually sorted, we zip them back together into a list, extract their event id, pubkey and timestamp and return that to the caller (this same result is used by the filter in the HTTP API so it can filter out pubkeys and sort the results by timestamp descending or ascending according to the query parameters). and finally, then it trims the list down to the number of results requested in the limit *or* the configured max limit (512, generally, but the http endpoint will allow 1000 for unauthed users and 10000 for authed users, as it only returns the event id, so the client can paginate them on their side) i know that all sounds complicated but it's now all written and this will enable a fairly decent relevance sorted full text search for nostr text events. the hard part now is going to be testing it, i will probably make two endpoints for this, one will be disabled later, but that one will return the full events in the result, as an array, so i can see them in the api, and squash any bugs or logic errors in the search code... and also see how long it takes for producing the results.

David @mleku - 4mo

#realy #devstr #progressreport i'm getting to the last steps in writing the fulltext query filter handling parts, and i've written the handle for p tags, that was easy, there's an index for that but there isn't an index for e tags, this is what the "extrafilter" is about in fiatjaf's query function in the badger database i'm looking this up and down and i'm like, uh, there's no index for e tags, which are commonly needed to search for replies! i think i have to go to the indexes now and create a new index for e-tags to make this work, because i really don't see how i can filter on them with the fulltext search with a full filter and if i'm gonna make a new e tag index, then i can speed up normal filter searches a lot too hilarious, but kinda annoying, i was all homed in on the fulltext search function and now i have to go sideways and add a new index and that will mean needing to rescan the indexes when i'm done building that, before i can expect this fulltext search to work i'm kinda glad i'm digging deep into this key value store database indexing design stuff... i'm sure this is going to be valuable. i have already learned enough to build a reasonable set of indexes for my fiat mine data analysis stuff, but this is getting a step further advanced, after i've built this i can say i have built a fully capable database engine for nostr that is optimized to search for events... with that e tag index added it will slash the search time by maybe as much as 50% for regular kind 1 or general threaded queries.

David @mleku - 4mo

#realy #devstr #progressreport the responsibilities in the fiat mine in the last few days has been pretty low... this afternoon i probably need to spend a few hours adding a text embeddings similarity score based on public forum/chat messages, but i am mostly ahead of my front end and back end peers in regards to meeting the release target functions, which is funny because i was hounded randomly once a month or so during the building process by the CTO for being "too slow" and actually i'm faster than the rest of them. lol! anyway, i don't mind because it means i can add this feature, i'd almost forgotten about it for the last few weeks it was on my todos, and will make #realy a compelling option for people wanting to deploy relays, because it will have a full text search, and as far as i know, maybe one, proprietary relay implementation being built by an esteemed fellow black sheep is able to do this properly as well. of course i'm gonna probably at some point be competitive about this and do something loony like build a little benchmarker to see how fast it is... meh, probably not, but i'm going to make mine as fast as possible it seems i forgot another todo on the fiat mine task tho, gonna have to put this realy work aside for the morning, continue the fulltext search algorithm later

David @mleku - 4mo

so, i have been looking at my mute lists on my relay and discovered that i have some dozens of them some are encrypted, and others are not i think they are supposed to be replaceable but i think i might have messed that up, but, thank goodness i did because #jumble would otherwise clobber them and the blacklist function on #realy would not work, instead it has 194 mutes found among those lists, but here's the kicker, they in total weigh in at 18mb of events! just my mute lists! i wish i could consolidate them properly, they should only be a bit over 194*64 bytes long in total

David @mleku - 4mo

i just used https://next.nostrudel.ninja to remove that abomination within my mute event i'm probably going to be manually using it now to add npubs to my mute list so my damn relay actually knows what junk to not accept or send to me (the mute list it can read is blacklisted from posting to the relay, and their events are rejected no matter who sends them). of course the rockstar client devs don't understand this because they are a bunch of midwits who couldn't hardly write a relay let alone from scratch or as intensively modified as #realy why it's irritating me so much how else am i gonna signal to my relay that i don't want to be sent trash from dickheads if it can't decode my mute event?

Showing page 1 of 14 pages