Public Pad Latest text of pad rNBu09Mr2M Saved Jan 11, 2025

Hi! Martin Hamilton here! @m@martinh.net on Fedi, m@martinh.net on email, @martin_hamilton:matrix.org on Matrix etc etc.

This ChaosPad is for sharing ideas about how to improve Internet search, following on from my 38C3 talk "Waiter, There's An LLM In My Search!". The ReLive capture of my talk is here in case you missed it: https://streaming.media.ccc.de/38c3/relive/885

At 38C3 I suggested three starter projects as examples of the sort of thing that might be helpful. I'd love to know what you think about them, and what other projects you can imagine! If this is your kind of thing, do leave a comment below - let me know how to contact you if you are interested in getting involved. You might also like to join the SearchClub Matrix Space at #searchclub:matrix.org, which I'm hoping will grow into a community of search aficionados and be a great place to coordinate a few "search improvement" projects :-)

Packaged meta-search for mobile

Rationale: Meta-search sites (e.g. see public SearX/SearXNG instances at https://searx.space ) can be really useful, and give the searcher a lot of control back. But they're kind of janky and complicated to use, and public instances come and go unpredictably. Wouldn't it be great if you could just run meta-search as an app on your phone instead? Well, as I show in my talk, you can, but it's highly technical. Most people lack the technical knowledge and confidence / fearlessness to poke around with this stuff - and since they're probably expecting "an app" anyway, we should probably give them one.

What do you think? If you feel this could be useful, would you be interested in getting involved? We're talking about packaging up some Python code as an app, which might be trivial?! I'm kidding, for SearXNG at least there are oodles of Python dependencies which need to be installed in Termux. Wrapping that up and app-ifying it feels like hard work!

Fedi community search add-on

Rationale: Although there are a few really big fedi sites (hi mastodon.social!), there are a very large number of "smallish" sites which are often organised around a community of interest like (say) amateur astronomers, solarpunks or whatever. Maybe the fedi software could be extended to learn about the sites that people are sharing / discussing and offer a "search bubble" that's relevant to the individual community's interests, extending existing built-in search functionality which is focussed on indexing the content of individual posts.

What do you think? If you feel this could be useful, would you be interested in getting involved? Masto is the elephant in the room here, but it is a big chonky mass of Ruby on Rails code so maybe not the easiest to tweak and adapt - and maybe some other fedi apps have already got this nailed anyway? Privacy and permission are critically important of course, so it's possible^H^H^Hlikely that this involves changes to both search and permissions subsystems.
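To make the "search bubble" idea a bit more concrete, here is a minimal sketch of the first step: working out which sites a community keeps linking to. It is only an illustration under assumptions. The function name `shared_domains`, the `min_mentions` threshold and the plain list of post texts are all hypothetical, and it is not based on how Mastodon or any other fedi software works. It also assumes the posts passed in are ones the community has already agreed may be used this way.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Rough URL matcher for plain post text (an assumption; real posts may be HTML).
URL_PATTERN = re.compile(r"https?://[^\s<>()]+")


def shared_domains(posts, min_mentions=2):
    """Return (host, count) pairs for hosts linked from at least min_mentions posts."""
    counts = Counter()
    for text in posts:
        hosts = set()
        for url in URL_PATTERN.findall(text):
            # Trim trailing sentence punctuation before parsing the URL.
            host = urlparse(url.rstrip(".,;:!?")).hostname
            if host:
                hosts.add(host)
        # Count each host once per post so one link-heavy post cannot dominate.
        counts.update(hosts)
    return [(host, n) for host, n in counts.most_common() if n >= min_mentions]
```

The resulting host list could then seed a small crawl or a site-restricted search for that community. Counting once per post rather than once per link is a design choice to stop a single enthusiastic poster skewing the bubble.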

Finding / building block lists of SEO / LLM low quality sites

Rationale: For folk who are technically sophisticated enough to install an ad blocker like uBlock Origin (manifest V3 issues aside), it's possible to do a lot more than just block ads. I'm particularly interested in shared / community managed blocklists which could be used to divert AI slop, SEO bros, content farms etc to the memory hole. As an example, the Huge AI Blocklist (https://github.com/laylavish/uBlockOrigin-HUGE-AI-Blocklist ) can be used with uBlock to hide a lot of genAI imagery from popular search engine results.
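The same blocking idea can also be applied on the search side, e.g. inside a meta-search instance rather than in the browser. The sketch below assumes the blocklist has already been boiled down to a plain set of domain names, which is a simplification: real uBlock Origin lists use their own filter syntax. The function names and the `.example` domains are made up for illustration.

```python
from urllib.parse import urlparse


def is_blocked(url, blocked_domains):
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Check "a.b.example.com", then "b.example.com", then "example.com", and so on.
    return any(".".join(parts[i:]) in blocked_domains for i in range(len(parts)))


def filter_results(urls, blocked_domains):
    """Drop results whose host matches the blocklist, keeping the original order."""
    return [url for url in urls if not is_blocked(url, blocked_domains)]
```

Matching on whole domain labels means blocking `slop.example` also catches `img.slop.example`, but leaves `notslop.example` alone.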

What do you think? If you feel this could be useful, would you be interested in getting involved? There are quite a few shared blocklists around already (e.g. see https://iorate.github.io/ublacklist/subscriptions ) so it may be helpful to think in terms of which domains and use cases are already covered, and where the gaps are. Maybe there is (or should be!) a Wikipedia page to catalogue them? For bonus points: what should the metadata for a blocklist look like? e.g. is it managed by a group or an individual, when was the last update, how often updated, rationale for blocking (spam, bitcoin, ai, nazis etc) and so on...
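As a starting point for the metadata question, here is one possible shape for a blocklist record, using only the fields suggested above. Every field name is an assumption for the sake of discussion, not an existing standard, and the `is_stale` helper with its `grace_factor` is just one hypothetical way to flag lists that seem abandoned.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class BlocklistMetadata:
    name: str
    url: str                      # where the list can be subscribed to
    maintainer: str
    maintainer_type: str          # "individual" or "group"
    last_updated: date
    update_interval_days: int     # how often the list is normally updated
    rationales: list[str] = field(default_factory=list)  # e.g. ["spam", "ai"]

    def is_stale(self, today, grace_factor=2.0):
        """True if the list is overdue by more than grace_factor times its usual interval."""
        age_days = (today - self.last_updated).days
        return age_days > self.update_interval_days * grace_factor
```

A catalogue of blocklists could then be a list of these records, filterable by rationale or by whether the list still looks maintained.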

Distributed crawling - supporting mwmbl.org

In my talk I mention the massive scale of the compute / networking / data storage required to create a "whole web" index, and how it's probably infeasible to do this unless you have access to significant resources. However, there is a really interesting edge case - what if we all did a bit of crawling, like how you can run a Snowflake Proxy as a browser extension? This is what the open source community search engine mwmbl is doing - see https://mwmbl.org/.

No real ask here, but maybe consider supporting mwmbl, e.g. running their crawler extension for Firefox or their Python crawler. There might also be some mwmbl related enhancements that you can envisage...

Ranking meta-search engine results

Rationale: I didn't really cover this in my talk, but it's something that came up in the Q&A - when you are slicing and dicing results from multiple search engines to deliver a single "coherent" result set to the searcher, how the results are ranked becomes super important. This might be just about bearable for weird nerds like us, but there is probably a lot of useful work that could be done to improve ranking (say in SearXNG for example) to make meta-search more useful for normal people.

What do you think? If you feel this could be useful, would you be interested in getting involved? NB I keep mentioning SearXNG, but it's not the only open source meta-search engine, just the one that I have mostly been playing with. Maybe you know another one which is "better" / more readily hackable etc etc?
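For anyone who wants something to experiment with, one well known way to merge ranked lists from several engines is reciprocal rank fusion (RRF): each result scores 1 / (k + rank) in every list it appears in, and the scores are summed. The sketch below is not how SearXNG ranks results; it is a swapped-in textbook technique to illustrate the problem. The function name is hypothetical, k=60 is just the conventional default, and each input list is assumed to contain no duplicate URLs.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked URL lists into one list, best first, using reciprocal rank fusion."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            # Results near the top of any list contribute the most.
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties are broken by URL so the output is deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))
```

RRF only looks at positions, not at each engine's own relevance scores, which makes it easy to apply across engines that score results in incompatible ways. Weighting engines differently would be an obvious next experiment.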

<< YOUR IDEA HERE >>

If you made it this far, you probably have a lot of thoughts about how we could make search more survivable for regular folk and / or weird nerds. Don't be shy - please do share! Remember to leave some contact info or ping me (details at the top of the doc) + consider joining #searchclub:matrix.org. kthxbye :-)

Tbh there is no alternative to Bing hell. But I know https://softwarerecs.stackexchange.com/a/90557/70550 contains links to alternative search engines that are not crawled by Bing, but we'd have to stop using Bing crawlers, and probably give up on indexing anything protected by Cloudflare (might not be a bad thing). I think we need more wikis in general. Maybe like a search engine where everything is more... public or open. People can comment on search results, etc. Or we have to go back to the days of Yahoo and simply curate domains.

Thanks for this - the Cloudflare thing is hugely significant. According to a CF news release from September 2024, roughly 20% of websites are protected (if that's really the word!) by CF. So if you are a crawler and CF takes a dislike to you, it's practically game over. Also, if you are an individual contributor to a shared index like the Mwmbl model, you might find that you lose access to all those sites if CF decides your crawling behaviour is against their ToS.
