Kagi Small Web has about 32K sites, and I'd like to think that we have captured most of the (English-language) personal blogs out there (we are adding about 10 per day, and a significant effort went into discovering/finding them).
It is kind of sad that the entire size of this small web is only 30k sites these days.
Same - but mine are also primarily so I can hand out links to specific articles - they're not hidden but they're not advertised either (and they're static sites with almost zero logging, so I wouldn't really notice either except that this site has a published list :-)
Hi, I took a quick look around the niche I'm interested in, and there's a lot of local history blogs you're missing. One of the bigger examples: https://threadinburgh.scot/
On reflection, maybe you've captured the bulk of the "Small Web Movement" (the technology-leaning bit of the blogosphere that is self-consciously part of a reactionary movement against the corporate web) but you haven't captured the bulk of the still-active blogosphere?
So I've got a question: What's the mission statement for kagisearch/smallweb - a curated list of Small Web sites, or a curated list of active blogosphere sites?
Because the current strategy for adding sites seems heavily biased towards the small web movement to me.
What methods are you using to find them? I notice my own doesn't appear, although it does show up well under some (very niche) Google search terms. I suspect there's the potential for an order of magnitude more sites than have been found.
I noticed that Kagi Small Web tends to lean towards more tech-focused blogs. So it feels more like you've captured that subset of the small web, especially if your main source is Hacker News.
Not sure if you've used this as a source, but there are a lot of tiny personal sites in this directory too.
https://melonland.net/surf-club
There is a '↗'-shaped icon in the navigation bar at the top. If you click on that, it takes you to the original post in a new tab. On Firefox and Safari, you can also right-click that icon and add the original post to your bookmarks.
Does this concept of "personal blog" include people periodically sharing, say, random knowledge on technical topics? Or is it specifically people writing about their day-to-day lives?
"If the blog is included in small web feed list (which means it has content in English, it is informational/educational by nature and it is not trying to sell anything) we check for these two things to show it on the site: • Blog has recent posts (<7 days old) [...]"
Why would you only include blogs in your small web index? That must be a minute fraction of what is out there?
I can't think of a single blog that I read these days (small or not), yet there are loads of small "old school" sites out there that are still going strong.
> Why would you only include blogs in your small web index?
I am not associated with this project, so this would be a question for the project maintainer. As far as I understand, the project relies on RSS/Atom feeds to fetch new posts and display them in the search results. I believe this is an easier problem to solve than using a full-blown web crawler.
However, as far as I know, Kagi does have its own full-blown crawler, so I am not entirely sure why they could not use it to present the Small Web search results. Perhaps they rely on date metadata in RSS feeds to determine whether a post was published within the last seven days? But having worked on an open source web crawler myself many years ago, I know that this is something a web crawler can determine too, if it is crawling frequently enough.
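To illustrate why feed date metadata makes the recency check easy, here is a minimal sketch in Python. It is only my guess at the general idea, not the project's actual code: the function name `recent_items`, the sample feed, and the choice to handle only RSS 2.0 `pubDate` (not Atom's `updated`/`published` elements) are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET


def recent_items(rss_xml: str, now: datetime, max_age_days: int = 7) -> list[str]:
    """Return titles of RSS 2.0 items whose pubDate falls within max_age_days of now.

    Hypothetical helper: the 7-day default mirrors the "<7 days old" rule
    quoted in the thread, but this is not the project's implementation.
    """
    cutoff = now - timedelta(days=max_age_days)
    titles = []
    for item in ET.fromstring(rss_xml).iter("item"):
        pub = item.findtext("pubDate")
        if not pub:
            continue  # no date metadata: recency cannot be judged from the feed
        try:
            published = parsedate_to_datetime(pub)  # parses RFC 2822 dates
        except (TypeError, ValueError):
            continue  # malformed date: skip rather than guess
        if published.tzinfo is None:
            # Treat dates without a usable offset as UTC (an assumption).
            published = published.replace(tzinfo=timezone.utc)
        if cutoff <= published <= now:
            titles.append(item.findtext("title", default=""))
    return titles


# Made-up sample feed for demonstration only.
SAMPLE = """<rss version="2.0"><channel>
<item><title>Fresh post</title><pubDate>Mon, 06 Jan 2025 12:00:00 +0000</pubDate></item>
<item><title>Old post</title><pubDate>Sun, 01 Dec 2024 12:00:00 +0000</pubDate></item>
<item><title>Undated post</title></item>
</channel></rss>"""

if __name__ == "__main__":
    print(recent_items(SAMPLE, datetime(2025, 1, 8, tzinfo=timezone.utc)))
```

The undated item shows the trade-off: with a feed, a missing date simply means the post gets skipped, whereas a crawler would have to infer the date from when it first saw the page.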
So yes, I think you have got a good point and only the project maintainer can provide a definitive answer.