TIL about the Herfindahl–Hirschman Index and I wanted to test it with a weird corner-case that I remember.
At one point in the late 1980s, Microsoft had a GREATER than 100% market share of the Macintosh spreadsheet market.
How is this possible?
Market share (for a given period) is the participant's sales in the market divided by total sales. It just so happened that Lotus had more returns than sales of their failed spreadsheet, Lotus Jazz. So Lotus had a negative market share and Microsoft had more sales of Excel than total sales in the market, resulting in a greater than 100% market share.
I don't remember the exact numbers and I believe there was at least one other competitor in the study. But let's just say the numbers were:
Microsoft: 102% Lotus: -2%
In that case the Herfindahl–Hirschman Index would be 102^2 + (-2)^2 = 10404 + 4 = 10408.
So, in this pathological case it is possible for the HHI to exceed 10,000.
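If you want to play with it, here's a minimal sketch of the calculation in Python (shares expressed as percentages, so a pure monopoly scores 10,000):

    def hhi(shares_pct):
        """Herfindahl-Hirschman Index: sum of squared market shares (in percent)."""
        return sum(s * s for s in shares_pct)

    print(hhi([102, -2]))          # 10408 -- above the 10,000 "pure monopoly" ceiling
    print(hhi([40, 30, 20, 10]))   # 3000  -- an ordinary concentrated market, for comparison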
Edited: Added (for a given period) above, for clarity.
I have diligently searched for this article online and have been unable to find it. (It might be on microfiche somewhere...)
I did, however, find this humorous anecdote:
> A Lotus executive later joked, "The first month we shipped 62,000 copies, and the following month we got 64,000 copies back. It was such a failure they sent us the bootlegged copies back."
HHI is a super useful metric -- glad you like it!
The sum of squared normalized shares proves to be very useful in a lot of contexts -- not just market share. Voting is one great example.
Sounds depressing (i.e. ~5000 in a typical US election)
It gets really interesting when you look at the precinct and county level in the US, and similar types of views in European countries where they do real representative government.
It's ironic that national level politics mirrors the kind that the founding fathers did not ever want happening, while local politics has the kind of representation that they actually had in mind (in most places).
> The only way that makes any sense is if you subtract returns for sales made in a different period to the sales period you are considering
Exactly. That's the way accounting works. They did not know in the previous quarter that the product would be returned in the following quarter, so they end up having negative sales in the current quarter.
Yes it produces "garbage output", which I find amusing.
But that isn’t how you calculate market share, so what you’re saying is nonsense.
There are multiple ways of calculating market share (e.g. units vs dollars or for different time periods) but assuming it is measured in dollars for a quarterly time period, how would you calculate the market share based upon my sample data above?
(company sales in period - company returns for sales made in said period) / (industry sales in period)
With this formula market share for past periods can keep changing arbitrarily far into the future (depending on the companies' return policies).
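To make the two conventions concrete, here's a toy sketch with entirely made-up numbers -- netting returns against the quarter of the original sale versus the quarter the return arrives in. Only the latter can push a quarter's share negative:

    # Entirely made-up quarterly figures, in dollars
    q2_sales   = {"Lotus": 100_000, "Microsoft": 900_000}
    q2_returns = {"Lotus": 150_000, "Microsoft": 0}   # returns arriving in Q2 for units sold in Q1

    # Convention A: restate returns against the quarter of sale; Q2 shares stay in 0..100%
    total_a  = sum(q2_sales.values())
    shares_a = {c: 100 * s / total_a for c, s in q2_sales.items()}

    # Convention B: net returns in the quarter they arrive (how the magazine numbers seem to work)
    net      = {c: q2_sales[c] - q2_returns[c] for c in q2_sales}
    total_b  = sum(net.values())
    shares_b = {c: 100 * n / total_b for c, n in net.items()}

    print(shares_a)  # Lotus 10.0%, Microsoft 90.0%
    print(shares_b)  # Lotus ~-5.9%, Microsoft ~105.9%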
That is how it was calculated in a published trade magazine (either Infoworld or MacWeek, I think). I'm not sure if the analysis was done by a market research firm or the magazine.
Presumably by a journalist who doesn’t understand what market share is
Everybody else is telling you you're wrong and you're doubling down and insisting that respectable journals are morons who don't understand anything.
You should stop, reflect on this fact for a moment, then go pick up a goddamn book
The story sounds fishy, but one way to have returns exceed sales is to ship 1,000 to stores on consignment, the store sells 500 of those for cash, the buyers return their 500 for refunds, and the store returns the remaining unsold 500. So 500 actual sales but 1,000 returns.
I've heard of companies doing things like this to "cook the books" for a quarterly report.
>So 500 actual sales but 1,000 returns.
Sure. As long as we keep in mind that "return" doesn't mean "reverted sale", but "reverted shipment to retail".
That's the beauty of software: It doesn't have a physical form, so you can buy it once but return it infinity times!
Neat! I'm not surprised at the findings here. BlueSky (for the average user) is pretty much a drop-in replacement for Twitter.
Despite the smaller total numbers in Mastodon, it's great to see that the ecosystem seems to be successfully avoiding centralization like we've seen in the AT-Proto ecosystem.
I suspect that the cost of running AT proto servers/relays is prohibitive for smaller players compared to a Mastodon server selectively syndicating with a few peers, but I say this with only a vague understanding of the internals of both of these ecosystems.
Running a PDS server for yourself and a few friends is not very expensive afaik, but there's also not much benefit to doing so, because the point of the PDS is to have a clean separation between your own data and the rest of the network.
The expensive things in ATProto are the Relay (crawls/listens to PDSs to produce the firehose) and the AppView (keeps a DB of all posts/likes/etc to serve users' requests). Expensive at scale anyway; if you want your own small network for hosting non-Bluesky posts (like WhiteWind's longer character limit), the event volume will be manageable.
For a lot of stuff though ATProto is built in a way that you shouldn't have to host your own; you can implement your own algorithmic feed that reads from the Bluesky Relay's firehose, or your own frontend that still gets data from the Bluesky AppView.
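For example, a custom feed generator can just tail the public event stream. A rough sketch in Python (the Jetstream endpoint URL and query parameter here are from memory and may have changed, so treat them as assumptions and check the current docs):

    # Sketch: tail Bluesky's public event stream via the Jetstream service.
    # pip install websockets   (endpoint and params are assumptions, not verified)
    import asyncio, json
    import websockets

    JETSTREAM = ("wss://jetstream2.us-east.bsky.network/subscribe"
                 "?wantedCollections=app.bsky.feed.post")

    async def watch_posts():
        async with websockets.connect(JETSTREAM) as ws:
            async for raw in ws:
                evt = json.loads(raw)
                record = (evt.get("commit") or {}).get("record") or {}
                if record.get("$type") == "app.bsky.feed.post":
                    # a feed generator would score/filter posts here instead of printing
                    print(record.get("text", "")[:80])

    asyncio.run(watch_posts())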
Running a relay is not expensive anymore (it used to be), with recent changes it's about $30/mo. Running an AppView that ingests all ongoing Bluesky traffic (and puts it into database) is more expensive ($300/mo currently) but if you were happy with a partial view of the network, you could get it down by a lot.
> Running a relay is not expensive anymore [...]
$30/mo is $360/yr, which for most people is a prohibitively large sum of money. That would make Bluesky access more expensive than even the most expensive Netflix subscription; closer to the cost of a cellular plan.
For comparison: for my Mastodon account I pay $5/mo or $60/yr to a dedicated hosting provider. This puts it in the same ballpark as paying for a private email host or a VPN subscription.
The PDS is closer to the Mastodon account and will run you the same amount of money. The relay or appview is what takes the load when one of your posts goes viral, whereas in Mastodon your $5 VPS has to handle that spike in traffic. There have been several stories about how AP has DDoS'd a small server because there is no equivalent to the relay.
This makes sense, but a Relay isn't something you'd expect a normal user to run.
It doesn't meaningfully make you "more independent" because all Relays are trivial (they're just dumb re-broadcasters of a stream) and it makes sense to use one run by somebody else — a company or a community that's pooling resources.
I'm admittedly not versed in the AT protocol, but $30/month is high for me. Something like $10/month is quite fair, and i would expect should be more than enough for a VPS to host my and 2 other members of my family for any social network service...This $10/month is overkill for other servers on the ActivityPub via say gotosocial, pleroma, etc. Not that ActivityPub is perfect or anything like that, i just mean that $30/month is not yet what i would call a sweet spot for self-hosting something...but of course, that is absolutely leagues better than any cost that the bluesky team or other bigger players would naturally have to pay for ongoing infra., etc.
Then again, i will not deny that there's also the possibility that i am simply cheap! :-)
Right, but a Relay is literally a network-wide optimization. That's why in my other comment I'm mentioning that it's hard to do apples-to-apples comparison. With Mastodon, you only have one possible role: "hosting an instance" (which is like hosting a mini-Twitter with almost no data). ATProto aims higher in terms of UX (shared world by default) so there are different distributable pieces with different incentives to run. Anyone can self-host a PDS for super cheap (which will store only your data). But running a Relay is useful as an optimization for whoever's running the actual backends. So if you're a company building on ATProto, maybe you run your own Relay just in case (or maybe you don't and reuse an existing one). Or if you're a collective of people, maybe you pool resources together to run your own independent Relay. It's not something you'd just run for yourself unless you're a huge enthusiast.
An important note here is the $30/mo is for the relay which is _not_ something the average user will ever need to run
Instead, you're looking for hosting a PDS which you absolutely can do for $10/mo (or less)
I run a PDS on an OVH Cloud VPS for $5/mo for myself, some alts, and some bots
> For a lot of stuff though ATProto is built in a way that you shouldn't have to host your own; you can implement your own algorithmic feed that reads from the Bluesky Relay's firehose, or your own frontend that still gets data from the Bluesky AppView.
ATProto isn't "built this way".
Twitter was also built in a way where you could implement your own stuff - and then Twitter took that away.
With Mastodon, there is one large instance (controlled by the non-profit Mastodon gGmbH). If they tried closing themselves off, their users would be losing access to the majority of people in the network. Plus, while non-profits aren't perfect, they don't have VC investors to answer to.
Bluesky could decide to stop publishing the firehose or restrict its APIs - just as Twitter did. Given that they control 99.55% of the network, they can close it off without worrying about their users losing access to anything. And Bluesky is a for-profit company that has raised around $30M in VC.
What you talk about isn't a feature of ATProto. It's a feature of being centralized and having a company willing to let you use their servers for free (at least for now). This was the case with Twitter for a long time. You could read the Twitter firehose and build your own apps and frontends getting data from the Twitter APIs - just as you can do from Bluesky today.
But unless there's a reason why shutting off the firehose/APIs would be bad for Bluesky, they can do that at anytime. It might anger some users (as Reddit and Twitter both did), but they control the network and network effects are powerful. For most Bluesky users, they'd continue using it because they aren't there for some open protocol. They're on Bluesky because Twitter became a nazi bar. Until we see real decentralization with ATProto, it's just a centralized network like Twitter or Reddit which hasn't shut off its firehose and public API yet. Hopefully that won't happen, but it certainly could.
No I think ATProto is "built this way". The firehose is just the output of the Relay. If Bluesky wanted to shut off the firehose, they would have to make changes to ATProto or stop conforming to ATProto.
I understand that Bluesky's conformance to ATProto is just a promise, but it's a better promise than you get from most websites. Also in the meantime, if you migrate to a self-hosted PDS, you can ensure that even if Bluesky restricts access to their Relay's firehose, 3rd party Relay servers can still pick up your posts and publish their own unrestricted firehose.
What do you think would happen if the Bluesky company suddenly blocked everyone but https://bsky.app/ servers from using their relays?
And what if, before they did that, they updated the PDS code so it blocked all relays except for their one?
I'm not asking what you would do. I'm asking what would happen.
> BlueSky (for the average user) is pretty much a drop-in replacement for Twitter.
One reason Bluesky is so successful is because it doesn't shove decentralisation into the user's face like Mastodon does. The vast majority of people don't know what decentralisation is and don't care to.
I think that far too much effort is put into decentralisation and not enough into good moderation on these platforms.
I don't use Mastodon because it's too decentralized.
What I mean is I own my own domains but I can't use them on Mastodon without self hosting an entire Mastodon server for one user per domain. Yes there are other implementations of the protocol but none really solve this well in a cheap to run way.
Mastodon's missing feature is identity portability. A user with their own domain should be able to easily use a larger instance to host their identities and be able to migrate them to another instance.
Moderation is definitely the Fediverse's weakness.
https://roost.tools is working on open-source options for both social (fedi or not) and beyond social. Generally the idea is that we can fight the bad things better if we are working together instead of independently
ATProto's Stacked Moderation is an interesting approach to combine platform, community, and user level choices
https://bsky.social/about/blog/03-12-2024-stackable-moderati...
I'm curious if you could expand on this observation? I've heard this from other Mastodon users but I haven't seen it myself; I wonder if it varies heavily from server to server or if I've just gotten lucky.
Moderation (the intent and success) varies to such a huge extent that it's practically silly to talk about moderation on Mastodon unless you mean moderation on a specific Mastodon server (like mastodon.social). But moderation (the process) is intense and servers are usually community-run on the change found in a spare couch (i.e. they're volunteers).
I think they do quite well considering the disparate resource levels, but some servers are effectively unmoderated while others are very comfortable; plenty are friendly to racists or other types of bigots, but the infrastructure for server-level blocks is ad-hoc. Yet it still seems to work better than you'd guess.
Decentralization means whoever runs the server could be great, could just not be good at running a server, could be a religious fundamentalist, a literal cop, a literal communist, a literal nazi, etc etc. And all have different ideas of what needs moderating. There is no mechanism to enforce that "fediverse wide" other than ad-hoc efforts on top of the system.
There are also significant practical user experience differences.
Doing a search on Twitter searches Twitter, the whole thing. A search on Mastodon only knows about the servers you're connected to (unless you're searching for a specific user, in which case it'll micro-target their server to get their account info, but you have to know their name through some side-channel). Similarly, if you chance across a Mastodon post and want to follow that user, unless you happen to be on the same node as them you have to enter your own node data to get redirected to do the follow, because of the domain-based nature of web security.
These aren't deal-breakers but we have the hard numbers from other web UX to know that every time you put a friction point like these in the flow, you immediately lose some x% of users. Relative to services that are centralized, these things will slow Mastodon adoption.
(This may not be the worst thing. There are other goals besides maximizing the adoption numbers.)
> Despite the smaller total numbers in Mastodon, it's great to see that the ecosystem seems to be successfully avoiding centralization
But since essentially no one is using it, that doesn't suggest much avoidance of centralization. These factors are not independent. It's pretty easy to avoid anything when your total user count is a rounding error compared to the alternatives.
ATProto also has the downside of being supported by a corporation and investors with various backgrounds that will eventually want to earn something out of it all and there is no telling how this will happen.
There are lots of ways they could make a sustainable income without disrupting Bluesky's current status quo and be comfortably rich for the rest of their lives... but that's completely out of character for them and will never happen. I do think the geeks currently running Bluesky are sincere in keeping it decentralized, but the money people will someday probably force them out and squeeze the user base for a quick buck. A hardcore nerd minority will splinter off, though, and keep the decentralized version running, so whatever. History repeats. Frog swims, scorpion stings.
>There are lots of ways they could make a sustainable income without disrupting Bluesky's current status quo and be comfortably rich for the rest of their lives... but that's completely out of character for them and will never happen.
Sounds like reddit 15 years ago
Is it a quick buck? How long has bluesky been funded for with no return?
I meant "quick buck" like flooding the place with ads, tracking, dark patterns, closing the API/protocol, or doing some sort of crypto scam, with no regard for the platform's long-term health. It's been funded for a few years? That's not really that long for such a small team. But I have no idea how their investors think it might make "Facebook money", and isn't that always the goal?
Quick meaning it's short term from when it starts. The time before it starts doesn't factor in.
> There are lots of ways they could make a sustainable income without disrupting Bluesky's current status quo
Like what?
Off the top of my head: Paid hosting/services on top of the protocol, reddit-gold style tipping and gamification, being a transactional middleman (there are a lot of artists selling things, famous people promoting things, and so on), promoted posts and ads (easily blockable, but some users wouldn't bother).
Bluesky is a small team of like ~30 people, if they keep running lean they have at least a chance of a decent profit margin. But none of that will make anyone a multi-billionaire, so never mind.
I think you are wildly overestimating users' willingness to pay online, and underestimating the costs to run a large scale site. Even if you remove developers' salaries and server costs, the fines could be worth tens of millions of dollars per country just for delays in removing hate speech[1].
ATProto also currently effectively relies on https://web.plc.directory which is a centralized service, making the protocol effectively centralized.
This is true but it's worth noting that (1) the entire point of this node is to be globally agreed on since it's the root of the identity mechanism, (2) it is auditable (https://github.com/did-method-plc/did-method-plc?tab=readme-...) and operations themselves are self-certifying (https://github.com/did-method-plc/did-method-plc?tab=readme-...). There are some potential issues (like PLC could choose to deny some operations), and the plan is to upstream PLC into an independent entity so that it isn't tied to Bluesky the company.
How does Meta's adoption of ActivityPub play into the corporation and investors supporting / dominating a protocol in the long run?
>I suspect that the cost of running AT proto servers/relays is prohibitive for smaller players compared to a Mastodon server selectively syndicating with a few peers, but I say this with only a vague understanding of the internals of both of these ecosystems.
This isn't quite right. ATProto has a completely different "shape" so it's hard to make an apples-to-apples comparison.
Roughly speaking, you can think of Mastodon as a bunch of little independently hosted copies of Twitter that "email" (loosely speaking) each other to propagate information that isn't on your server. So it's cheap to run a server for a bunch of friends but it's cut off from what's happening in the world. Your identity is tied to your server (that's your webapp), and when you want to follow someone on another server, your server essentially asks that other server to send stuff to yours. This means that by default your view of the network is extremely fragmented — replies, threads, like counts are all desynchronized and partial[1] depending on which server you're looking from and which information is being forwarded to it.
ATProto, on the other hand, is designed with a goal of actually being competitive with centralized services. This means that it's partitioned differently – it's not "many Twitters talking to each other" which is Mastodon's model. Instead, in ATProto, there is a separation of concerns: you have swappable hosting (your hosting is the source of truth for your data like posts, likes, follows, etc) and you have applications (which aggregate data from the former). This might remind you of traditional web: it's like every social media user posts JSON to "their own website" (i.e. hosting) while apps aggregate all that data, similar to how Google Reader might aggregate RSS. As a result, in ATProto, the default behavior is that everyone operates with a shared view of the world — you always see all replies, all comments, all likes are counted, etc. It's not partial by default.
With this difference in mind, "decentralizing" ATProto is sort of multidimensional. In Mastodon, the only primitive is an "instance" — i.e. an entire Twitter-like webapp you can host for your users. But in ATProto, there are multiple decentralized primitives:
- PDS (personal data hosting) is an application-agnostic data store. Bluesky's implementation is open source (it uses a sqlite database per user). There are also alternative implementations of the same protocol. Bluesky the company does operate the largest ones. However, running a PDS for yourself is extremely cheap (like maybe $1/mo?). It's basically just a structured KV JSON storage organized as a Merkle tree. A bit like Git hosting.
- AppViews are actual "application backends". Bluesky operates the bsky.app appview, i.e. what people know as the Bluesky app. Importantly, in ATProto, there is no reason for everyone to run their own AppView. You can run one (and it costs about $300/mo to run a Bluesky AppView ingesting all data currently on the network in real time if you want to do that). Of course, if you were happy with tradeoffs chosen by Mastodon (partial view of the network, you only see what your servers' users follow), you could run that for a lot cheaper — so that's why I'm saying it's not apples-to-apples. ATProto makes it easy to have an actually cohesive experience on the network but the costs are usually being compared with fragmented experience of Mastodon. ATProto can scale down to Mastodon-like UX (with Mastodon-like costs) but it's just not very appealing when you can have the real thing.
- Relays are things "in between" PDS's and AppViews. Essentially a Relay is just an optimization to avoid many-to-many connections between AppViews and PDS's. A Relay just rebroadcasts updates from all PDS's as a single stream (that AppViews can subscribe to). Running a Relay used to be expensive but it got a lot cheaper since "Sync 1.1" (when a change in protocol allowed Relays to be non-archiving). Now it costs about $30/mo to run a Relay.
So all in all, running PDSs and Relays is cheap. Running full AppViews is more expensive but there's simply no equivalent to that in the Mastodon world because Mastodon is always fragmented[1]. And running a partial AppView (comparable to Mastodon behavior) should be much, much cheaper — but also not very appealing so I don't know anyone who's actually doing that. (It would also require adding a bit of code to filter out the stuff you don't care about.)
[1] Mastodon is adding a workaround for this with on-demand fetching, see https://news.ycombinator.com/item?id=45078133 for my questions about that; in any case, this is limited by what you can do on-demand in a pull-based decentralized system.
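As a sketch of what that "partial AppView" filtering could look like in practice (event shape simplified, table schema invented, DIDs hypothetical -- not a faithful ATProto implementation):

    # Sketch of a "partial AppView": index only accounts you care about, ignore the rest.
    import sqlite3

    FOLLOWED_DIDS = {"did:plc:alice123", "did:plc:bob456"}   # hypothetical DIDs

    db = sqlite3.connect("partial_appview.db")
    db.execute("""CREATE TABLE IF NOT EXISTS posts
                  (uri TEXT PRIMARY KEY, did TEXT, text TEXT, created_at TEXT)""")

    def handle_event(evt):
        """Index a post only if it was authored by someone on our follow list."""
        if evt["did"] not in FOLLOWED_DIDS:
            return                                   # the "filter out what you don't care about" part
        rec = evt["record"]
        if rec.get("$type") != "app.bsky.feed.post":
            return
        db.execute("INSERT OR REPLACE INTO posts VALUES (?, ?, ?, ?)",
                   (evt["uri"], evt["did"], rec.get("text", ""), rec.get("createdAt", "")))
        db.commit()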
That's a big post that doesn't really explain in what way smaller players can federate with ATProto or how the structure allows federation.
Reading through it, it just sounds like sharding/scaling for a centralized service that's meant to be owned and provided by a single entity.
>in what way smaller players can federate with ATProto or how the structure allows federation.
Each of the pieces I've described (PDS, Relay, AppView) implement the protocol specified at https://atproto.com/. Anything that acts as an ATProto PDS can be used as an ATProto PDS, anything that acts as an ATProto Relay can be used as an ATProto Relay, and so on. I'm not sure I understand the question so pardon the tautology.
The structure allows federation by design — a Relay will index any PDS that asks to be indexed; an AppView can choose the Relay it wants to get the data from (or skip a Relay completely and index PDS's directly); anyone can make their own AppView for an existing or a new app. That's how there are multiple AppViews (both for Bluesky app and for other ATProto apps) ingesting data via multiple Relays from many PDS's. There aren't many independent operators of each piece (especially outside of PDS self-hosting) but nothing is privileging Bluesky's infra.
Additionally, Bluesky's reference implementations of each piece are open source. So people run them the same way you would usually run software -- by putting it on a computer and exposing it to the internet. To run a custom PDS, you can either use the Docker container provided by Bluesky (https://github.com/bluesky-social/pds) or implement your own (e.g. https://github.com/blacksky-algorithms/rsky). Ditto for other pieces.
>Reading through it, it just sounds like sharding/scaling for a centralized service that's meant to be owned and provided by a single entity.
You're right in that the goal is to make it on par with centralized services in terms of UX and performance/scaling. However, it is decentralized.
The picture at the end of this article might help: https://atproto.com/articles/atproto-for-distsys-engineers
I think what they are asking is: if I run my own BlueSky AppView, how do I integrate with the bluesky.app so that users signed in on my AppView can interact with users on the main AppView and vice versa? This is how most of us think about federation.
AppViews do things like track likes, or replies.
The AppView doesn't do that only for Bluesky data. It does it for any Personal Data Stores (user accounts with all their user data) that it knows about.
When you "interact" with users elsewhere, all you do is generate new records on your own PDS. You generate a "like" entry, or a reply, on your own PDS. It's your pds, all your stuff goes there. The AppView sees that and indexes it, attaches that like or that reply in the AppView to the post you're reacting to.
That's already the default behavior. You don't need to do anything special for that to work, it's what ATProto is designed for. Everything is always operating in a shared space by design.
When people "post", their posts go to their PDS's, which means that every AppView ingests data generated by every other AppView by default. There is no way to tell who's using which AppView — in fact, you can log into any AppView and your profile will be there with all your posts.
Other people can run AppViews too (that operate over the same or different data). There are just fewer people doing that than hosting Mastodon instances, partly because it's much more expensive to, because some of the benefits of hosting a Mastodon instance can be obtained much cheaper through running a PDS server, and because AppViews don't serve the same exact social role that Mastodon instances do. (Mastodon has every instance be a semi-isolated community, so Mastodon instances are often made for the social purpose of running a semi-isolated community. Bluesky users expect a global timeline that's not partitioned by server instances, so it doesn't get many people running AppViews specifically for fostering semi-isolated communities. People on Bluesky who want to foster semi-isolated communities tend to use features like custom timelines to do so instead.)
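Coming back to the "interacting just means writing records to your own PDS" point above, a like is roughly a small record like the following (a sketch; field names follow the app.bsky.feed.like lexicon as I recall it, so treat the exact shape as an assumption):

    like_record = {
        "$type": "app.bsky.feed.like",
        "subject": {
            "uri": "at://did:plc:someauthor/app.bsky.feed.post/3kabc",  # hypothetical post
            "cid": "bafyrei...",                                        # content hash of that post
        },
        "createdAt": "2024-01-01T00:00:00Z",
    }
    # Nothing is written to the other user's server; any AppView indexing your PDS
    # sees this record and counts the like against the referenced post.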
Decentralized has a problem. Its premise is that anyone can set up and run a node, cheaply.
If the decentralized network allows for some kind of targeted broadcasting, it becomes attractive for spamming (e.g. email).
If the decentralized network allows for concentration of responses on something, it becomes a potential tool for a DDoS attack (e.g. DNS amplification).
So running a node should be somehow expensive, but the expense should be written off if the receivers of the message endorse it, by a one-time action, or automatically by subscribing. An initial credit would allow one to establish an audience.
It looks like a perfect use case for a cryptocurrency of sorts %) But this means expensive coin generation, and distribution of the huge ledger across all nodes. That could be delegated to some specialized nodes, but here comes centralization again!
So we are not decentralized. Git was a good attempt, but it kind of got centralized around GitHub, GitLab, and other variants. BitTorrent was decentralized, except tracker sites were the natural centralization points. Bitcoin was also decentralized, but still had Coinbase and other sites. Even SMTP is de facto centralized due to the spam problem.
> Even SMTP is de facto centralized due to the spam problem.
It's important to note that this isn't "you have to be big in order to be able to filter spam". That's not true at all; decentralized anti-spam lists have been a thing for decades and the big sites don't have any significant advantage in filtering spam.
The allegation is that big sites will mark small sites as spam even when they're not, which makes it hard to run a small mail server. And there is some of that -- they also have a perverse incentive to do it on purpose because it kills their smaller competitors.
But it's also somewhat overstated. If you have a reverse DNS entry pointing back at your mail server and have properly configured DKIM, it's not inherently the case that you're always going to be marked as spam. And it's not inherently the case that you won't just because you use one of the big services -- they have the same incentive to do that to each other, after all.
> except tracker sites
There are loads of different tracker sites. Many private. If one goes bad, others pop up to replace it. This is decentralised - there is not one player that strangles the ecosystem.
> There are loads of different tracker sites
Not to mention the mainline DHT. Not impossible or even very resource hungry to run a scraper/crawler/listener and be able to search via it, like bitmagnet (https://bitmagnet.io/) that has some fun pipe-dreams like federation of indexers and something like an decentralized private tracker.
Anyone can construct a service like Coinbase, and in fact there are numerous such sites available; at this point, you can even use PayPal! You don't even have to continue using the same one you started with: you can buy Bitcoin with PayPal and then sell it with Coinbase. It seems like a very strange definition of centralized to say that this causes any kind of centralization on Bitcoin...
Well git was not really an attempt at decentralization so there is that.
And in many ways it actually is remarkably meaningfully decentralized - or perhaps "effectively" decentralized - in terms of every node having a full working copy of entire repositories that can be trivially cloned to another provider or stood up on a self-hosted server.
I’d say the opposite. Git is de facto centralized. And I’d go further and say that attempting decentralization is the biggest problem with Git. It maybe makes sense for Linux. But for 99.999% of projects it makes everything so much worse.
Everything you listed IS centralized, though.
We're more decentralized in fedi but we're also not really consistent either. Which I think is the number one gripe for users who manage to get into the fedi.
I don't mind, I still think it's a huge leap forward, but it's important to set realistic expectations.
What do you mean by consistent?
(Never used the fediverse, so zero context here).
Not who you're replying to, but one issue is that different users will sometimes see a different subset of replies to a post, depending on what the server they're using has copied from other servers.
This isn't correct. Mastodon merged fetch-all-replies in March. https://github.com/mastodon/mastodon/pull/32615
The only difference in visible replies is in the moderation choices of the server the post is viewed from.
How does it work?
In ATProto, there is no need to do this on-demand because the data is already there in the AppView. When you want to serve a page of replies, you read them from the database and serve them. There is no distributed fetching involved, no need to hit someone else's servers, no need to coalesce them or worry about limiting fetches, etc. This is why it works fine even for threads with thousands of replies and hundreds of nesting levels. It can also be paginated on the server.
If you don't have this information on your server, how can you gracefully fetch thousands of replies from different servers and present a cohesive picture during a single request? I'm sure this PR does an attempt at that but I'm not sure this is a direct comparison because Mastodon can't avoid doing this on-demand. If we're comparing, it would be good to list the tradeoffs of Mastodon's implementation (and how it scales to deep threads) more explicitly.
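For illustration, serving a page of replies from an already-indexed AppView database is a single local query (toy schema invented for the example):

    import sqlite3

    db = sqlite3.connect("appview.db")   # invented schema, for illustration only

    def reply_page(root_uri, limit=50, offset=0):
        """One page of a thread, paginated server-side, no cross-server fetching at request time."""
        return db.execute(
            "SELECT uri, author_did, text, created_at FROM posts "
            "WHERE reply_root = ? ORDER BY created_at LIMIT ? OFFSET ?",
            (root_uri, limit, offset),
        ).fetchall()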
There is a detailed explanation available at the link I posted. Second header, "Approach".
What do you expect the performance characteristics to be compared to querying a database?
I expect them to be unimportant. This has been merged upstream and running on the flagship Mastodon instance for a little while now.
There is also a section related to performance available at the link I posted. Third header, "Likely Concerns", second subheader, "DoS/Amplification".
What do you mean by unimportant?
I mean from the user's perspective: when I open a thread, I expect to instantly see the entire discussion happening across the entire network, with the paginated data coming back in a single roundtrip. Moreover, I expect every actor participating in the said discussion (wherever their data is stored) to see the same discussion as I do, with the same level of being "filled in", and in real time (each reply should immediately appear for each participant). It should be indistinguishable from UX of a centralized service where things happen instantly and are presented deterministically and universally (setting aside that centralized services abandoned these ideals in favor of personalization).
With ATProto, this is clearly achieved (by reading already indexed information from the database). How can you achieve this expectation in an architecture where there's no single source of truth and you have to query different sources for different pieces on demand in a worker? (To clarify, I did read the linked PR. I'm asking you because it seems obviously unachievable to me, so I'm hoping you'll acknowledge this isn't a 1:1 comparison in terms of user experience.)
To give a concrete example: is this really saying that replies will only be refreshed once in fifteen minutes[1]? The user expectation from centralized services is at most a few seconds.
[1]: https://github.com/mastodon/mastodon/pull/32615/files#diff-6...
I'm not very interested in arguing over the ins and outs of "user expectations" and Mastodon vs. Bluesky, sorry. I would suggest you try it yourself and come to your own conclusion about whether this is a usable system :^)
I'm arguing that "not really consistent" from the grandparent post still applies, and therefore your "this isn't correct" isn't correct.
For realtime discussions (like this one), I don't think we can call it consistent if it takes multiple minutes for each back-and-forth reply to propagate across instances in the best case (and potentially longer through multiple hops?) because you'll see different things depending on where you're looking and at which point in time.
In practice, this is rarely an issue due to the nature of human attention. Beyond a couple dozen speakers in a conversation, it's noise.
At least to my observation; I haven't pulled apart the protocol to know why: if you're in a conversation in Mastodon it's real good about keeping you in it. The threading of posts seems to route them properly to the host servers the conversing accounts live on.
And yet I could have a realtime public threaded conversation on Twitter, and am having one on Bluesky (regardless of which PDSs or AppViews other people are using), but cannot in principle have on Mastodon (unless everyone I talk to shares my instance). Does this say anything about relative ability of ATProto vs ActivityPub to meaningfully compete with centralized services?
I hear your point that slower conversation can be better. That’s a product decision though. Would you intentionally slow down HN so that our comments don’t appear immediately? You could certainly justify it as a product decision but there’s a fine line between saying you should be able to make such decisions in your product, and your technology forcing you to make such decisions due to its inability to provide a distributed-but-global-and-realtime view of the network.
I'm not sure where you reached the conclusion that you can't have a realtime public threaded conversation on Mastodon; I do it frequently. The way it generally works is that clients will auto-at-tag people in the conversation, which makes sure the message is routed to all in the conversation within more-or-less milliseconds.
Auto-at-tagging doesn't scale to dozens and dozens of actively-engaged speakers, but neither does human attention, so that's not a problem that needs to be solved.
I realize that we might be arguing over definitions here, but to me part of the experience of Twitter-like conversation is seeing other replies appear in real time even when they’re not directed at me — same as how you’ve noticed this thread on HN.
Seeing the existing convo in real time lets me decide which points to engage with and which have been explored, and to navigate between branches as they evolve in real time (some of which my friends participate in). I do earnestly navigate hundreds of times within an active thread — maybe it’s not your usage pattern but some of us do enjoy a realtime conversation with dozens of people (or at least observing one). There’s also something to the fact that I know others will observe the same consistent conversation state at the time I’m observing it.
You might not consider such an experience important to a product you’re designing, but you’re clearly taking a technological limitation and inventing a product justification for it. If Mastodon didn’t already have this peculiarity, you wouldn’t be discussing it since replies appearing in realtime would just seem normal.
In either case, whether you see it as a problem to be solved or not, it is a meaningful difference in the experiences of Twitter, Bluesky, and Mastodon — with both Twitter and Bluesky delivering it.
Good to know. Thanks for the update!
How would one measure old school federated contexts like IRC and NNTP in this way? I wonder how they would fare.
Remember how freenode changed owner and pretty much everyone moved away from it in less than 1 week? It was easy and possible.
Perhaps "frictionless migration" is the real metric to optimize for, rather than decentralization at any given point in time.
I tend to agree. Having tried the Fediverse twice & each time had my server shut down, had a pretty jank sad partial migration forward path (my old replies kind of being cast into limbo), it just doesn't feel like the fediverse actually has "credible exit" at this point. Decentralized but still semi trapped.
Whereas with Bluesky / AT protocol, most folks are on Bluesky servers, yes. But there's a very strong credible exit case where you can leave the Bluesky servers & just do your own thing. And follow whomever you want to follow.
Bluesky / at proto creates a trust mechanism beyond DNS, creates an identity that can be moved around between hosts or replicated outwards in a verifiable way. I dig ActivityPub, and have been a long time http enjoyer, but it's not ideal imo for social media to need to be so coupled to such strongly DNS based client-server systems.
It was a pretty huge disruption, though, even though the damage wasn't fatal.
For smaller semi-private circles IRC especially with web front-ends that provide scroll-back are still great but in my experience when they get too big there is just too much politics and too many cultural differences and things start to fall apart much like agilob's example. IRC is still great for groups of like minded people that do not need to bring in the entire world into their tent. In the early internet more people were like or similar minded and so it worked well for the most part. Some would even argue it still works great for the general public but I am not so sure. Keeping web interfaces semi-private with simple-auth and disabling referrers reduces the risk of botters, agent provocateurs and other forms of riff raff trying to sow division among people, or third party AI bots snooping or arguing.
NNTP is also great but most people can not afford individually to mirror entire binary groups and most ISP's no longer perform this so most people just use commercial news feeds if they want binaries or one of the free NNTP / Usenet providers if they are just using text. People can certainly peer with some of the free providers [1] and probably should to reduce the risk of people being censored. Much like IRC people can create their own little private or semi-private linked NNTP servers to replicate a distributed thread based forum of sorts.
[1] - https://www.eternal-september.org/index.php?showpage=peering
Should be easy enough to do the math: https://netsplit.de/networks/
I'd like to see Nostr added to this, since userbase concentration is always the thing they levy against the fedi model. It'd be a little weird to adapt because user identities don't live on single relays.
It would be weirdly represented since most clients push to multiple relays while the account so to speak is a public key pair on the user's device.
Keep the needle pointing north. Towards the center of that dial.
Too decentralized, and you can't find anything. Nobody uses it.
Too centralized, and censorship takes over. Nobody can speak freely.
I don't disagree but I do wonder if a.) discoverability is really so intractable in a decentralized environment if you're willing to throw a lot of resources towards indexing and b.) if that middle ground isn't like balancing a pendulum upside down - a very fragile equilibrium. A bunch of decentralized units might join together, or a large centralized unit might fail, pushing the pendulum to either side.
You can think of the golden age of blogs and search as an example of both. Search engines formed a centralized hub with blogs, forums, etc. forming the spokes. For a while that worked well before it was degraded by spam and consolidation of disparate forums etc. into a handful of major platforms (fueled partly by acquisitions).
That's a good point. Thiel's "Zero to One" makes it.
In economics, a market needs several reasonably strong businesses to get price competition. An EU study indicated that the minimum number is about 4. Below 4, price competition seems to disappear and you have oligopoly, or, at 1, monopoly.
In areas where there's no inherent effect like distance to stop centralization, markets tend towards oligopoly. Look at the number of browsers, the number of big banks, the number of cellular phone companies, and so forth. They're all between 2 and 4. The stable state seems to be around 3 big players.
This probably applies to social networks. There's only so much attention available.
The fediverse folks are violently against any efforts for discoverability. They like the high bar for discovering and joining. Any attempt gets brigaded and shut down quickly
They're against opt-out discovery, not discovery in general.
Given how toxic big tent communities like Twitter can get, I think that makes perfect sense for some communities. Some plants thrive in full sun, some plants thrive in the shade. Some social interactions happen in the town square and some happen at more intimate functions.
Some are. That doesn't mean they can do anything about it.
Remember, "the fediverse" is a bit like saying "the internet". "Internet folks are against centralization." Are they?
Fediverse makes the indexing question interesting because some people don't want it deeply indexed: they point to the practice of dredging up old opinions on Twitter as an anti-pattern that the tooling should not support. Indexing without permission is met with hostility / defederation over there, and both individuals and server owners have tools to switch fine-grained indexing on and off.
(It is, of course, fundamentally impossible to keep people from indexing a default-open network, but if one does it, one does not advertise doing it outside the service-supported mechanisms).
You're presupposing that discoverability requires some degree of centralization.
We haven't seen a distributed Google yet.
It's not impossible, but each distributed component would have to be at least a small data center.
Google search does a lot more complex processing (crawling, historical weighting) and it does it on unstructured pages. Discoverability in a microblogging system doesn't have to be a lot more than indexing/collation of structured posts, and end-user clients can be designed to participate in that. "Retweets"/"boosts" are already a form of that.
Sure, but fediverse numbers are pitiful at this point. Reality is that 99.9% of users don't care about decentralization, so it ends up being a "this has to work as well as a centralized system" does.
> We haven't seen a distributed Google
You can index your own stuff and propagate that index to others. This is what people would have thought good when Gopher was a thing.
> In economics, values below 100 are considered "Highly Competitive", below 1500 is "Unconcentrated", and above 2500 is considered "Highly Concentrated".
Fediverse is almost straight left, and it's already 690. Straight up would be 5000. This is non-linear scale presented linearly.
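One rough way to build intuition for the non-linearity (a sketch, borrowing the "effective number of parties" trick): take the reciprocal of the index with shares as fractions, i.e. 10,000/HHI on this percent scale, which gives the number of equally sized servers that would produce the same score:

    def effective_servers(hhi):
        """Number of equally sized servers that would produce the same HHI (percent scale)."""
        return 10_000 / hhi

    print(effective_servers(690))    # ~14.5 equal-size servers -- the Fediverse figure above
    print(effective_servers(5000))   # 2.0  -- two equal halves ("straight up")
    print(effective_servers(9910))   # ~1.01 -- roughly what a single 99.55% share alone implies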
Informed, voluntary choice is what I want to see. Give me the option to pick: centralized, decentralized, or a hybrid balance.
“Too decentralized” critics should form a non-profit where public hosts can register and build an opt-in index for all the decentralized content. Then there is no discoverability issue.
I wouldn’t be surprised if Facebook tries to eventually capture that data with Threads.
Mastodon (the company) is building one: https://www.fediscovery.org/
Yes, I think that would be smart of them.
Including Threads with Fediverse might have an interesting impact.
Adding Threads's 400M users changes the ActivityPub fediverse centralized market share to 99.72%, beyond that of BlueSky's share of 99.55%.
I agree. Even though the privacy control is opt in to share their posts with the fediverse, I think it's fair for Threads to be counted as a server in the fediverse holding user data. It just has more advanced privacy than others.
> This page measures the concentration of the Fediverse and the Atmosphere according to the Herfindahl–Hirschman Index (HHI), an indicator from economics used to measure competition between firms in an industry. Mathematically, HHI is the sum of the squares of market shares of all servers.
I had not heard of this metric before - it’s neat and simple to understand. If you scaled it down to 0-100 (by dividing by 100), I think it would make the numbers more immediately understandable. I’d even consider inverting it (so 0 = centralized and 100 = decentralized), since the website title implies measuring progress ‘towards’ decentralization.
OTOH the reason why they didn't normalize to 100 may be to not give people an idea that the measure is linear; seeing a score of 2500 makes you ask 'what does it mean?' whereas if you were presented with 25/100 you probably wouldn't think that it is 'highly concentrated'.
https://atproto.wiki/en/working-groups/indiesky/stack
multiple different levels of independence to be had in the atmosphere, so it's not directly comparable (doesn't help atmosphere's case though)
i personally am more excited about ATproto as a protocol, and hope https://freeourfeeds.com/ et al can pull it off
Mastodon has been great so far
So why not just wordpress and rss?
We've had the protocols and tech for decentralized services for 20+ years. The problem isn't the tech, it's the users! Your average Joe simply doesn't have 'must be decentralized' in their requirements. I'm guessing this is due to lack of education on the matter.
But, yes I agree with you :)
ActivityPub is basically just less simple RSS.
It would be great to add nostr, but nostr doesn't really match this model. Nostr doesn't need a single server to hold your identity, your app connects to many "relays" at the same time.
Isn't this always a problem of Marketing + UX?
Can someone explain how Fediverse instances or any decentralized platforms deal with the most obvious issues of decentralization? Like if I join some node then one day it's offline and all my data lost. I mean surely instances usually don't disappear without notice but it still a totally possible thing. Or what about those times when entire instances get involved in dramas and end up defederating? I don't want to lose my connection with potential friends from such an instance just because of something like that. I even remember some time ago when there were news about Threads being federated some people deliberately gathered "signs" across lots of instances to collectively defederate from Threads and it's just ridiculous to me that this even happened. What's with all this talk about fighting censorship while actively engaging in constant "self-censorship", making your own echo chamber as tight as possible?
> Like if I join some node then one day it's offline and all my data lost. I mean surely instances usually don't disappear without notice but it still a totally possible thing.
This happened to me with julialang.social which just stopped running after the guy hired to host it was poached by Google and he lost all interest in the Julia language community. Lost everything. Not going to look back at activitypub as ATProto is the future for me.
In ATProto, this is solved by not having "instances". Your data is stored on a personal server (which can ofc go down). However, your identity isn't hard-tied to that server. If you have a backup (making those more easily available is being worked on by people in the ecosystem), you can restore it on another host and point your identity there instead. So there's one level of indirection which prevents the problem you're describing.
Additionally, since ATProto decouples hosting of data from hosting of applications, there is no such thing as "instances" having beefs and defederating from each other. The data flows one way (from PDS to apps). Apps may choose to ban certain PDS's but generally PDS's themselves are treated as app-agnostic containers of data. So intra-network social beefs don't translate to technological cutoff or loss of mobility. Social groups and communities are decoupled from hosting.
> Like if I join some node then one day it's offline and all my data lost
They just disappear. You're not appending to a long-term immutable blockchain-like archive. You're posting things to the internet and the nature of posting things to the internet (or anywhere for that matter) is that nobody guarantees they will be around forever.
> Or what about those times when entire instances get involved in dramas and end up defederating
Identical to "what about those times when Twitter bans someone whose tweets I want to read?"
> What all this talk about fighting censorship while actively engaging
Censorship and moderation are the same word
Depressing
"Decentralization" isn't the end-goal so measuring it here isn't all that meaningful. Personally, I care about:
1. How hard is it to censor the network.
2. How hard would it be for some major player to enshittify the network.
Furthermore, while the fediverse has a single axis for decentralization, BlueSky has 3: number of "big index servers", number of PDSs, number of domain names (how many people own their handle):
1. Increasing the number of PDSs doesn't make it harder to censor the network when everyone still uses the same big index node.
2. BlueSky's primary defense against enshittification is user account portability. I'd love to see metrics on how many users have their own domain names. Having many PDSs is also a good defense here because it reduces the impact of BlueSky (the company) shutting off the firehose, but I still think account portability is the primary defense here.
BlueSky's primary defense against enshittification is user account portability.
I wonder if people would actually migrate or if they'd just get boiled.
Depends on if they had something compelling to migrate to.
I made a Bluesky account. I posted some pretty boring replies to pretty boring posts, and still got banned. They didn't tell me why I was banned. I didn't bother trying to make my account immune from banning; I just quit since it was pretty boring there anyway. That's one data point against the migration hypothesis.
Bluesky isn't decentralized, anyway, because of the PLC directory.
I think this compares AT proto PDSes to Fediverse instances. In many ways this actually underplays just how grossly centralized AT proto currently is, since some of the components are 100% centralized still. (Whereas Fediverse instances, all downsides considered, are at least self-contained fully-independent instances.)
Which components are you referring to?
In the case of Bluesky, there will also only ever be a single instance of the App View. As far as I am aware, in practice there are really only the official relay and indexing services, too.
Why can't someone run another App View?
You can, it just won't be the Bluesky App View, so it's not really Bluesky. It's not like the fediverse where the instances own the URLs of the posts: they go through app views and there's a canonical URL to a Bluesky post and that's through the official app view.
This seems misleading.
Different AppViews would obviously be branded differently, but the whole point of ATProto is that there is a shared "picture" of the world. People are running alternative AppViews that consume Bluesky posts (and serve Bluesky threads).
Here's the same thread on three different AppViews:
- https://zeppelin.social/profile/did:plc:iyz5zf463ic52vqbonyu...
- https://blacksky.community/profile/did:plc:iyz5zf463ic52vqbo...
- https://bsky.app/profile/did:plc:iyz5zf463ic52vqbonyu2ebu/po...
These are three independent webapps indexing the same information and serving it independently. They're not different frontends for one API; these are all independent backends.
So, it's the same underlying data structures (e.g. posts, threads, etc.), and the way they're exposed depends on the implementation? So there's one BlueSky, but BlueSky is just one interface (UI + API). Am I getting this right?
I just want to know if I can run my own node in my own hardware.
You can think of ATProto like this.
Each user has a "website" with JSON of their own content (e.g. all my posts, all my likes, all my follows, actually live in a sqlite database hosted somewhere). It's not really a website but more like a git repo — one per user.
And then, there's a protocol for how to aggregate information from all such "websites" in the network into a stream of changes. Apps subscribe to that stream of changes and update their local databases (which act as app-specific caches) in response to those events.
When I make a Bluesky post, I'm really writing JSON into my sqlite file. This change gets broadcasted to all interested apps which update their own databases (which may or may not care about a specific content type like "Bluesky post"). Obviously forks of the Bluesky backend do index Bluesky posts (and then return them in the same UI), but you could imagine other backends that only care about other content types, or that record Bluesky posts but in a different database structure, and ofc can present a different UI for it.
Yes, you can run your own node — multiple types of nodes. You run your own PDS (https://github.com/bluesky-social/pds) to store own data (that's the "website" in my analogy), or you could run a Relay (https://whtwnd.com/bnewbold.net/3lo7a2a4qxg2l) that collects all PDS changes into a stream, or you could run an AppView (any backend that listens to Relay or PDS, basically your own app).
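For illustration, roughly what such a post record looks like in your own repo (a sketch; field names are from the app.bsky.feed.post lexicon as I remember it, so double-check against the spec):

    post_record = {
        "$type": "app.bsky.feed.post",
        "text": "hello from my own PDS",
        "createdAt": "2024-01-01T00:00:00Z",
        "langs": ["en"],
    }
    # The PDS commits this into the per-user repo (a Merkle tree over such records),
    # emits an event, and any Relay or AppView that's listening picks it up from there.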
one important correction is that blacksky is not hosting their own appview, they host their own PDS for users to join and a soft-fork of the official client
Ah, apologies, thanks! Lemme edit.
Actually it doesn't look like I'm able to edit anymore, so I'll upvote your comment.
OK, to be honest, I'm surprised anyone runs a third-party Bluesky AppView; I genuinely can't understand why anyone would run one of these except as a raw curiosity.
If 99% of the people are using the default AppView, the default relay, the default indexers, the default PDSes, etc, etc... that just means that everything that almost the entire userbase sees is completely controlled by one entity. It's technically possible for people to use alternative services, but the community would have to wrestle the majority control of the network away from Bluesky Social PBC for it to really matter. Running an alternate PDS or even AppView seems like mostly a symbolic gesture since whether anyone can actually see your posts is still up to the whims of one entity, just like Twitter, and that's partly because there's no way to really "own" the URLs of your posts or profile. The canonical URLs are one domain owned by one company. The others are just alternatives.
But:
> the whole point of ATProto is that there is a shared "picture" of the world
I think everyone does understand that ATProto "solves" some of the problems with decentralization that you can observe from the Fediverse, but when you look at the practical reality of ATProto, it's hard to figure out exactly what aspect of decentralization users are supposed to be able to still benefit from. The whole thing could be re-centralized and literally 99% of all users wouldn't notice anything different. If you get censored by the entity running the primary AppView, or even deeper, you could theoretically run all of your own components... but then you'd just be talking to pretty much yourself. Even if you did succeed and somehow wrestled away a substantial portion of users, (which would be extremely expensive and impractical), now you just have the same split world that exists in the Fediverse, but with AppViews/moderation services. It kinda seems like the "shared picture of the world" concept is actually somewhat incompatible with having an actual decentralized network where users meaningfully have control.
P.S.: I know that mentioning censorship is automatically polarizing, but with Bluesky I really feel like I have good reason. I tried Bluesky briefly a long while back just out of raw curiosity, and I actually managed to get my account taken down with zero posts. I literally was just following some artists, mostly Japanese, and I assume one of them got banned for something NSFW. I'm not even sure I liked any posts that were NSFW. Needless to say once I got unbanned I just deleted my account and gave up on it. I wasn't really planning on using it for anything, so it's not like I am horribly offended by this, but it definitely gave me an idea of how Bluesky Social PBC moderates. No thanks.
there is https://deer.social
This one isn't an AppView (it's a frontend for Bluesky's). However, https://blacksky.community/ and https://zeppelin.social/ are full AppViews.
I thought Bluesky was federalized? How is it not?
The federal part doesn't actually work in practice. It's just a marketing gimmick.
This is factually wrong, and disproven by the fact there are now fully independent federated instances such as BlackSky and soon to be NorthSky. Furthermore, they have independent codebases which are fully compatible. Compare to ActivityPub where most instances are just running Mastodon or some close fork or risk breaking compatibility. What's the point of federation if you are stuck with a monoculture of implementations?
The main BlueSky services are still by far the most popular, which is why we see centralization on the network.
Mastodon is definitely not the only fediverse setup that is popular, Misskey, Pleroma and forks of those integrate perfectly well. Given that the main Misskey instance is one of the largest fediverse instances (certainly by activity) it seems a bit unfair to criticize the fediverse on this. I mean, how many completely independent microblogging implementations does a network actually need? (Not even including things like Lemmy or Peertube which are also ActivityPub instances.)
On the other hand I really think you're underselling how much more popular Bluesky services are than any existing alternatives. I don't think we can actually see the distribution of network traffic, but I would be willing to bet decent money that the sum of all alternatives to the Bluesky AppView wouldn't even crack 0.01% the traffic of the main Bluesky AppView. And, honestly, I would probably bet even more money they'll never even come close to cracking 1% ever for the entire lifetime of the protocol, unless Bluesky Social PBC literally goes out of business.
It is, but most people use the central server