
I think it'll be fine. You don't need 7000 people to run a micro blogging service. It's really that simple.


> I think it'll be fine. You don't need 7000 people to run a micro blogging service. It's really that simple.

When you're generating and distributing and moderating multiple TB of tweets in real time every day to billions of people and also feeding other corporations parts of that data... maybe you do.


Twitter has around 190m daily active users. You're going to struggle to run something like that with an engineering team of 20-30 people (although WhatsApp did exactly that for many years), sure.

But I don't see why a company and an application of that size couldn't be run by, say, 1,500 people instead of 7,000: 250 engineering staff, 250 doing moderation/support, 800 doing sales & account management, 100 in management and 100 doing sundry tasks.


250 people moderating AND supporting 190m daily active users??

Let's do some napkin math here:

500 million tweets a day, let's say 10% are reported and reports are evenly distributed. Split across those 250 people, that's 200,000 tweets per employee per day, or about 7 tweets a second that need moderating across a standard 8-hour day.

190m active users, 1% of whom need support daily. That's 1.9 million tickets a day - on top of the 7 tweets per second, each employee picks up another support ticket roughly every four seconds.

And that's without weekends, holidays, sick days, etc.

EDIT: Let's go the other way with napkin math too.

Let's say each content moderator can review 6 tweets per minute.

50m tweets to be reviewed at that velocity means you need around 139,000 man-hours per day. For an 8-hour day, that's roughly 17,000 employees. Too many.

So you make an ML algorithm that processes the reported tweets, but needs backup and spot checking on 10% of those. 1,700 content moderators (and a team dedicated to building/maintaining the ML report checking algorithm).

Now, about those support requests. Based on my experience from being a CSR for Amazon years ago, you'll probably want 2-3 minutes per support request, and that's if you're sending a form letter back. 1.9m requests at 20 per hour means some 95,000 man-hours per day, or 12k CSRs. Too many. Another ML algorithm (and team to maintain it) and we're down to 1,200 CSRs.

About 3 thousand employees plus their support structure, just to handle tweet reports and CSR requests.

Based on napkin math, 7,000 employees makes a lot of sense to me.
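
If anyone wants to poke at the assumptions, the same napkin math fits in a few lines of Python. Every input here is a guess from above (report rate, review speed, ticket rate, the 10% ML spot-check), not a real Twitter figure:

  # Napkin math from the comment above; every input is a guess, not a real figure.
  TWEETS_PER_DAY     = 500_000_000
  REPORTED_FRACTION  = 0.10    # assumed share of tweets that get reported
  REVIEWS_PER_MINUTE = 6       # assumed moderator throughput
  DAU                = 190_000_000
  SUPPORT_FRACTION   = 0.01    # assumed share of users filing a ticket each day
  TICKETS_PER_HOUR   = 20      # assumed CSR throughput (~3 minutes per ticket)
  ML_SPOT_CHECK      = 0.10    # share of ML decisions a human re-checks
  SHIFT_HOURS        = 8

  mod_hours = TWEETS_PER_DAY * REPORTED_FRACTION / (REVIEWS_PER_MINUTE * 60)
  csr_hours = DAU * SUPPORT_FRACTION / TICKETS_PER_HOUR

  print(f"moderators: {mod_hours / SHIFT_HOURS:,.0f} without ML, "
        f"{mod_hours / SHIFT_HOURS * ML_SPOT_CHECK:,.0f} spot-checking ML")
  print(f"CSRs:       {csr_hours / SHIFT_HOURS:,.0f} without ML, "
        f"{csr_hours / SHIFT_HOURS * ML_SPOT_CHECK:,.0f} spot-checking ML")
  # -> roughly 17,000 / 1,700 moderators and 12,000 / 1,200 CSRs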


The idea that 10% of tweets are reported is a huge overestimate. I'd say at most 1% of tweets are reported, and it's probably more like 0.1%.

Twitter's actual numbers (from https://transparency.twitter.com/en/reports/rules-enforcemen...) show that 11.6m reports were generated in the period July to December 2021, which is roughly 65,000 reports per day. ML could easily reduce this number further, but even without ML, 100 employees doing moderation works out to 650 reports per employee per day. That's getting towards doable.


Are those employees able to speak all the world's languages?


> ML could easily reduce this number further

Seeing how poorly ML works for moderation (Too many false positives), I don't think it belongs anywhere near it.

The problem is that you could offer users a path to request human review of moderation actions taken by ML, but bad actors who knowingly break the rules will just request human review every time, and at that point the ML is worthless.


The 10% would also include tweets automatically flagged for review by the misinformation, covid, and other language filters they have in place.

Not all reviewed-for-moderation tweets would happen because of user reports.


Isn't the grunt work of moderation mostly outsourced today anyway, meaning those employees aren't on Twitter's books?


Exploring this further. I don't have real numbers but I suspect yours are pretty far off.

10% reported seems high, possibly by as much as an order of magnitude. The overwhelming majority of tweets are vapid and innocuous.

It seems like an ML algorithm could do better than 90%. It doesn't have to be as reliable as a self-driving car; if it screws up occasionally it will merely annoy users (and probably the users who are most troublesome).

If Twitter becomes a more permissive environment, less censorship is necessary. Agree or disagree with that, it means less work for the censors.

Paid subscriptions give you a significant new trust metric for users.

The rest you can farm out to Mechanical Turk?

250 seems within the realm of possibility. People will complain about you the same way they complain about Google, but they'll keep using your product.


I pointed this out in a sibling, but the 10% would also include tweets automatically flagged for review by the misinformation, covid, and other language filters they have in place.

Also, having worked for a company that did ML sentiment analysis and content analysis against the Twitter firehose, the accuracy of the ML was closer to 65% than to 99%. Yes, the company had a huge in-house crew dedicated to checking the other 35% and to monitoring Twitter manually for anything missed.


You guys have never worked at a major social media platform and it shows. It takes around that many engineering staff just to run a passable advertising platform. There are so many technical nuances that you cannot even imagine.


For context, when I worked at _The Guardian_, roughly half our engineering staff worked in Commercial - doing things for advertisers.


Maybe, maybe not. I don’t think the number of engineers needs to scale with the servers. You can scale out servers without scaling out your number of engineers. As far as moderation, Twitter is already poorly moderated and moderation can be outsourced.

The sort of thing that requires scaling out engineers is supporting video embeds from hundreds of different video hosting platforms.


I've seen this statement a lot, and I think it's missing something.

Public companies typically have growth as a goal. Twitter was not making a lot of money. Some of these people were working on building a future Twitter that would grow bigger and richer. I'm not saying they were on the path to success, just that I expect a lot of the activity there could be described that way.

Five years ago you could say of Uber 'You don't need 5000 people to run a freelance taxi app', when they had hundreds (thousands?) of people working on autonomous driving. Amazon didn't seem to need scores of backend engineers to run a web store, but now AWS is a huge business. Google employs vast numbers of people but has a relatively small number of impactful products, only some of which make money.

Again, not claiming they were doing it well, just that they were trying stuff beyond maintaining what we see.


To be frank, Twitter hasn't changed in 10 years.

So in this hypothetical, the people building the future of the company have achieved nothing during this time and should absolutely be removed or replaced.


No argument there. Just pointing out that it's common for head count to be larger than the visible product justifies.

Uber cut their losses. Twitter needed a change of owner to get there.


I think 7000 is too high as well, but don't forget that in a business the technology part is maybe 10% of the overall effort. There's a lot that goes into running a business beyond the tech.


Yes, where would businesses be without a fleet of energy-sucking middle managers.

I don’t like EM at all, but truth is internet companies are making so much goddamn money, they have employee pools absolutely full of turds.


It's possible you need 100 engineers and 6,900 moderators.


I believe a lot of the moderation is contracted out and isn't part of employee headcount.


Why do you think so?


Look at early WhatsApp until they were acquired.

Vastly more innovative and scaled crazy fast without fail whales or anything.

Or look at Telegram today, delivering what seems to be a vastly more complex product with a fraction of the company size.


Aren’t WhatsApp and telegram messaging services rather than microblogging services?

The difference between a messaging app and a microblogging app is basically the difference between an O(n) problem and an O(n^2) problem.

If you don’t think that adds some complexity, you’re kidding yourself.


It adds some complexity, but consider that we know very well how to scale this type of service: E-mail + reflectors (mailing lists), and we know very well how to do parallel mass delivery for the small proportion of accounts with huge numbers of followers.

Scaling this is easily done with decomposition and sharding coupled with a suitable key->value mapping of external id to current shard. I first sharded e-mail delivery and storage for millions of users 23 years ago. It was neither hard nor novel to do then, with hardware slower than my current laptop handling hundreds of thousands of users each.
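
To make the shape of that concrete, here's a minimal sketch of sharded fan-out-on-write in Python. The shard count, the hash-based lookup and append_to_inboxes are placeholders for illustration, not anything Twitter actually runs:

  import hashlib
  from collections import defaultdict

  NUM_SHARDS = 1024

  def shard_of(user_id: int) -> int:
      # Map an external user id to its current shard. In practice you'd keep a
      # mutable id->shard directory so accounts can be moved between shards;
      # a stable hash is just the simplest stand-in.
      digest = hashlib.sha1(str(user_id).encode()).digest()
      return int.from_bytes(digest[:4], "big") % NUM_SHARDS

  def append_to_inboxes(shard: int, user_ids: list[int], tweet_id: int) -> None:
      # Placeholder for one RPC to the shard, which appends the tweet id to
      # each listed user's stored timeline.
      pass

  def fan_out(tweet_id: int, follower_ids: list[int]) -> None:
      # Group followers by shard so each shard gets a single batched delivery,
      # much like handing a message to one mailing-list reflector per host.
      batches = defaultdict(list)
      for fid in follower_ids:
          batches[shard_of(fid)].append(fid)
      for shard, members in batches.items():
          append_to_inboxes(shard, members, tweet_id)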


Those models are predicated on every user having an ‘inbox’

Do you believe that every Twitter user has an inbox stored on disk somewhere that just contains every tweet posted by someone they follow?


I have no idea if that is how Twitter ended up doing it. But building it that way is vastly easier to scale than trying to do some variation of joining the timelines of everyone you follow "live" on retrieval, because in models like this the volume of reads tends to massively dominate.

You also don't need to store every tweet in each inbox, just the ids of the tweets (a KV store mapping tweet id to full tweet is also easy to shard), and since tweet ids are reasonably chronological the id lists can be compressed fairly efficiently (quite a few leading digits of a tweet id are chronological).

You also have straightforward options for "hybrid" solutions, e.g. for dealing with extreme outliers. Have someone followed by more than X% of the total userbase? Cache the most recent N tweets from that small set of accounts on your frontends, and do joins over those few for users who follow them.

Most importantly, it's a pattern that has been extensively tested over decades in a multitude of systems with follower/following graphs where reads dominate, so the behaviours and failure modes are well understood and there are straightforward, well-tested solutions for most challenges you'll run into. That matters in the context of whether it'd be possible to build this with a small team.

Put another way: I know from first-hand experience that you can scale this to millions of users per server on modern hardware, so the number of shards you'd need to handle Twitter-level volume is lower than the number of servers I've had ops teams manage. (You'd need more servers in total, because the read load means you'd want extensive caching, plus storage systems for images and the like - there's lots of other complexity, but scaling the core timeline functionality is not a hard problem.)
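
For the "hybrid" part, a rough sketch of the read path, with plain dicts standing in for the sharded stores; none of this is Twitter's actual design, it's just the shape I'm describing:

  import heapq

  inbox_ids = {}       # user id -> tweet ids fanned out on write (deltas compress well)
  celeb_recent = {}    # outlier account id -> its N most recent tweet ids
  follows_celebs = {}  # user id -> the few outlier accounts they follow
  tweets = {}          # tweet id -> full tweet; easy to shard by id

  def timeline(user_id: int, limit: int = 50) -> list:
      # Precomputed part: the user's own inbox of ids, written at post time.
      own = inbox_ids.get(user_id, [])
      # Live part: join in the handful of accounts too popular to fan out,
      # cached close to the frontends.
      extra = [tid for c in follows_celebs.get(user_id, [])
               for tid in celeb_recent.get(c, [])]
      # Tweet ids are roughly chronological, so "newest first" is just the
      # largest ids across both sets.
      newest = heapq.nlargest(limit, own + extra)
      return [tweets[tid] for tid in newest if tid in tweets]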


It seems likely.


I might be underestimating how hard it is to scale microblogging. I most certainly am.

But have you looked at what Telegram provides, both in breadth of features and in scale?

Certainly there are celebrities with more followers on Twitter than the largest Telegram channels, but Telegram scales surprisingly far, and I haven't seen it struggle more than once or twice since the start.


They're different problems, but messaging isn't some trivial thing. And a user having a single unified public view of their tweets is pretty much O(n).


Telegram is also a microblogging service. With far more users.

It's clear you're totally clueless about how many features Telegram has.


Maybe you could provide links, examples, or helpful context instead of snark?


WhatsApp is not 'vastly more innovative', and they solved different kinds of problems.

Twitter is a 'universe of 100M connected people'.

WhatsApp mostly connected single entities together.

So, for example, 'real time search' and 'relevant updates'.

Imagine taking a firehose of 100M people's random thoughts, putting it into an index, and making it instantly searchable. Now pull up the most relevant thoughts from those 100M for each and every one of the other 100M users.

Now moderate all of it in really subtle ways, where most of the 'negative activity' is tantamount to spam or annoying behaviour rather than anything we might normally consider 'abuse'.

That's an incredibly different challenge and that's only two small artifacts of what they are doing.

Twitter is not rocket science, but it's not trivial either.

Also consider that R&D is usually maybe only 20% of overhead - yes, it takes 'all those other jobs and expenses' to run a company.

Some of these statements are a bit glib.


> Look at early WhatsApp until they were aquired.

Look at what they didn't do, which was make any revenue, or support any customers, or moderate any content.


Wasn't WhatsApp on track to be cash flow positive?

I know I at least was shouting at them to take my money: it was the perfect HN product, reasonably priced, technically superior and with no ads or tracking.

And yes, I paid as well.


Twitter has ad-serving infra, recommendation systems (timeline, notifications, events, users), user-generated events, prediction systems (ads), and user graphs. The complexity comes from processing and persisting exabytes of data in company-owned datacenters: Twitter stores images, videos, user events, user data, and tweets/replies. WhatsApp has little persistence outside of metadata, maybe? But your messages are not stored in a FB datacenter, and if they are I'd be concerned. You can read about their infra in their blog. Comparing p2p messaging to a distributed social media site with mountains of data and years of iteration in ML systems does not make sense.

https://blog.twitter.com/engineering/en_us/topics/infrastruc...


Messaging services don’t need the same amount of moderation as microblogging does.


I don't disagree that 7000 people is too many for what Twitter has become, but Twitter has been at the bleeding edge in terms of building web and data systems that can handle scale (while also open-sourcing most of that work).


Not sure how this is relevant given that Twitter is not a “micro blogging service”


What is it then? That seems like the best description to me.


It’s a messaging service that has a broadcast feature. And for that multicast/broadcast feature you get to have a crappy peer-to-peer messaging experience, be limited to small messages, look at ads, wade through unknown algorithmic manipulation of what you read, be locked into a single system, and be told you might need to pay to either a) prove you’re “real” or b) not see the ads. Sign me up!!

(A blog is also a broadcast messaging service, so I think you’re both right).


You don't if you don't need any content moderation.



