Re: password... we landed on requiring a password when creating the room so you can retain "HOST" status. E.g., if you disconnect and come back, you need the password to re-authenticate as HOST, so you won't lose your room to the first stranger who finds it empty.
But all rooms are open. The password is just a "recovery password". I will update the description text now to say this.
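To illustrate the flow (not the project's actual code), here's a minimal sketch assuming a hashed recovery password stored per room; all names here are hypothetical:

```typescript
// Hypothetical host-recovery flow: rooms stay open to everyone,
// the password only re-claims HOST status after a disconnect.
// A real implementation would use a slow KDF (scrypt/argon2), not bare SHA-256.
import { createHash, timingSafeEqual } from "node:crypto";

type Room = { id: string; hostHash: Buffer };
const rooms = new Map<string, Room>();

const hash = (pw: string) => createHash("sha256").update(pw).digest();

function createRoom(id: string, recoveryPassword: string): void {
  rooms.set(id, { id, hostHash: hash(recoveryPassword) });
}

// Anyone can join an open room; only the recovery password re-grants HOST.
function reclaimHost(id: string, password: string): boolean {
  const room = rooms.get(id);
  return !!room && timingSafeEqual(room.hostHash, hash(password));
}
```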
I also work a lot on testing with SWE-bench Verified. In my opinion, this benchmark is now good for catching regressions on the agent side.
However, above 75% the scores are likely about the same. The remaining instances are likely underspecified despite the effort of the authors who made the benchmark "verified". From what I have seen, these are often cases where the problem statement says to implement X for Y, but the agent simply has to guess whether to also implement it for another case Y', and that guess decides whether the instance is won or lost.
I'm usually very supportive of EU tech regulation, but to be honest I don't really want to put my name and address up on apps I throw up on the store
I'd usually like to keep my identity separate from whatever projects I have, especially ones that don't 100% align with the developer brand that employers might screen for.
I have the same mentality as you. But rather than form an opinion on whatever EU regulation is being interpreted as "requiring" these steps from Google et al., I think I'm going to assert that it's a red herring.
The real issue, IMO, is that it's still too hard to distribute and install applications on my general-purpose computing devices! You can't be on Google's app store if you aren't a "real business" with a physical address and everything? Fine. Let's just distribute our apps on F-Droid, or by just releasing APKs in our GitHub pages, etc.
At least that's still possible with Android. But who knows how much longer they'll even allow that?
Yeah, as long as there's an app market the user can install without passing through the official marketplace. The EU regulation gets blamed, but that's not the actual issue.
From what I can tell, this should all apply only to monetized apps (and I agree with that). If that's not actually the case, Google is using malicious compliance to mislead developers into hating the EU for daring to regulate them.
That's probably why F-Droid is a better choice in the first place?
Google Play (and the App Store) assume commercial intent by default, and I'm sympathetic to stricter verification rules when there's money changing hands.
> I don't really want to put my name and address up on apps I throw up on the store
As a customer I really want the ability to sue someone who does me wrong, call them out publicly, or at least avoid their products. In no way is it reasonable that someone should want to stay anonymous while selling me something (or profiting off of it in one way or another). I really don't see a reason to make an exception for people who have free+offline+etc apps.
You're publishing software, you need to be identifiable.
It is always true that people love regulation until it ends up affecting them negatively, through the magic of unintended consequences and emergent phenomena in complex systems like human societies and economies.
> I wanted to write about the abandoned kasbah of Foum Zguid yesterday,
I know this sounds trite, but please do actually post this; I'd love to read it. With this kind of post, if anyone learns about the place, it's far more likely they learned about it from your blog than by asking an AI for the information directly.
I've a million misgivings about the future with AI and the people who control and benefit from it, but I still think one valuable thing that will survive is humans' ability to direct attention and bring value to otherwise neglected stories like the one you're wanting to write about there.
50+ hours on 256 H100s is considered impressively cheap training?
It really makes me wonder whether any of this incredibly computationally expensive research is worth it. It seems useful only for promising a future in which humans get less opportunity to express themselves creatively, while being delivered an infinitely producible supply of AI-generated 'content' to passively consume.
Compared to the resource costs of the humans who prop up the industry, a handful of DCs that can do this, and still improve, is cheap.
Work phones, laptops, personal stuff. We duplicate a lot of resource use for one person to have a career.
There will still be pencil and paper. There are still creative things to do. Do we even get that these days? Where's our generation's LOTR or Star Wars? Yep, just prequels and sequels of the same old.
Are we that creative, copy-pasting and git-pulling deps someone else maintains? IT is librarian work these days. Little in the day-to-day is novel creativity.
Your argument is not a compelling one. It feels like a hand-wavy nod to a human soul, while ignoring that we all complain about soul-crushing jobs capturing so much of our agency and sucking the fun out of life, since it's just the same todos, different day... not that creative, and we tacitly notice and complain but keep doing it.
It’s a really lame circular routine and lived experience being around my peers these days; oh I hate my job but this new thing is an abomination and affront to my chosen job. I’m gonna be someone someday! Don’t take it away! Unicorn! Disrupt!
I have no idea what you're talking about. I've tried to understand where you're coming from with this and the only logical conclusion I can make is that you spend a lot of time engaging in debate about creativity and art as it relates to new AI technology, and you are simply re-igniting previous debates instead of engaging with me.
>It’s a really lame circular routine and lived experience being around my peers these days; oh I hate my job but this new thing is an abomination and affront to my chosen job.
It sounds like you're arguing with your peers, and not me, because I don't hate my job and I don't think AI is going to replace it any time soon.
>Are we that creative, copy-pasting and git-pulling deps someone else maintains? IT is librarian work these days. Little in the day-to-day is novel creativity.
This isn't what I do at my day job, and if that's what you do... I think I have a good idea of why you interact with the internet like this.
Because this will definitely be used only to innocently tell off people doing 1/10 the work of everyone else, and not to micromanage and hound people to increasingly unrealistic standards in already desperate conditions.
Safe to say you aren't in any position where every move you make will be watched by AI and analysed for faults so that your boss can scream at you more efficiently whenever you don't meet standards for their pitiful wages.
It's also dumb from a factory perspective. Our factories did time studies to understand things. What we learned:
Certain lines are primarily made up of barely functioning older people. No one else sticks around in those jobs. Think barely functioning alcoholics or recovering alcoholics who have nothing. However, we would also get a few 18-year-olds with no idea how jobs/work work and/or zero accountability (they just ghost jobs).
From the numbers we should want to build our processes around the high performers. But we can't expand our base of high performers AND they are the most likely to just disappear and not easily have their productivity replaced.
So yes, it was correct that 10% of our people outperformed by 10X, and yes, it was smart to not try to improve that but to understand reality.
> From the numbers we should want to build our processes around the high performers. But we can't expand our base of high performers AND they are the most likely to just disappear and not easily have their productivity replaced.
You're failing to retain high performers? Are there perhaps methods for retaining high performers that you have not tried?
AI for Executive performance monitoring would be an interesting social experiment.
Do you really think this tool is making folks micromanage and abuse employees, or perhaps they already would be doing that and this tool helps it?
There can be real value in these types of tools; it's ultimately up to the implementation, and I don't believe this tool will somehow turn a happy work environment into an abusive one. The abuse will most likely have already existed.
>or perhaps they already would be doing that and this tool helps it?
Yes. I don't think we should ethically encourage the abuse of workers. And that official lens of marketing can and will shape who reaches out, even if the tool can indeed be used ethically. Framing is key.
As a tame example: think of the graphic design and marketing of Red Bull vs. Monster. They have the same basic ingredients and purpose, but that simple Red Bull design vs. the in-your-face, punk-esque vibe of Monster will change who buys it, how they identify with it, and even alter the perception of how it tastes.
Absolutely, I expect it to be used to micromanage and abuse. Yes, those behaviours already exist; that's why I know a tool that enables them will amplify them.
Picture this: corporate buys something (like O365) and is reluctant to end licensing for the bundle. So... if they're locked into a contract that includes management-abuse-as-a-service, enabling bad behaviors, do you think they'll back out of enabling that one abusive manager out of five? How will that impact the workforce?
Not OP, but other than what core functionality they can demo to investors, every AI company seems to be extremely lacking in:
- web design (basic features take years to implement and, when done, break the website on mobile)
- UI/UX patterns (cookie-cutter component library elements forced into every interface without any tailoring to suit how the product is actually used; this also makes a Series C venture indistinguishable from something set up in a weekend)
- backend design (turns out they've been hemorrhaging money on serverless Vercel function calling instead of using Lambda and spending a minute implementing caching for repeat requests; a rough sketch of the caching point follows below)
- developer docs (even when crucial to business model, often seems AI generated, incomplete, incoherent)
And this usually comes from hiring far fewer developers than are needed, and those who are hired are 10x Cursor/GPT developers who trust it to have done a comprehensive job on what seems like a functional interface on the surface, and who have little frame of reference or training for what constitutes good design in any of these aspects.
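On the caching point from the backend-design item above, here's a minimal sketch of what "a minute implementing caching" could look like, assuming a Node-style handler; `fetchExpensiveResult`, the key, and the 60s TTL are made-up placeholders:

```typescript
// Minimal in-memory cache for repeat requests; entries live only within a warm
// instance, so a shared layer (e.g. Redis or a CDN) is still needed across instances.
// `fetchExpensiveResult` and the 60-second TTL are hypothetical placeholders.
const cache = new Map<string, { value: unknown; expires: number }>();

async function cachedFetch(
  key: string,
  fetchExpensiveResult: () => Promise<unknown>,
): Promise<unknown> {
  const hit = cache.get(key);
  if (hit && hit.expires > Date.now()) return hit.value; // repeat request served for free
  const value = await fetchExpensiveResult();
  cache.set(key, { value, expires: Date.now() + 60_000 }); // 60-second TTL
  return value;
}
```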
> (turns out they've been hemorrhaging money on serverless Vercel function calling instead of using Lambda and spending a minute implementing caching for repeat requests)
Oh but why can't the AI do basic backend programming anymore? /s
I had difficulty getting my lemming to speak. After selecting several alternatives, I tried one with a more defined, open mouth, which required multiple attempts but mostly worked. Additional iterations on the same image can produce different results.
Intrigued by the project, love the idea of exhaustively exploring significance for arbitrary input.
I think you need to provide a few example inputs and outputs for the program on the GitHub page.
Also, I'm not sure a project focused on decoding meaning and signals benefits from having AI-generated interpretations divorced from the inherently human act of sign interpretation. You can see it in the md file: such a rigidly enforced structured output has forced it to give some averaged amount of weight to different categories and examples, when many are facets of each other, or purely just expressions of something mentioned earlier. I can see free-form, high-temperature LLM outputs fed to another model, one that serves only to aggregate their core interpretations, providing more insight than what's within the document currently.
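To make the suggestion concrete, here's a minimal sketch of that two-stage idea; the `complete` helper, the prompts, and the temperatures are all hypothetical, standing in for whatever model API the project actually uses:

```typescript
// Hypothetical two-stage pipeline: free-form readings first, one aggregation pass second.
// `complete` stands in for whatever LLM call the project uses; it is not a real library API.
type Complete = (prompt: string, temperature: number) => Promise<string>;

async function interpretSign(sign: string, complete: Complete): Promise<string> {
  // Stage 1: several high-temperature, unstructured interpretations for diversity.
  const drafts = await Promise.all(
    Array.from({ length: 5 }, () =>
      complete(`Interpret this sign freely, with no fixed structure:\n${sign}`, 1.2),
    ),
  );
  // Stage 2: a low-temperature pass whose only job is to aggregate the core interpretations.
  return complete(
    `Distill the core interpretations shared across these readings:\n${drafts.join("\n---\n")}`,
    0.2,
  );
}
```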
Also seems to require a password to host a room (can't just leave it open?)