Re: password... we landed on requiring a password when creating the room so you can retain "HOST" status. E.g., if you disconnect and come back, you need the password to re-authenticate as HOST, so you won't lose your room to the first stranger who finds it empty.
But all rooms are open. The password is just a "recovery password". I will update the description text now to say this.
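To illustrate the flow (not the project's actual code), here's a minimal sketch assuming a hashed recovery password stored per room; all names here are hypothetical:

```typescript
// Hypothetical host-recovery flow: rooms stay open to everyone,
// the password only re-claims HOST status after a disconnect.
// A real implementation would use a slow KDF (scrypt/argon2), not bare SHA-256.
import { createHash, timingSafeEqual } from "node:crypto";

type Room = { id: string; hostHash: Buffer };
const rooms = new Map<string, Room>();

const hash = (pw: string) => createHash("sha256").update(pw).digest();

function createRoom(id: string, recoveryPassword: string): void {
  rooms.set(id, { id, hostHash: hash(recoveryPassword) });
}

// Anyone can join an open room; only the recovery password re-grants HOST.
function reclaimHost(id: string, password: string): boolean {
  const room = rooms.get(id);
  return !!room && timingSafeEqual(room.hostHash, hash(password));
}
```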
I also work a lot on testing with SWE-bench Verified. In my opinion, this benchmark is now good for catching regressions on the agent side.
However, above 75% the scores are likely about the same. The remaining instances are likely underspecified despite the effort of the authors who made the benchmark "verified". From what I have seen, these are often cases where the problem statement says to implement X for Y, but the agent simply has to guess whether to also implement it for another case Y', and that guess decides whether the instance is won or lost.
I'm usually very supportive of EU tech regulation, but to be honest I don't really want to put my name and address up on apps I throw up on the store
I'd usually like to keep my identity separate from whatever projects I have, especially ones that don't 100% align with the developer brand that employers might screen for.
I have the same mentality as you. But rather than form an opinion on whatever EU regulation is being interpreted as "requiring" these steps from Google et al., I think I'm going to assert that it's a red herring.
The real issue, IMO, is that it's still too hard to distribute and install applications on my general-purpose computing devices! You can't be on Google's app store if you aren't a "real business" with a physical address and everything? Fine. Let's just distribute our apps on F-Droid, or by just releasing APKs in our GitHub pages, etc.
At least that's still possible with Android. But who knows how much longer they'll even allow that?
Yeah, as long as there's an app market the user can install without passing through the official marketplace. The EU regulation gets blamed, but that's not the actual issue.
From what I can tell, this should all apply only to monetized apps (and I agree with that). If that's not actually the case, Google is using malicious compliance to mislead developers into hating the EU for daring to regulate them.
That's probably why F-Droid is a better choice in the first place?
Google Play (and the App Store) assume commercial intent by default, and I'm sympathetic to stricter verification rules when there's money changing hands.
> I don't really want to put my name and address up on apps I throw up on the store
As a customer I really want the ability to sue someone who does me wrong, call them out publicly, or at least avoid their products. In no way is it reasonable that someone should want to stay anonymous while selling me something (or profiting off of it in one way or another). I really don't see a reason to make an exception for people who have free+offline+etc apps.
You're publishing software, you need to be identifiable.
It is always true that people love regulation until it ends up affecting them negatively, through the magic of unintended consequences and emergent phenomena in complex systems like human societies and economies.
> I wanted to write about the abandoned kasbah of Foum Zguid yesterday,
I know this sounds trite, but please do actually post this; I'd love to read it. With this kind of post, if anyone learns about the place, it's far more likely they learned about it from your blog than by asking an AI for the information directly.
I've a million misgivings about the future with AI and the people who control and benefit from it, but I still think one valuable thing that will survive is humans' ability to direct attention and bring value to otherwise neglected stories like the one you're wanting to write about there.
50+ hours on 256 H100s is considered impressively cheap training?
It really makes me wonder whether any of this incredibly computationally expensive research is worth it. It seems useful only for promising a future in which humans get less opportunity to express themselves creatively, while being delivered an infinitely producible supply of AI-generated 'content' to passively consume.
Compared to the resource costs of the humans who prop up the industry, a handful of DCs that can do this, and still improve, is cheap.
Work phones, laptops, personal stuff. We duplicate a lot of resource use for one person to have a career.
There will still be pencil and paper. There are still creative things to do. Do we even get that these days? Where's our generation's LOTR or Star Wars? Yep, just prequels and sequels of the same old.
Are we that creative, copy-pasting and git-pulling deps someone else maintains? IT is librarian work these days. Little in the day-to-day is novel creativity.
Your argument is not a compelling one. It feels like a hand-wavy nod to a human soul, while ignoring that we all complain about soul-crushing jobs capturing so much of our agency and sucking the fun out of life, since it's just the same todos, different day... not that creative, and we tacitly notice and complain but keep doing it.
It’s a really lame circular routine and lived experience being around my peers these days; oh I hate my job but this new thing is an abomination and affront to my chosen job. I’m gonna be someone someday! Don’t take it away! Unicorn! Disrupt!
I have no idea what you're talking about. I've tried to understand where you're coming from with this and the only logical conclusion I can make is that you spend a lot of time engaging in debate about creativity and art as it relates to new AI technology, and you are simply re-igniting previous debates instead of engaging with me.
>It’s a really lame circular routine and lived experience being around my peers these days; oh I hate my job but this new thing is an abomination and affront to my chosen job.
It sounds like you're arguing with your peers, and not me, because I don't hate my job and I don't think AI is going to replace it any time soon.
>Are we that creative, copy-pasting and git-pulling deps someone else maintains? IT is librarian work these days. Little in the day-to-day is novel creativity.
This isn't what I do at my day job, and if that's what you do... I think I have a good idea of why you interact with the internet like this.
Because this will definitely be used only to innocently tell off people doing 1/10 the work of everyone else, and not to micromanage and hound people to increasingly unrealistic standards in already desperate conditions.
Safe to say you aren't in any position where every move you make will be watched by AI and analysed for faults so that your boss can scream at you more efficiently whenever you don't meet standards for their pitiful wages.
It's also dumb from a factory perspective. Our factories did time studies to understand things. What we learned:
Certain lines are primarily made up of barely functioning older people. No one else sticks around in those jobs. Think barely functioning alcoholics or recovering alcoholics who have nothing. However, we would also get a few 18-year-olds with no idea how jobs/work work and/or zero accountability (they just ghost jobs).
From the numbers we should want to build our processes around the high performers. But we can't expand our base of high performers AND they are the most likely to just disappear and not easily have their productivity replaced.
So yes, it was correct that 10% of our people outperformed by 10X, and yes, it was smart to not try to improve that but to understand reality.
> From the numbers we should want to build our processes around the high performers. But we can't expand our base of high performers AND they are the most likely to just disappear and not easily have their productivity replaced.
You're failing to retain high performers? Are there perhaps methods for retaining high performers that you have not tried?
AI for Executive performance monitoring would be an interesting social experiment.
Do you really think this tool is making folks micromanage and abuse employees, or perhaps they already would be doing that and this tool helps it?
There can be real value in these types of tools; it's ultimately up to the implementation, and I don't believe this tool will somehow turn a happy work environment into an abusive one. The abuse will most likely have already existed.
>or perhaps they already would be doing that and this tool helps it?
Yes. I don't think we should ethically encourage the abuse of workers. And that official lens of marketing can and will shape who reaches out, even if the tool can indeed be used ethically. Framing is key.
As a tame example: think of the graphic design and marketing of Red Bull vs. Monster. They have the same basic ingredients and purpose, but that simple Red Bull design vs. the in-your-face, punk-esque vibe of Monster will change who buys it, how they identify with it, and even alter the perception of how it tastes.
Absolutely, I expect it to be used to micromanage and abuse. Yes, those behaviours already exist; that's why I know a tool that enables them will amplify them.
Picture this: corporate buys something (like O365) and is reluctant to end licensing for the bundle. So... if they're locked into a contract that includes management-abuse-as-a-service, enabling bad behaviors, do you think they'll back out of enabling that one abusive manager out of five? How will that impact the workforce?
Not OP, but other than what core functionality they can demo to investors, every AI company seems to be extremely lacking in:
- web design (basic features take years to implement and, when done, break the website on mobile)
- UI/UX patterns (cookie-cutter component library elements forced into every interface without any tailoring to suit how the product is actually used; this also makes a Series C venture indistinguishable from something set up in a weekend)
- backend design (turns out they've been hemorrhaging money on serverless Vercel function calling instead of using Lambda and spending a minute implementing caching for repeat requests; a rough sketch of the caching point follows below)
- developer docs (even when crucial to business model, often seems AI generated, incomplete, incoherent)
And this usually comes from hiring far fewer developers than are needed, and those who are hired are 10x Cursor/GPT developers who trust it to have done a comprehensive job on what seems like a functional interface on the surface, and who have little frame of reference or training for what constitutes good design in any of these aspects.
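On the caching point from the backend-design item above, here's a minimal sketch of what "a minute implementing caching" could look like, assuming a Node-style handler; `fetchExpensiveResult`, the key, and the 60s TTL are made-up placeholders:

```typescript
// Minimal in-memory cache for repeat requests; entries live only within a warm
// instance, so a shared layer (e.g. Redis or a CDN) is still needed across instances.
// `fetchExpensiveResult` and the 60-second TTL are hypothetical placeholders.
const cache = new Map<string, { value: unknown; expires: number }>();

async function cachedFetch(
  key: string,
  fetchExpensiveResult: () => Promise<unknown>,
): Promise<unknown> {
  const hit = cache.get(key);
  if (hit && hit.expires > Date.now()) return hit.value; // repeat request served for free
  const value = await fetchExpensiveResult();
  cache.set(key, { value, expires: Date.now() + 60_000 }); // 60-second TTL
  return value;
}
```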
> (turns out they've been hemorrhaging money on serverless Vercel function calling instead of using Lambda and spending a minute implementing caching for repeat requests)
Oh but why can't the AI do basic backend programming anymore? /s
I had difficulty getting my lemming to speak. After selecting several alternatives, I tried one with a more defined, open mouth, which required multiple attempts but mostly worked. Additional iterations on the same image can produce different results.
Intrigued by the project, love the idea of exhaustively exploring significance for arbitrary input.
I think you need to provide a few example inputs and outputs for the program on the GitHub page.
Also, I'm not sure a project focused on decoding meaning and signals benefits from having AI-generated interpretations divorced from the inherently human act of sign interpretation. You can see it in the md file: such a rigidly enforced structured output has forced it to give some averaged amount of weight to different categories and examples, when many are facets of each other, or purely just expressions of something mentioned earlier. I can see free-form, high-temperature LLM outputs fed to another model, one that serves only to aggregate their core interpretations, providing more insight than what's within the document currently.
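To make the suggestion concrete, here's a minimal sketch of that two-stage idea; the `complete` helper, the prompts, and the temperatures are all hypothetical, standing in for whatever model API the project actually uses:

```typescript
// Hypothetical two-stage pipeline: free-form readings first, one aggregation pass second.
// `complete` stands in for whatever LLM call the project uses; it is not a real library API.
type Complete = (prompt: string, temperature: number) => Promise<string>;

async function interpretSign(sign: string, complete: Complete): Promise<string> {
  // Stage 1: several high-temperature, unstructured interpretations for diversity.
  const drafts = await Promise.all(
    Array.from({ length: 5 }, () =>
      complete(`Interpret this sign freely, with no fixed structure:\n${sign}`, 1.2),
    ),
  );
  // Stage 2: a low-temperature pass whose only job is to aggregate the core interpretations.
  return complete(
    `Distill the core interpretations shared across these readings:\n${drafts.join("\n---\n")}`,
    0.2,
  );
}
```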
Also seems to require a password to host a room (can't just leave it open?)