If you can reduce a problem to the point where it can be solved by simple code, you can get the rest of the solution very quickly.
Reducing a problem to the point where it can be solved with simple code takes a lot of skill and experience, and is generally still quite a time-consuming process.
Well, the major problem Microsoft is facing is that its AI products are not only shoddier than average, which is nothing new for them in many categories, but that this time the competition can actually easily leapfrog them.
Like, I have a 'Copilot' button prominently displayed in my New Outlook on macOS (the only platform where the app-with-that-designation is sort-of usable), and it's a dropdown menu, and it has... zero items when expanded.
I asked my 'Microsoft 365 Bing Chat AI Bot Powered By ChatGPT<tm>' about that, and it wasn't able to tell me how to make that button actually do something, ending the conversation with "yeah, that's sort-of a tease, isn't it?"...
Oh, well, and I actually also have a dedicated Copilot button on my new Lenovo laptop powered-by-Windows-11. And, guess what, it does exactly nothing! I can elect to either assign this button to 'Search', which opens a WebView2 to bing.com (ehhm, yeah, sure, thanks!) or to 'Custom', in which case it informs me that 'nothing' meets the hardware requirements to actually enable that.
So, my question to anyone in the Microsoft C-suite: have you ever tried to, like, actually use, like anything that you're selling? Because if you would have, the failings would have been obvious, right? Right??
Had the cost of building custom software dropped 90%, we would be seeing a flurry of low-cost, decent-quality SaaS offerings all over the marketplace, possibly undercutting some established players.
From where I sit, right now, this does not seem to be the case.
It's almost as if writing down the code is not the biggest problem, or the biggest time sink, of building software.
Here in NZ, pretty much all medium/large businesses and govt departments have gone all-in with M365. Most govt departments are on the E5 licence, and have started to roll out Copilot licences too.
The cost and complexity and the effort required to switch away from M365 is massive. It's not just using a different version of Excel and Word - that's the least of the issues. It's all the data stored in SharePoint Online, the metadata, permissions, data governance, etc. It's the Teams meetings, voice calls, chats and channels. All the security policies that are implemented with Entra and Defender. All the desktop and mobile management that is done through Intune. And the list just goes on and on.
Microsoft bundles so many things with M365 that, when you're already paying for an E5 licence for each user, it makes financial sense to go all-in and use as much as possible.
I may have developed some kind of paranoia reading HN recently, but the AI atmosphere is absolutely nuts to me. Have you ever thought you would see a chart showing how the population of horses was decimated by the mass introduction of efficient engines, accompanied by an implication that there is a parallel to the human population? And the article is not written with any kind of cautionary, humanitarian approach, but rather from the perspective of some kind of economic determinism. Have you ever thought you would be compared to a gasoline engine, and that everyone would discuss this juxtaposition from a purely economic perspective? And barely anyone shares a thought like "technology should be warranted by the populace, not the other way around"? And the guy writing this works at Anthropic? The very guy who makes this thing happen, but who is only able to conclude with "I very much hope we'll get the two decades that horses did". What the hell.
While we will never be able to get folks to stop using AI to "help" them shape their replies, it's super annoying to have folks think that by using AI they're doing others a favor. If I wanted to know what an AI thinks, I'd ask it. I'm here because I want to know what other people think.
At this point, I make value judgments when folks use AI for their writing, and will continue to do so.
LLMs all behave as if they are semi-competent (yet eager, ambitious, and career-minded) interns or administrative assistants, working for a powerful CEO-founder. All sycophancy, confidence and positive energy. "You're absolutely right!" "Here's the answer you are looking for!" "Let me do that for you immediately!" "Here is everything I know about what you just mentioned." Never admitting a mistake unless you directly point it out, and then all sorry-this and apologize-that and "here's the actual answer!" It's exactly the kind of personality you always see bubbling up into the orbit of a rich and powerful tech CEO.
No surprise that these products are all dreamt up by powerful tech CEOs who are used to all of their human interactions being with servile people-pleasers. I bet each and every one of these products is subtly or overtly shaped by feedback from executives about how it should respond in conversation.
I always thought menus had icons so they could be matched to the same functionality on the toolbar. If a menu lacks an icon, then it's probably not on the toolbar. This falls apart when there is no toolbar. But I have definitely found an action in the menu, looked at the icon, and matched it to a button elsewhere.
The other day I clicked on one of Outlook calendar's Copilot-prefilled questions: "who are the main attendees of this meeting". It started a long, winding speech that went nowhere, so I typed in "but WHO are the attendees" and it finally admitted "I don't know, I can't see that".
> Engineers need to really lean in to the change in my opinion.
I tried leaning in. I really tried. I'm not a web developer or game developer (more robotics, embedded systems). I tried vibe coding web apps and games. They were pretty boring. I got frustrated that I couldn't change little things. I remember my game character kept getting stuck on imaginary walls; I kept asking Cursor to fix it, and it just made more and more of a mess. I remember making a simple front-end + back-end app with a database to analyze thousands of pull request comments, and it got massively slow and I didn't know why. Cursor wasn't very helpful in fixing it. I felt dumber after the whole process.
The next time I made a web app I just taught myself Flask and some basic JS and I found myself moving way more quickly. Not in the initial development, but later on when I had to tweak things.
The AI helped me a ton with looking things up: documentation, error messages, etc. It's essentially a supercharged Google search and Stack Overflow replacement, but I did not find it useful to let it take the wheel.
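For what it's worth, the kind of minimal Flask setup described above can stay very small. Here's a sketch; the route, file names, and SQLite schema are entirely made up for illustration (the commenter doesn't share their code):

    # Hypothetical minimal Flask backend for browsing PR-comment stats.
    import sqlite3
    from flask import Flask, jsonify

    app = Flask(__name__)
    DB_PATH = "comments.db"  # assumed SQLite file of scraped PR comments

    @app.route("/api/comment-counts")
    def comment_counts():
        # One small, inspectable query per endpoint: easy to tweak later,
        # which is where hand-written code paid off in the story above.
        with sqlite3.connect(DB_PATH) as conn:
            rows = conn.execute(
                "SELECT author, COUNT(*) FROM pr_comments GROUP BY author"
            ).fetchall()
        return jsonify({author: count for author, count in rows})

    if __name__ == "__main__":
        app.run(debug=True)

The point isn't the code itself, but that every moving part is visible, which is what makes later tweaking fast.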
This seems confusingly phrased. When they say things like "500 Vision Transformers", what they mean is 500 finetunes of the same base model, downloaded from the huggingface accounts of anonymous randos. These spaces are only "universal" to a single pretrained base model AFAICT. Is it really that surprising that finetunes would be extremely similar to each other? Especially LoRAs?
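To make that intuition concrete, here's a back-of-the-envelope check one could run (the checkpoint filenames are hypothetical, and this is not the paper's methodology): measure how much of the base model's weight mass a finetune actually moved. If the relative change is small, near-identical "spaces" across finetunes are almost guaranteed.

    # Sketch: how far is a finetune from its base model, in weight space?
    # Assumes both files are plain PyTorch state dicts of the same architecture.
    import torch

    def relative_delta(base_sd, tuned_sd):
        """Fraction of the base weight norm that finetuning moved."""
        moved, total = 0.0, 0.0
        for name, w in base_sd.items():
            if name in tuned_sd and tuned_sd[name].shape == w.shape:
                moved += (tuned_sd[name].float() - w.float()).norm().item() ** 2
                total += w.float().norm().item() ** 2
        return (moved / total) ** 0.5

    base = torch.load("vit_base.pt", map_location="cpu")        # hypothetical
    tuned = torch.load("vit_finetuned.pt", map_location="cpu")  # hypothetical
    print(f"relative weight change: {relative_delta(base, tuned):.4f}")

A LoRA makes this even more extreme: the delta is confined to a low-rank subspace by construction.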
Pictograms in the interface are not decoration. Their purpose is to convey information in limited space. (And it should be information that can actually be conveyed this way.) Currently they are often used as decorations, or these two uses are mixed up. This is a mistake.
(It is interesting and saddening to see how years of UI research just went down the drain after Apple's "resurrection". My impression is that Apple was the first to start losing its carefully collected UI expertise, replacing it with something that was original for the time, but that was all. E.g. I remember the very first ads after Jobs' comeback. They still had the beige Macintoshes, but the ads changed. Instead of a typical computer ad showing a computer with its screen turned on and some desktop picture, Apple's ads pictured turned-off computers photographed from unusual angles or in unusual positions: a keyboard standing on its side leaning against the box, a mouse hanging by its wire, and so on. It was different, indeed; it stood out. Thing is, always striving for that is harmful. Especially for user interface, where the motto is: do not make it original, make it right.)
I work in this exact space (online grocery retailer in Europe). We're profitable and one of the few companies to be so in the sector - many online divisions are losing money and being bankrolled by the parent company with physical stores. Alternatively, burning VC money.
The thing that's wrong with Ocado's technology is that it's ridiculously expensive and tailored for huge FCs (fulfillment centers). The problem with that is that an FC needs to serve a large population base to be effective, and that's hard: in dense metros, the driving times are much longer despite smaller distances; in sparse metros, the distances are just too long. In our experience, the optimal FC size is 5-10K orders/day, maybe up to 20K/day in certain cases, but the core technology should certainly scale down profitably to 3-5K. Ocado solves for scaling up; what needs to be solved is actually scaling down.
There are a lot of logistical challenges outside the FC, especially the last mile, and you need to see the system as a whole, not just optimize one part to the detriment of all the others.
>But in reality, Samsung (and the other Android OEMs) cannot compete with Google and its unique control over hardware and software.
Yes, they can. We are talking about applying provided security patches to source code and then releasing a new version of their OS, for patches that have existed for months. The time from receiving the patches to having a validated OS release with the fix being sent to users should be on the order of days. It's not Google's control of Android that makes it possible for them to patch their Pixel branch of AOSP faster than Samsung can patch its own; it's that Samsung doesn't care about prompt security fixes, so it doesn't allocate engineers to do the work.
I would think that if they actually spent the time and money fixing the core functionality of their core products (like Windows and Office), they might have a much easier time promoting things like Copilot. Instead they leave their users wondering why they're so hell-bent on shoehorning AI into a Start menu that takes whole seconds longer to open than it should, or into a Windows Search that regularly fails to find installed programs or local files.
To reiterate a crucial point in this comment, replacing the Office apps is the least of the issues. Enterprise customers rely on 365 for identity management, endpoint protection, business intelligence and a whole bunch of other stuff that the average user pays no attention to. We aren't talking about replacing an office suite, but an entire model of IT infrastructure management.
Every time someone builds one of these things and skips over "overcomplicated theory", aphyr destroys them. At this point, I wonder if we could train an AI to look over a project's documentation and predict whether it's likely to lose committed writes just based on the marketing / technical claims. We probably can.
This post is written with its intellectual fly open. I'm not sure whether it was partly AI-generated, or whether the author has spent so much time ingesting AI-generated content that the tells have rubbed off, but this article has:
- Strange paragraph-lists with bolded first words, e.g. "The Cash Flow Mystery"
- The 'It's not just X; it's Y' meme: "Buying Groq wouldn't just [...], it could give them a chip that is actually [...]. It’s a supply chain hedge."
Tells like:
- "My personal read? NVIDIA is [...]"
- "[...]. Now I'm looking at Groq, [...]"
However, even if these parts were AI-generated, the post is simultaneously riddled with typos and weird phrases:
- "it looks like they are squeezing each other [sic] balls."
- Stylization of OpenAI as 'Openai'.
Not sure what to make of this low-quality prose.
Even if the conclusion is broadly correct, that doesn't mean the reasoning used to get there is consistent.
I do, at least, appreciate that the author was honest up-front with respect to use of Gemini and other AI tools.
Was in a hotel in Sapporo, almost got thrown out of bed. Lot of people in the hotel lobby now.
Considering leaving Hokkaido by air if a Hokkaido and Sanriku Subsequent Earthquake Advisory is issued; I don't really want to be around for a potential megaquake.
I wrote a comment saying that this should be possible with a proper Playwright harness and screenshot-taking. My comment ended up in the negatives (though curiously no one stopped to explain why), as if I were saying something so absurdly inaccurate that it wasn't even worth rebutting. Thank you for actually running the experiment and proving it - I was almost annoyed enough to do it myself.
I couldn’t understand why it had happened - it felt about as logical to my mind as writing a comment that Rust was faster than Node. I feel there is a strong anti-AI sentiment here, to the point that people will ignore evidence presented directly to them.
Personal vendetta aside, I enjoyed this post! You had some clever tricks I wouldn’t have considered. In fact, the idea of producing a pixel diff as output was particularly imaginative. And the bit about autoformalization definitely hits on something I’ve been feeling when working with AI recently.
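For anyone curious, the harness in question doesn't need to be elaborate. A sketch of the idea (file names and the localhost URL are assumptions, not from the post): take a Playwright screenshot of the rendered page and diff it pixel-by-pixel against a reference, feeding the diff image back as output.

    # Sketch: screenshot a page with Playwright, then pixel-diff it
    # against a reference image. Paths and URL are illustrative only.
    from PIL import Image, ImageChops
    from playwright.sync_api import sync_playwright

    def screenshot(url: str, path: str) -> None:
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto(url)
            page.screenshot(path=path, full_page=True)
            browser.close()

    def pixel_diff(a_path: str, b_path: str, out_path: str) -> int:
        # Assumes both screenshots have identical dimensions.
        a = Image.open(a_path).convert("RGB")
        b = Image.open(b_path).convert("RGB")
        diff = ImageChops.difference(a, b)
        diff.save(out_path)  # the diff image is the machine-readable result
        return sum(1 for px in diff.getdata() if px != (0, 0, 0))

    screenshot("http://localhost:8000", "candidate.png")
    print("pixels changed:", pixel_diff("reference.png", "candidate.png", "diff.png"))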
EDIT: I notice my comment yesterday is in the positives. Please don’t vote it up. That was not my intention here.
I live in Misawa (https://en.wikipedia.org/wiki/Misawa,_Aomori) and work in Rokkasho (https://en.wikipedia.org/wiki/Rokkasho), which is the area where the earthquake hit the strongest. It was quite violent, apparently the strongest earthquake ever recorded in the region. My house suffered no damage other than a few things falling off the cabinets, and I could sleep soundly afterwards, but let's see today at work.
> I'm sure every organisation has hundreds if not thousands of Excel sheets tracking important business processes that would be far better off as a SaaS app.
Far better off for whom? People constantly dismiss spreadsheets, but in many cases they are more powerful, more easily used by the people who have the domain knowledge required to properly implement calculations or workflows, and more or less universally accessible.
I have been completely shocked by the number of people in the tech industry who seem to genuinely place no value on humanity and so many of its outputs. I see it in the writing of leaders within VC firms and AI companies, but I also see it in ordinary conversations on the Caltrain or in coffee shops.
Friendship, love, sex, art, even faith and childrearing are opportunities for substitution with AI. Ask an AI to create a joke for you at a party. Ask an AI to write a heartfelt letter to somebody you respect. Have an AI make a digital likeness of your grandmother so you can spend time with her forever. Have an AI tell you what you should say to your child when they are sad.
From an accessibility/localization standpoint, icons+text everywhere seems to be ideal.
Also, I disagree with:
> This posture lends itself to a practice where designers have an attitude of “I need an icon to fill up this space”
Sure, that does technically happen, but it in no way prevents, and is not mutually exclusive with, the follow-on thought:
> Does ... the cognitive load of parsing and understanding it, help or hurt how someone would use this menu system?
That still happens, because if they mismatch an icon with its text, that can result in far worse cognitive load/misunderstanding than if no icon were present at all. This becomes readily apparent in the author's follow-on thought experiment where you show someone a menu with icons+text, but "censor" the text. Icons+text is also superior to [occasionally icons]+text in the same thought experiment. From my perspective, the author just argued against their own preference there.
I'd argue that the thought process behind determining an appropriate icon is even more important and relevant when being consistent and enforcing icon+text everywhere, not diminished. It also has the broadest possible appeal (to the visually/graphically focused, to the literary focused, to those who may not speak the language, and/or to those viewing the menu in a condensed/restricted viewport that doesn't have room for the full text). Now, if the argument is predicated on "we aren't willing to pay a designer for this", then yeah, they have a point. Except they used Apple as an example, so I doubt that was the premise.
Most software work is maintaining "legacy" code, that is, older systems that have been around for a long time and get a lot of use. I find Claude Code in particular is great at grokking old code bases and making changes to them. I work on one of those old code bases, and my productivity increased 10x, mostly due to Claude Code's ability to research large code bases, make sense of them, answer questions, and make careful, surgical changes. It also helps with testing and debugging, which is a huge productivity boost. It's not about its ability to churn out lots of code quickly: it's an extra set of eyes/brain that works much faster than a human developer.