This is the real nugget of wisdom here. This should be confirmation to everyone that no one understands LLM internals and that these models are not aligned. When they are eventually given control to run things, they will behave in wildly unexpected ways, possibly past the point where we can still change them.
This is a worry that people have been talking about in various forms for a while now, and I think it's a gigantic one. The only reason this was caught is that the quirk was a very noticeable verbal one. When words like "goblin" and "gremlin" pop up, it is easy for us to spot. If the quirk takes another shape (say, ranking people with certain traits as less trustworthy), it might be too subtle or too strange for us to notice. Would I ever notice if ChatGPT consistently rates people born in June as untrustworthy?
The main difference I read is that those airfoils actually come into play when it's not taking off or landing. That still doesn't make it nearly as cool as the air cars in Blade Runner, but it's slightly better than just a helicopter, too.
I want to create a "harness" that does this with Claude Code and other expensive agents.
Buffer user prompts, use conversation history and repo state as context, and run a local model or a cheap, fast cloud model like Haiku to determine the best way to address the user's ask and reframe the query with better context (the user reviews and approves if needed) -- and THEN let an expensive model like Opus have a go at it.
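Roughly something like this; just a non-authoritative sketch with the Anthropic SDK, where the model names and prompt wording are placeholders and the review/approve step would sit between the two calls:

  # hypothetical triage-then-escalate flow; model names are placeholders,
  # check the current model list before using
  import anthropic

  client = anthropic.Anthropic()

  def triage(user_prompt: str, repo_context: str) -> str:
      # cheap, fast model reframes the ask into a precise, self-contained task
      resp = client.messages.create(
          model="claude-3-5-haiku-latest",
          max_tokens=512,
          messages=[{
              "role": "user",
              "content": f"Repo context:\n{repo_context}\n\n"
                         f"User ask:\n{user_prompt}\n\n"
                         "Rewrite this as a precise, self-contained task description.",
          }],
      )
      return resp.content[0].text

  def solve(approved_task: str) -> str:
      # only the approved, reframed task goes to the expensive model
      resp = client.messages.create(
          model="claude-opus-4-1",
          max_tokens=4096,
          messages=[{"role": "user", "content": approved_task}],
      )
      return resp.content[0].text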
If we are operating within the Anthropic ecosystem with Haiku and Opus -- this sort of logic should ideally be doable within Claude Code as the harness. Currently skills cannot be tagged to different models. Ideally we should be able to say -- for trivial tasks, the skill should always use Haiku even if invoked from a session with Opus xhigh.
> Currently skills cannot be tagged to different models. Ideally we should be able to say -- for trivial tasks, the skill should always use Haiku even if invoked from a session with Opus xhigh.
You can set the model for a skill: just set model: haiku at the top and it will use Haiku! You can even set the effort level; look for “Frontmatter reference” in this doc article: https://code.claude.com/docs/en/skills
The same works for subagents: .claude/agents/triager.md with model: haiku plus a Task call from the main loop. The reason to roll your own was the sandbox, not the routing.
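For reference, a subagent definition could look something like this (minimal sketch; the name and description here are made up, and any field beyond model should be checked against the frontmatter reference above):

  ---
  name: triager
  description: Reframes trivial asks into precise, self-contained tasks
  model: haiku
  ---
  You are a triager. Rewrite the user's ask as a precise, self-contained task.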
We considered wrapping Claude Code when we started building Mendral (the agent in the article). We ended up building our own agent instead. It's a lot more work, because we kept up with all the right patterns as the models evolved (sub-agents, proper token caching, redoing basic tools like read, write, edit, bash, etc.). But it paid off over time, since we're building an agent focused on a specific task (not a general coding agent).
The main driver for writing our own agent was to keep the agent loop out of the sandbox (the loop runs on our backend; we call the sandbox only when needed). We wrote another post about that (it's the latest post on the blog).
However, I am curious how you would implement the triager pattern using only Claude Code as the harness.
Is the $10 Pro monthly subscription a prerequisite before I can purchase $10 in API credits?
PS: I would have loved to be able to directly buy $10 in credits and be free to spend it as quickly or as leisurely as I want -- without any monthly expiry or fixed recurring payments.
My time to shine! I spent yesterday morning tracking down the photo and answering this question.
The APOD description is lacking.
Yes, this was an exaggerated stack of 153 four-second exposures (the rejection map of the satellite trails was added on top of the image), and the gaps come from the time the camera spent saving each frame between exposures.
Probably exactly that. If you take a single 10-minute exposure (or really, anything more than a few seconds) you'll get noticeable star trails if you don't put your camera on a tracking mount. Stacking multiple exposures also has other nice benefits, such as noise averaging out and the ability to remove satellite trails.
Last time I did astrophotography was a few years ago, before Starlink made the problem considerably worse, but satellite trails were relatively easy to remove with stacking. I'm sure it's harder now but definitely still possible, so I'm assuming in this case leaving them in was done on purpose to highlight the problem.
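For the curious, the rejection step in stacking is conceptually simple. Here's a toy sketch in numpy (real stacking software is far more careful about alignment and statistics): a satellite streak only shows up in one frame at a given pixel, so a per-pixel outlier test throws it away.

  import numpy as np

  def stack(frames: np.ndarray, kappa: float = 3.0) -> np.ndarray:
      # frames: (N, H, W) array of aligned subexposures
      med = np.median(frames, axis=0)
      std = frames.std(axis=0) + 1e-9
      keep = np.abs(frames - med) < kappa * std   # drop per-frame outliers (streaks, cosmic rays)
      summed = np.where(keep, frames, 0.0).sum(axis=0)
      counts = keep.sum(axis=0).clip(min=1)
      return summed / counts                      # average over the surviving pixels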
EDIT: Looking more closely at the picture, I believe this was taken with a star tracker and then composited with a shorter exposure of the foreground. Notice how the foreground, even far away, looks considerably blurrier than the stars, and how the tower in the background has some light streaks. This is exactly what you'll see if you use a star tracker: rather than star trails, you'll have "foreground trails". This would explain why there are relatively few gaps in the satellite trails, since the exposures can be much longer.
Update: I was wrong, check max-m's sibling comment! The satellites just move really fast across the frame because they're in LEO, so they cover a fair distance during the short pause before the next exposure, which shows up as a small gap.
I am guessing, but I think it likely has to do with the shape and orientation of the satellite with respect to the sun and the camera. Depending on the relative positions, the brightness reflected off the satellite and reaching the camera will change over time.
I've taken long exposures using film (analog, so no stacking or any other funny business) and saw the same thing. I always thought they were planes but now it seems they may have been satellites. I'm curious if someone knows why this happens
Pretty much every DSLR/DSLM camera out there has a "bulb" mode that keeps the shutter open as long as you hold down the shutter button. I think my personal record is a 20-minute exposure.
As for actually holding down the button, you can either use an external wired shutter release with a mechanical lock to hold it down, use a wired controller with an electronic timer, or use a software feature in the camera to set the bulb timer.
For anybody wondering, the reason not to do a single ultra-long exposure is noise.
There's an equilibrium between exposure duration, aperture, and ISO that gives the best results for the conditions with a minimum amount of sensor noise, and getting close to the equilibrium and stacking the images typically gives better results than one massive exposure.
I believe your claim about noise and long exposures is false. To start, I posit that there are three sources of noise:
0) Photon shot noise from the object that you want to photograph. This is an inherent and unchangeable quantum-mechanical fact.
1) Sensor read noise per photo taken. This increases with the number of subexposures.
2) Dark current noise per time and per temperature.
#0 and #2 only depend on the total exposure time, not the number of subexposures. #1 actually gets worse with more subexposures, but what you gain is the ability to reject satellite trails, bad mount tracking, cosmic rays, wind gusts, rolling clouds, and other transient artifacts. Whereas if you took a single hour-long exposure, it's essentially guaranteed to be ruined by something.
As for ISO, it is very commonly misunderstood. ISO amplifies photon noise and dark current noise, and changing the ISO doesn't make your images better or worse in those respects. ISO in the form of analog gain can help boost the signal above the analog-to-digital converter noise, and that's what it's useful for. The MinutePhysics video explains this excellently: https://www.youtube.com/watch?v=ZWSvHBG7X0w . More and more sensors these days approach "ISO invariance", where analog amplifier gain has about the same effect as digital gain (i.e. multiplying the measured numbers on a computer).
Exactly what I'm refuting:
> exposure duration
In astronomy, more is better. Get as much total exposure time as you can afford (e.g. time spent at a suitable location, time spent monitoring the equipment, time under clear skies).
> aperture
In astronomy, more is better. Buy the biggest aperture you can afford - obviously, subject to constraints such as cost, weight, mountability, focal length. Also, telescopes don't have adjustable aperture blades, unlike general photographic lenses. You could put a disc cut-out in front of the telescope to close down the aperture, but that's just a waste of light.
> minimum amount of sensor noise
You get the least amount of sensor noise by reducing the exposure time and reducing the temperature (dedicated astro cameras have Peltier cooling). Note that although noise increases with time, signal increases with time faster, so the signal-to-noise ratio is proportional to the square root of time. So 100× more exposure time gives you a 10× better SNR.
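Quick sanity check on that last claim:

  signal ∝ T,  shot noise ∝ sqrt(T)  ⇒  SNR ∝ T / sqrt(T) = sqrt(T)  ⇒  sqrt(100) = 10× better SNR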
> stacking the images typically gives better results than one massive exposure
This is the main falsehood that I wanted to address. Taking multiple images actually gives more noise overall, even if only by a tiny bit. But multiple images give you much more processing flexibility and the ability to selectively reject things.
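To put the "tiny bit" in numbers: for a fixed total exposure time T split into N subexposures, with the three noise sources above added in quadrature,

  σ_single = sqrt( σ_shot(T)² + σ_dark(T)² + σ_read² )
  σ_stack  = sqrt( σ_shot(T)² + σ_dark(T)² + N·σ_read² )

The only extra term is the N-fold read noise, which for reasonably long subexposures is often a small price for being able to throw away ruined frames.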
My Canon can do this without modification and it's 8 years old. Switch to bulb mode and use an external mini device connected via a microphone-style cable; it sends the signal to close the shutter after x minutes.
For extra-long exposures it's also recommended to use a stable power source.
How is a 10 minute continuous exposure functionally different from 10 minutes of video with every frame stacked? In the former, each photodiode acts as a compositor for each pixel instead of whatever algorithm is chosen to combine frames in the latter?
You pay the read noise every time you read out the sensor and digitize the values. Also, you lose a tiny bit of time between exposures as the sensor resets itself. And you might have a bottleneck in moving the data off the sensor and saving the image. Furthermore, if you perform lossy compression on the video, then your digitally stacked image will differ significantly from analog stacking on the silicon sensor.
> The challenge is: when you let a session idle for >1 hour, when you come back to it and send a prompt, it will be a full cache miss, all N messages. We noticed that this corner case led to outsized token costs for users.
I don't agree with this being characterized as a "corner case".
Isn't this how most long-running work will happen for all serious users?
I am not at my desk babysitting a single CC chat session all day. I have other things to attend to -- and that was the whole point of agentic engineering.
Don't CC users take lunch breaks?
How are all these utterly common scenarios being labeled corner cases -- as if they were wildly out of the norm and UX can be sacrificed for them?
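To make the cost jump concrete (very rough numbers on a hypothetical 200k-token session; this assumes Anthropic's published prompt-caching multipliers of roughly 0.1× the base input price for cache reads and 1.25× or more for cache writes):

  warm turn, cache hit:            200k × 0.1  ≈ the cost of  20k uncached input tokens
  post-lunch turn, cache expired:  200k × 1.25 ≈ the cost of 250k uncached input tokens

That one resumed prompt costs an order of magnitude more than the same prompt sent while the cache was still warm.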
This largely appears to be an HTML generator at its core, not necessarily what Figma does with layers/canvases etc. There's no collaborative aspect to it either.
It feels like a lightly designed product that moves the claude CLI to their backend, generates the HTML, and renders it in the browser on the claude.ai website for you. Sure, it accepts your design system as input or imports it from your repo, but couldn't you feed the same into the claude CLI yourself?
I'm curious what exactly it gives you beyond the claude CLI plus prompting it well with your design system and skills.
The IBM/Microsoft analogy is a classic. It’s always fascinating to watch these 'frenemy' dynamics play out. In these cases, the one who owns the direct interface with the end user usually wins the long game, while the 'infrastructure' partner risks becoming just another utility. It will be interesting to see if Canva can maintain its identity or just become a shell for Claude's output.
Yep, agree: it looks like it’s taking the existing generated artefact, parameterising it within an inch of its life, exposing a pseudo-WYSIWYG for the parameters, and calling it a day with a few export options. Not a huge leap from what they’ve got already, but it’s a clever adjacent step for sure. Same product, new chrome.
What dangers lurk beneath the surface.
This is not funny.