
    https://iahmed.me
Hugo website, with a theme I made from scratch.

GitHub Pages deployment.

Here's my first website, from when I was in college and had no web dev experience. I still keep it up for nostalgia:

    https://iahmed.me/old_www/

This reflects my experience. Yet I find that getting reliability out of LLM calls in a while-loop harness remains elusive.

For example:

- how can I reliably have a decision block to end the loop (or keep it running)? (see the sketch at the end of this comment)

- how can I reliably call tools with the right schema?

- how can I reliably summarize context / excise noise from the conversation?

Perhaps, as the models get better, they'll approach some threshold where my worries just go away. However, I can't quantify that threshold myself and that leaves a cloud of uncertainty hanging over any agentic loops I build.

Perhaps I should accept that it's a feature and not a bug? :)
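
For concreteness, here's the shape of the loop I mean. This is a minimal sketch; `llm` and `dispatch` are hypothetical stand-ins, not any particular SDK. Question (1) is about how brittle that final else-branch is:

    import json

    MAX_STEPS = 20

    def run_agent(task: str) -> str:
        messages = [{"role": "user", "content": task}]
        for _ in range(MAX_STEPS):
            reply = llm(messages)  # hypothetical chat-completion call
            messages.append(reply)
            if reply.get("tool_calls"):
                for call in reply["tool_calls"]:
                    result = dispatch(call)  # hypothetical tool dispatcher
                    messages.append({"role": "tool", "content": json.dumps(result)})
            else:
                # the fragile "decision block": no tool call == task done
                return reply["content"]
        raise RuntimeError("agent hit step limit without terminating")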


Forgot to address the easiest part:

> - how can I reliably call tools with the right schema?

This is typically done by enabling strict mode for tool calling, which is a hermetic solution: it makes the LLM unable to generate tokens that would violate the schema. (I.e., the LLM samples only from the subset of tokens that lead to valid schema generation; this is constrained decoding.)
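
For example, in the OpenAI-style tool format it's a single flag on the function definition (other providers have equivalents). Strict mode also requires every property to appear in `required` and `additionalProperties` to be false:

    tool = {
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative tool
            "description": "Look up the current weather for a city.",
            "strict": True,  # constrain decoding to this exact schema
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
                "additionalProperties": False,
            },
        },
    }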


Re (1): use a TODOs system like Claude Code's.

Re (2): also fairly easy! It's just a summarization prompt. E.g., this is the one we use in our agent: https://github.com/HolmesGPT/holmesgpt/blob/62c3898e4efae69b...

Or just use the Claude Code SDK, which does all of this for you! (You can also use various provider-specific features for (2), like automatic compaction on the OpenAI Responses endpoint.)
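
A minimal sketch of what such a compaction step boils down to (`llm` and `count_tokens` are hypothetical stand-ins; the linked prompt is the part that matters):

    import json

    COMPACT_THRESHOLD = 50_000  # tokens; tune to the model's context window

    def maybe_compact(messages):
        if sum(count_tokens(m["content"]) for m in messages) < COMPACT_THRESHOLD:
            return messages
        head, tail = messages[:-5], messages[-5:]  # keep recent turns verbatim
        summary = llm([{
            "role": "user",
            "content": "Summarize this conversation. Keep decisions, open "
                       "TODOs, and file paths:\n" + json.dumps(head),
        }])["content"]
        return [{"role": "system", "content": "Summary so far: " + summary}] + tail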


The paper finds:

- For LLM-assisted output, the more complex the LLM writing is, the less likely the paper is to be published. Eyeballing the plots: at WC=-30, both groups have similar chances of publication (~46%); at the upper range of WC=25, LLM-assisted papers are ~17% less likely to be published.

- LLM-assisted authors produced more preprints (+36%).

I wonder:

- What is the distribution of writing complexity?

  * Does the 17% publication deficit at WC=25 just correspond to part of the 36% excess of LLM-assisted papers landing at WC=25, thus nullifying the net effect? Even if so, it puts extra strain on the review process.

I can vouch for this from my own experience.

Back in grad school, I was out making new friends. I was playing tennis 4-5 times a week, and after every game I'd invite players for coffee (morning games) or dinner (evening games). Consistency mattered: I'd ask every time. Slowly we had our regulars, and our coffee times became an institution in their own right.

People are busy, yes. But people also want to be in demand. People also don't want to be rejected. And people also don't want to be left out.

By asking around, I was exposing myself to rejection. Some folks appreciated that their time was in demand. Still more joined because they didn't want to be left out.


Tennis is a great hub for connection when you're retired. I play a lot, and people are always meeting up after we play and forging all sorts of non-tennis relationships. Sadly for me, this all happens during the day, when I'm rushing back to work hoping my 90-minute absence didn't coincide with some emergency.


Do I understand this right?

A lightweight speculative (draft) model adapts to usage, keeping the acceptance rate against the static heavyweight model within acceptable bounds.

Do they adapt with LoRAs?
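
(For context on the terms: the acceptance rate is the fraction of draft-model tokens the heavyweight model verifies and keeps. A toy greedy-verification version; real schemes use a probabilistic accept/reject rule:)

    def acceptance_rate(draft_tokens, target_greedy_tokens):
        """Fraction of drafted tokens kept under greedy verification."""
        kept = 0
        for drafted, verified in zip(draft_tokens, target_greedy_tokens):
            if drafted != verified:
                break  # first mismatch: the target model's token is used instead
            kept += 1
        return kept / max(len(draft_tokens), 1)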


This takes me down memory lane! For my undergrad capstone project, we made a CubeSat tracker for our university's satellite using an RPi, an Arduino, and a software-defined radio to receive transmissions every time it passed over us. I cringe a little looking at the code now, but it worked!

I agree: CubeSats are a wonderful way, even for college students, to tinker with space(-adjacent) tech.

https://github.com/hazrmard/SatTrack


I am working on a budgeting app!

Features:

  - Local. No internet connection needed.
  - Manual. Every transaction is added by the user.
  - One-off or arbitrarily recurring transactions.
  - No lock-in. Check out your data any time.
  - Arbitrary metrics to track performance.
  - Hosting on the cloud for mobile access.

Why?

I've been using Google Sheets + Forms for the last 8 years to track my finances. It's worked well, save for minor inconveniences. This app is my answer to my own problems.


Thank you very much! I had been reading up on the effects of diet on treatment & management. I like the focus on citations.


Thank you. This is great. I also appreciated the linked code for MinRL (https://github.com/10-OASIS-01/minrl).

Having done research in RL, I found that a big problem with incremental work was reproducing comparative baselines and validating my own contributions. A simple library like this, with built-in visualization tools and a gridworld sandbox where I can validate just by observation, is very helpful!
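
To make "validate by observation" concrete (this isn't MinRL's API, just the kind of sanity check I mean): a few lines of tabular value iteration on a toy grid let you eyeball whether the greedy policy's arrows point toward the goal:

    # Toy 4x4 gridworld: goal at (3, 3), reward -1 per step.
    import itertools

    N, GAMMA, GOAL = 4, 0.9, (3, 3)
    ACTIONS = {"^": (-1, 0), "v": (1, 0), "<": (0, -1), ">": (0, 1)}
    V = {s: 0.0 for s in itertools.product(range(N), range(N))}

    def step(s, a):
        r, c = s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1]
        return (r, c) if 0 <= r < N and 0 <= c < N else s  # walls: stay put

    for _ in range(50):  # value iteration to (near) convergence
        for s in V:
            if s != GOAL:
                V[s] = max(-1 + GAMMA * V[step(s, a)] for a in ACTIONS)

    # Print the greedy policy; every arrow should point toward the goal.
    for r in range(N):
        print(" ".join("G" if (r, c) == GOAL else
                       max(ACTIONS, key=lambda a: V[step((r, c), a)])
                       for c in range(N)))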


It's great to see the proliferation of models in other languages!

Shoutout to Alif, a finetune of Llama 3 8B on Urdu datasets: https://huggingface.co/large-traversaal/Alif-1.0-8B-Instruct

It'd be great to see a comparison.

