
youngprogrammer · 2026-05-28T08:17:21

You might as well play "who can memorize the most openings and lines"

jgalt212 · 2026-05-28T11:51:03

Seems dismissive at first, but I interviewed a chess team captain once and he told me they prepared for upcoming matches by learning new lines they wanted to play and studying the lines the opposing school was deemed likely to play.

ccppurcell · 2026-05-28T16:37:06

It is a memory game, yes! But you can still choose what to memorise.

youngprogrammer · 2026-05-27T01:11:01

It can get complicated quickly if you're actually using it in a production system. At my previous enterprise SaaS company we had feature flags that could be turned on per customer and per environment (dev, staging, prod), with a permission and logging model so that our support team could also toggle flags, with a history of who turned on what. We also had "per user" feature flags for certain test users at companies, and DSL rules to evaluate the features.
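
To make that concrete, here is a minimal sketch of the shape such a system might take. It is not what we actually ran: `Rule`, `FlagStore`, the `editors` permission set, and the "most specific rule wins" policy are all assumptions made up for illustration, and a real DSL would be richer than exact-match fields.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Set


@dataclass
class Rule:
    """One targeting rule (hypothetical). A field left as None matches anything."""
    enabled: bool
    environment: Optional[str] = None  # e.g. "dev", "staging", "prod"
    customer: Optional[str] = None
    user: Optional[str] = None

    def matches(self, ctx: dict) -> bool:
        wanted = (("environment", self.environment),
                  ("customer", self.customer),
                  ("user", self.user))
        return all(want is None or ctx.get(key) == want for key, want in wanted)

    def specificity(self) -> int:
        # Assumed policy: user rules beat customer rules, which beat environment rules.
        return (4 if self.user else 0) + (2 if self.customer else 0) + (1 if self.environment else 0)


@dataclass
class FlagStore:
    editors: Set[str] = field(default_factory=set)        # who may toggle flags
    rules: Dict[str, List[Rule]] = field(default_factory=dict)
    audit_log: List[dict] = field(default_factory=list)   # history of who changed what

    def set_rule(self, flag: str, rule: Rule, actor: str) -> None:
        if actor not in self.editors:
            raise PermissionError(f"{actor} may not modify flags")
        self.rules.setdefault(flag, []).append(rule)
        self.audit_log.append({"flag": flag, "rule": rule, "actor": actor,
                               "at": datetime.now(timezone.utc)})

    def is_enabled(self, flag: str, **ctx) -> bool:
        matching = [r for r in self.rules.get(flag, []) if r.matches(ctx)]
        if not matching:
            return False  # flags default to off
        # Most specific rule wins; among equally specific rules, the latest one wins.
        _, best = max(enumerate(matching), key=lambda p: (p[1].specificity(), p[0]))
        return best.enabled
```

Usage would look something like `store.is_enabled("new_billing", environment="prod", customer="acme", user="u42")`. Even this toy version has to pick a precedence policy and record an audit trail, which is where the complexity starts.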

youngprogrammer · 2026-05-06T23:54:22

This is essentially how BindCraft works for drug discovery: https://www.nature.com/articles/s41586-025-09429-6 (minus the accumulation step).

The feedback comes from AF2 folding confidence plus structural scoring.

youngprogrammer · 2026-04-20T04:07:49

Fish sauce is delicious, but I had to stop using it since it's high in histamine (gives me a stuffy nose) and potentially carcinogenic due to its high levels of nitrosamines.

youngprogrammer · 2026-03-19T23:58:02

industrialized overfitting is basically what ML researchers do

youngprogrammer · 2026-02-24T02:03:12

should go a bit earlier with word2vec, NMT, seq2seq, attention, self attention

youngprogrammer · on Dec 16, 2023

Little late to this thread but from my list:

LLM (foundational papers)

* Attention is all you need - transformers + self attention

* BERT - first masked LM using transformers + self attention

* GPT-3 - large decoder-only LLM (basis of GPT-4 and most LLMs)

* InstructGPT or Tk-Instruct (instruction tuning enables improved zero-shot learning)

* Chain of Thought (improve performance via prompting)

Some other papers that have become trendy, depending on your interest:

* RLHF - RL using human feedback

* LoRA - low-rank adapters that make fine-tuning cheaper (the base model itself stays the same size)

* MoE - kind of ensembling

* Self-Instruct - self-labeled data

* Constitutional AI - self alignment

* Tree of Thoughts - like CoT but a tree

* FlashAttention, Longformer - optimized attention mechanisms

* ReAct - agents
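
Since self attention is the piece most of the papers above build on, here is a rough pure-Python sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. It is a simplification: it assumes a single head, takes queries, keys, and values as given, and leaves out the learned projection matrices, masking, and batching that a real transformer layer has. The function names are mine, not from any library.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Each output row is a weighted average of the value rows.

    queries: list of vectors of length d_k
    keys:    list of vectors of length d_k (one per value row)
    values:  list of vectors of length d_v
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Mix the value vectors according to the attention weights.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)])
    return outputs
```

In self attention the queries, keys, and values all come from the same token sequence, so every token ends up as a mixture of every other token's value vector.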

youngprogrammer · on Dec 30, 2021

It seems like most of these imperceptible changes could be addressed by something like ASCII folding (https://www.elastic.co/guide/en/elasticsearch/reference/curr...), but this might not apply to non-English use cases.

If you're interested in adversarial NLP, I also recommend reading this blog post on adversarial attacks on GPT-2 with universal triggers (e.g. adding "nobody" as a prefix for all inputs causes all entailments to be predicted as contradiction).
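
As a rough illustration of the idea (not Elasticsearch's implementation), a similar normalization can be sketched in Python with the standard `unicodedata` module. The function name `fold_text` is made up, and treating every format character (category `Cf`) as removable is an assumption that will be wrong for some scripts.

```python
import unicodedata


def fold_text(text: str) -> str:
    """Approximate ASCII folding: strip accents and invisible format characters."""
    # NFKD splits accented letters into base letter + combining mark,
    # and expands compatibility characters such as ligatures.
    decomposed = unicodedata.normalize("NFKD", text)
    kept = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop combining accents
        if unicodedata.category(ch) == "Cf":
            continue  # drop zero-width spaces, bidi overrides, and similar
        kept.append(ch)
    return "".join(kept)
```

This handles invisible characters and diacritics, but not homoglyphs: a Cyrillic "а" passes through unchanged, so it would need a separate confusables mapping.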

youngprogrammer · on Aug 25, 2020

They probably didn't have a Gantt chart to help them figure out the dependencies to properly plan it on their roadmap

youngprogrammer · on April 6, 2020

You could do something similar to how they trained an ML model to find antibiotic compounds: https://www.cell.com/action/showPdf?pii=S0092-8674%2820%2930.... First, train a deep learning model to learn a representation of molecules from their molecular structures. Then feed in the thousand or so known compounds that produce pleasant or unpleasant smells as training data, each with some "pleasantness" score. We can then use this model to quickly score millions of compounds and select candidates to test.
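
A toy sketch of the "train on known compounds, then score and rank candidates" loop is below. To keep it dependency-free it swaps the deep learning model for a similarity-weighted nearest-neighbour scorer, which is not what the antibiotics paper did. The fingerprints here are just sets of integers standing in for substructure features, and all names and scores are placeholders, not real chemistry.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict_score(fingerprint, training, k=3):
    """Similarity-weighted average of the k most similar labelled compounds.

    training: list of (fingerprint_set, pleasantness_score) pairs.
    """
    neighbours = sorted(training, key=lambda item: tanimoto(fingerprint, item[0]),
                        reverse=True)[:k]
    weights = [tanimoto(fingerprint, fp) for fp, _ in neighbours]
    total = sum(weights)
    if total == 0:
        # Nothing similar is known: fall back to the mean training score.
        return sum(score for _, score in training) / len(training)
    return sum(w * score for w, (_, score) in zip(weights, neighbours)) / total


def rank_candidates(candidates, training, top_n=10, k=3):
    """Score every candidate and return the top_n as (name, score) pairs."""
    scored = [(name, predict_score(fp, training, k)) for name, fp in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]
```

In a real pipeline `predict_score` would be replaced by the trained model, and the ranked list would only be a shortlist for lab testing.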

r0b05 · on April 6, 2020

I love the explanation.