rotskoff's comments

My research group at Stanford has been alpha testing Tinker; in my opinion it's both very useful and really technically impressive. It's a unified framework for post-training models, and it abstracts away almost all of the complexity of managing these jobs across resources. That it manages to do this while also allowing a lot of algorithmic flexibility is pretty unique.


Silly question: how is it different from, say, hf's transformers and similar libraries and APIs?


With hf transformers, you still need to manage the GPUs yourself.
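Roughly, the usual transformers workflow still leaves device placement, the training loop, and any multi-GPU orchestration to you (the checkpoint name below is just a placeholder):

    # Sketch of a bare-bones fine-tuning setup with plain transformers:
    # device placement and the surrounding infrastructure are on you.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "some-org/some-7b-model"     # placeholder checkpoint name
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
    model.to("cuda:0")                         # explicit GPU placement

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    # ...plus your own data loading, gradient accumulation, checkpointing,
    # and multi-GPU setup (DDP / FSDP / DeepSpeed) on top of this.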


In fact, the shoe was not banned, and it is one of the most widely used marathon racing shoes out there. Nearly all other running shoe companies followed suit and made more efficient racing flats with carbon plates and high-energy-return foams. Some restrictions did come out, including limits on the "stack height" of racing shoes and more stringent restrictions on the track. The current generation, the Alphafly 3, was just used by Kelvin Kiptum to break the world record at the Chicago Marathon. Those haven't been released yet.



Statistical physicist here. Negative temperatures occur when a system has a finite number of high energy states. On average, when temperature increases, both the energy and the "randomness" or entropy of a configuration increase as well. Of course, if there are only a few high energy states available, then the randomness will not increase, it will decrease. That's negative temperature! Because we define temperature as the rate of change of the energy with respect to the energy, in systems like the one I described, this rate of change becomes negative.


I find your explanation much better than the one at https://simple.wikipedia.org/wiki/Negative_temperature, please consider expanding that page a bit!

Back to the substance of the topic: I feel let down here. Negative temperature sounds like something amazing, but it turns out to be more of a quirk of the definition. I wonder if physicists would have chosen this definition had they been aware of this at the time.

Also I wonder if there's an intensive thermodynamic property that actually says how much thermal energy is in the system, since temperature apparently won't do it?


> Also I wonder if there's an intensive thermodynamic property that actually says how much thermal energy is in the system, since temperature apparently won't do it?

Thermodynamic beta[1] does exactly that. If we consider temperature as "tendency to give energy away", then the scale starts at zero, heads out through positive infinity, comes in through negative infinity, and then stops at negative zero. I.e. if T_a > T_b > 0, then system a gives energy to system b. Then, if T_c < T_d < 0, then system d gives energy to system c.

Thermodynamic beta (really just 1/T, the "coldness" of a system) fixes this: if B_a < B_b anywhere on the number line, then B_b is colder than B_a.

[1]: https://en.wikipedia.org/wiki/Thermodynamic_beta
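A tiny sketch of that ordering, with temperatures in arbitrary units:

    # Heat flows from the system with smaller beta (= 1/T) to the one with
    # larger beta. This one comparison covers positive and negative T uniformly,
    # whereas temperatures alone need the two-branch rule above.
    def a_gives_energy_to_b(T_a, T_b):
        return (1.0 / T_a) < (1.0 / T_b)   # a is hotter iff its beta is smaller

    print(a_gives_energy_to_b(300.0, 200.0))   # True:  the hotter positive-T system gives energy away
    print(a_gives_energy_to_b(-1.0, 300.0))    # True:  any negative T is hotter than any positive T
    print(a_gives_energy_to_b(-10.0, -1.0))    # False: T = -1 is hotter than T = -10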


Do you think the big bang could have been an "entropy population inversion" event? All high entropy states were occupied, so whatever event started the big bang caused the universe to dip into negative temperature and entropy and allowed our universe to form?

Or.. am I just a crank?


So the basic idea is that if you have a two-particle system (a and b) and each particle can be in a low (a and b) or high energy (A and B) state, then you can have the following combinations: ab, Ab, aB, and AB. Since particles are indistinguishable, Ab and aB are the high entropy states but ab and AB are low entropy states?


Basically. Instead of a and b, let's just look at L(ow) and H(igh) energy states. If I have the ground state,

    L L L L
there's only one way to arrange the system: everything in the low state, and the entropy is log(1) = 0. If I add one quantum of energy, I can have

    L L L H, L L H L, L H L L, H L L L
and the change in energy is dE = 1, while the change in entropy is dS = log(4) - log(1). "Temperature" is really just a scaling factor between dE and dS, so in this case T > 0. Adding another quantum,

    L L H H, L H L H, H L L H, L H H L, H L H L, H H L L
Again, dE = 1 (by construction), and dS = log(6) - log(4), which is smaller than the previous increase of log(4) - log(1). Still, dS > 0, so T is also greater than 0. Adding one more quantum, however,

    L H H H, H L H H, H H L H, H H H L
and we have dE = 1 but now dS = log(4) - log(6) < 0! We've added energy (conventionally made it "hotter"), but the system has become more ordered, so the temperature is negative. Adding one last quantum,

    H H H H
dE = 1 and dS = log(1) - log(4). This is as hot as the system can get: it cannot accept any more energy and can only give it away, which is why "negative temperatures are hotter than all positive temperatures". If this system is brought into contact with any conventional positive-temperature system, statistical fluctuations mean at least one of the energy quanta we added will flow towards it, cooling this system and heating the other.
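If you want to check the counting yourself, here's a quick Python sketch; it just takes the log of binomial coefficients for N = 4 two-level sites, nothing more:

    # Entropy S(E) = log(number of arrangements) for N two-level sites
    # holding E quanta of energy. The sign of dS between successive
    # energies is the sign of the temperature.
    from math import comb, log

    N = 4
    entropies = []
    for E in range(N + 1):
        omega = comb(N, E)          # arrangements with E sites excited
        entropies.append(log(omega))
        print(E, omega, round(entropies[-1], 3))

    # dS for each added quantum: positive for E < N/2 (T > 0),
    # negative for E > N/2 (T < 0), matching the walk-through above.
    print([round(b - a, 3) for a, b in zip(entropies, entropies[1:])])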


Thank you, this made a lot more sense!


I wish I had more than one upvote.


Rate of change with respect to entropy, I presume, is what you meant.


What’s an example of such a system?


Lasers do this. The essential idea is that an incoming photon (of some specified energy E) knocks an amount of energy (specifically, also E) out of an excited particle, and that energy leaves as another photon. To make this happen, you need more particles in an excited state than in the ground state[1], which is exactly the same condition necessary for negative temperature.

[1]: https://en.wikipedia.org/wiki/Population_inversion


Take a collection of atoms that somehow only have two energy levels, 0 and 1. Connect them to a heat sink at temperature T. At very low T each atom almost always has energy 0. We know the energy of each atom very well: they're (almost) all 0. People say: there's very little entropy. At very high T each atom has expected energy nearly 0.5, and the probabilities of energy 0 or energy 1 are nearly equal. So that's maximum entropy. We're as ignorant as it's possible to be.

At negative temperature the expected energy of each atom is >0.5. But as the expected energy approaches 1.0, we know the energy of each atom very well. They're (almost) all 1. That's weird. It's weird enough that you can't assign a positive temperature to these atoms.

Physically, you can get to negative temperature by sneaking atoms into the 1 state. Pumping a laser is an example. But you can't get to negative temperature by just heating with finite-temperature heaters.

Entropy can be measured in bits. If we have 10 two-level atoms at extremely high temperature, that's 10 bits of entropy. The state might be 10 1100 1110 or 01 1101 0111 or any of 2^10 possibilities. On the other hand if we have 10 two-level atoms at extremely low positive temperature, the state is usually 00 0000 0000 and the entropy is close to 0 bits.
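Here's a rough numerical sketch of that counting for the same 10 two-level atoms, with the level spacing and k_B set to 1 (assuming each atom is in thermal equilibrium with the bath, so the excited-level probability is just a Boltzmann weight):

    # Entropy in bits of 10 independent two-level atoms at temperature T.
    # Large |T| -> each atom is a fair coin -> ~10 bits; T -> 0+ or 0- -> ~0 bits.
    from math import exp, log2

    def excited_prob(T):
        w = exp(-1.0 / T)                 # Boltzmann weight of the upper level
        return w / (1.0 + w)

    def entropy_bits(p, n_atoms=10):
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return -n_atoms * (p * log2(p) + (1 - p) * log2(1 - p))

    for T in (0.05, 1.0, 100.0, -100.0, -1.0, -0.05):
        p = excited_prob(T)
        print(T, round(p, 4), round(entropy_bits(p), 3))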

"Entropy" in bits is just the size of the random number you need in order to represent the system.


Convolution is in fact multiplication in Fourier space: this is the convolution theorem [1], which says that Fourier transforms convert convolutions into products.

1. https://en.wikipedia.org/wiki/Convolution_theorem
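A quick numerical check with numpy's FFT, using the circular convolution of two random signals:

    # The DFT of a circular convolution equals the pointwise product of the DFTs.
    import numpy as np

    n = 64
    rng = np.random.default_rng(0)
    f, g = rng.normal(size=n), rng.normal(size=n)

    via_fft = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))
    direct = np.array([sum(f[m] * g[(k - m) % n] for m in range(n)) for k in range(n)])

    print(np.allclose(via_fft, direct))   # True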


Came here to make the exact same comment, agree completely!


I follow court news pretty closely and was genuinely curious about this decision. Here is a brief description of what I understand the decision to mean (I am not an expert by any means). Under the federal "Major Crimes Act" (MCA), members of various Native American tribes are not subject to state prosecution for crimes committed in "Indian territory". This case was brought by McGirt, a member of the Creek Nation, who challenged his conviction based on the original treaty establishing the Creek reservation in eastern Oklahoma. The headline is a bit misleading: for purposes of the MCA, eastern OK is now considered part of the Creek Nation. This means McGirt must be tried in the reservation's justice system or in federal court. It also has the consequence (discussed extensively in Roberts' dissent) that many existing convictions could potentially be vacated. It will be interesting to see how the Creek Nation works with Oklahoma to address these changes.


The guy has already served 20 years for a crime to which he originally pleaded guilty. He was convicted of raping a four-year-old girl.

Why is he challenging it now, and by contesting the court's authority rather than his own guilt? Does he expect to be found not guilty under the Creek Nation?


I follow SCOTUS news pretty closely; the discussions below are a bit misguided. Audio transcripts of oral arguments are already widely available---you can even subscribe to the Oyez podcast feed and find them in your podcast queue a few days after the court hears the case. The new thing here is "live", so as a practical matter, it probably doesn't constitute a huge change. If there were an incentive for the justices to produce "sound bites", it would already exist. C-SPAN coverage will probably increase the visibility a little, but, having listened to many of the arguments this term, I'd say that most cases are too technical to be of much general interest.

A second point: oral arguments are performative. The cases are argued via written briefs, and oral arguments provide a venue for the justices to question the petitioners about their arguments and air their responses to what they believe the other justices are thinking. Streaming the arguments, as opposed to making courtroom audio available after the fact, doesn't seem to change the dynamic much.

Many court watchers have taken this as an optimistic sign that perhaps the court will allow video. However, this is one thing that the court has strongly resisted. Some of the justices are known to prize their relatively low public profile and there's been speculation that maintaining that pseudo-anonymity is perhaps a reason for the hesitance.


> as a practical matter, it probably doesn't constitute a huge change

This is a reasonable point on which to disagree. (I disagree with it.)

The difference between a live performance and a recorded one is huge. Playing soundbites on cable TV is, I believe, much more likely with live arguments than with recorded ones.

I hope you're correct.


I agree with you. I follow SCOTUS cases within a couple of narrow interests, and there is always much discussion about what was really meant by questions that were asked. Much of that is due to lack of tone.

Hearing oral arguments is going to be interesting. I don't know that it's ever been available to the general public.


> Hearing oral arguments is going to be interesting. I don't know that it's ever been available to the general public.

You're in luck. The audio has been recorded since 1955 and can be reviewed along with the transcript on Oyez [1]! Oral arguments are complex and fascinating. SCOTUS is my favorite branch of government; the justices are humble, intelligent, and for the most part bipartisan. They are institutionalists above all else. More often than not SCOTUS makes me proud.

[1] https://www.oyez.org/cases/2019


A live stream may not have been available before, but I distinctly remember listening to oral arguments back when Heller was being argued, so at the very least after-the-fact recordings have been available to the general public. I think more people would benefit from listening to an oral argument once or twice: the experience of hearing the justices dig into one side's argument leaves you feeling like you know where the case is going to go, right up until they start on the other side. It gives you a new appreciation for just how hard I think SCOTUS tries to be apolitical, even in highly political cases.


>If there were an incentive for the justices to produce "sound bites", it would already exist.

I think it already does exist, as there has been plenty of quote mining of judges at all levels that gets fed into news reports, much of it designed to stir outrage. To the extent that quote mining is a threat to the fair and just application of the law, it's an issue I think the legal system (among other impacted groups) should be more focused on resolving.


I read and listen to Oyez a lot. From my personal experience, laymen might not follow what the justices are talking about when they cite previous cases, some legal terms aren't easy to catch, and the justices are most likely questioning the briefs, which won't be available. Live audio might open up interest from the public while everyone is working from home, but nothing beats Oyez's transcript synced with the audio. It will be interesting to listen for whether the justices change the way they question cases. Or, more interestingly, when Thomas will say something.


He asked questions on the first audio call, linked above.


I support resisting video, especially if the justices are shown while arguments are being presented. They should focus on listening, not thinking about how their facial expressions will be dissected on the Internet tomorrow.


As many have pointed out, this rediscovers Newton's method. The reason this type of approach, with Hessian preconditioning, is not widely used in practice is that computing the Hessian is costly. Avoiding that additional computation is the idea underlying "quasi-Newton" methods like BFGS and (more loosely) popular methods like Adagrad.
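For concreteness, here's a small scipy sketch contrasting a Hessian-based Newton solver with BFGS (which builds an approximate inverse Hessian from gradients alone) on the Rosenbrock function:

    # Newton-CG needs the Hessian (or Hessian-vector products); BFGS does not.
    import numpy as np
    from scipy.optimize import minimize, rosen, rosen_der, rosen_hess

    x0 = np.zeros(10)

    newton = minimize(rosen, x0, method="Newton-CG", jac=rosen_der, hess=rosen_hess)
    bfgs = minimize(rosen, x0, method="BFGS", jac=rosen_der)   # no Hessian supplied

    print(newton.nit, bfgs.nit)   # both converge; the second never forms or inverts the exact Hessian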


In the chemistry / molecular biology community, that is definitely part of the role played by a journal like Cell. It is considered a very prestigious journal and tends to publish only what the editors and reviewers perceive to be very important findings. Of course, we (scientists) are imperfect reviewers and sometimes don't see the value in great work and likewise sometimes imagine greatness in mediocre work. Editors also feel pressure to select work that they believe will be rapidly and frequently cited---as the number of citations per paper in the first two years after publication affects the impact factor of the journal.

Unlike math and physics, many disciplines have only recently adopted preprint servers (like bioRxiv) and there's still some bias against posting preprints within the community.


This is a fairly bizarre way to rank academic institutions. First of all, the methodology almost entirely neglects computer science, because its metric of success is papers published in top-tier journals; computer scientists tend to submit to conferences (e.g., NeurIPS) and hence get no credit for their work in this count. However, other fields seem over-represented in the list of journals---this is likely why Cold Spring Harbor, a very good biological lab where probably the vast majority of the papers are published in Nature-approved venues, seems to be so elite.

The "normalization" they use divides the proportional count of authors that have contributed to an article in the "Nature Index" by the total output of the institution in the sciences, as measured by a company called Dimensions. This has the odd effect of penalizing institutions for publishing outside their listed journals (a toy version of the arithmetic is sketched below).
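To make that effect concrete, the normalization as described boils down to something like the following toy calculation (the numbers are made up for illustration):

    # Fractional count in Nature Index journals divided by total article output.
    # Publishing more papers outside the listed journals lowers the score even
    # when the Nature Index output is identical.
    def normalized_share(fractional_count, total_articles):
        return fractional_count / total_articles

    print(normalized_share(30.0, 300))   # 0.10
    print(normalized_share(30.0, 900))   # ~0.033: same Nature Index output, lower score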

Finally, as an academic: there are some journals on the index that I have published in, but many venues I have published in did not make the cut. Sometimes more specialized journals are necessary---one cannot easily publish, for example, a detailed proof of a theorem in Nature, even if the result is very important.

List of journals: https://www.natureindex.com/faq#introduction1


The lists, and Nature index, seem like a pretty transparent ploy. Want your institution ranked well? Make sure you publish with us.

Departments and institutions already fret over US News rankings. I wonder if we'll see memos encouraging publication in NPG journals if these rankings become regular.


I was comparing their #1 Cold Spring Harbor Lab to HHMI Janelia Research Campus, which doesn't even seem to be ranked in the top 100. Yet when I look at the research output, it seems pretty similar with the nod to Janelia in terms of FC, top article Altmetric score and article count.

https://go.nature.com/2x8zE68 (Janelia Nature Index FC = 36.41)

vs

https://go.nature.com/2RsQl5n (CSHL Nature Index FC = 31.24)

So the denominator for the # of articles per Dimensions must be much higher or N/A and disqualifies some institutions? Unfortunately the Dimensions article count isn't viewable for any institution not making the top 100 list.


Although it's unfortunate to say, when one has a big scientific result, the inclination is to submit to one of the top journals (those on the Nature Index) before going to journals outside the list. The reason is that getting a paper into one of those journals is directly related to academic payoff (prizes, promotions, grants). Viewed from this payoff perspective, the reasoning behind the Nature Index isn't so unreasonable.

This Nature Index would be equivalent to counting the number of papers accepted into the top computer science journals. So yes, this index ignores the computer science discipline.


I guess you can thus say the list suffers from selection bias and is skewed by small samples, which makes it pretty useless but good marketing.

