I’m always a bit skeptical about these sorts of things. Perhaps I’m just ignorant about the methods used, but the amount of data we can get from the most distant known galaxy can’t be very much. How confident can we be that the shift in observed light is actually from the presence of oxygen, and not from one of probably countless other causes, both known and unknown?
Pretty confident. Emission spectra are very specific. The ratios between specific emission lines are invariant under wavelength shifts (redshift/blueshift), and the patterns are complicated enough that it's not too big a deal to weed out false positives.
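To make the invariance concrete, here's a tiny sketch. The rest wavelengths are the standard H-beta / [O III] values; the redshift value is purely illustrative:

```python
# Illustrative sketch: ratios between emission-line wavelengths survive
# redshift, which is what makes line identification robust. Rest
# wavelengths are the standard H-beta / [O III] values in angstroms;
# the redshift z is an arbitrary illustrative value.
rest = {"H-beta": 4861.3, "[O III] 4959": 4958.9, "[O III] 5007": 5006.8}

def observe(wavelength_rest, z):
    # Every observed wavelength is stretched by the same factor (1 + z),
    # so ratios between lines are unchanged.
    return wavelength_rest * (1 + z)

z = 10.6
observed = {name: observe(w, z) for name, w in rest.items()}

rest_ratio = rest["[O III] 5007"] / rest["[O III] 4959"]
obs_ratio = observed["[O III] 5007"] / observed["[O III] 4959"]
assert abs(rest_ratio - obs_ratio) < 1e-9  # the fingerprint survives the shift
```

So even if you don't know the redshift in advance, the relative spacing of the lines is a fingerprint you can match against lab measurements.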
A lot better: we can excite these atoms in a lab and simply measure what comes out. Often we don't even need a model, because spectroscopy can be fully empirical for many transitions.
Spectroscopy is a very straightforward and mature technique. Saying "we see these spectral lines in this picture, so there must be oxygen" is not considered any more controversial than "we see a bright dot in this picture, so there must be a star".
There are a lot of very smart people working in astronomy, they review each other’s work, and they compete for jobs and funding. Does it really seem likely that none of them have thought to validate basic assumptions?
Having family members who do that for a living, I can tell you that’s a huge chunk of the job. Astronomers all know that they have significant limitations in the data that they can collect and spend a lot of time thinking about ways they can test different theories. It would be a career-making move if some grad student could come up with a new explanation which changes the previous understanding and given the ratio of degrees granted to jobs the incentives really wouldn’t favor covering anything up.
By arguing for the need to share code (despite it not being the logical thing to do), you end up with (sub)contractors who have more work to do to comply, and who make an otherwise simple system more complex by having to shove two separate systems together into one, so that they can be shared between agencies!
This is 100% a factor. The internet has some pretty dark and nasty corners; therefore so does the model. Seeing it unfiltered would be a PR nightmare for OpenAI.
Can someone help me understand why it's a problem for companies to train these huge LLM on your copyrighted material? What exactly is the harm that is being done to the copyright holder?
I can understand why the New York Times (for example) wants to claim that a couple billion dollar companies have done it actual harm; but I am struggling to actually identify what it is.
>The complaint cites several examples when a chatbot provided users with near-verbatim excerpts from Times articles that would otherwise require a paid subscription to view. It asserts that OpenAI and Microsoft placed particular emphasis on the use of Times journalism in training their A.I. programs because of the perceived reliability and accuracy of the material.
>In one example of how A.I. systems use The Times’s material, the suit showed that Browse With Bing, a Microsoft search feature powered by ChatGPT, reproduced almost verbatim results from Wirecutter, The Times’s product review site. The text results from Bing, however, did not link to the Wirecutter article, and they stripped away the referral links in the text that Wirecutter uses to generate commissions from sales based on its recommendations.
>The lawsuit also highlights the potential damage to The Times’s brand through so-called A.I. “hallucinations,” a phenomenon in which chatbots insert false information that is then wrongly attributed to a source. The complaint cites several cases in which Microsoft’s Bing Chat provided incorrect information that was said to have come from The Times, including results for “the 15 most heart-healthy foods,” 12 of which were not mentioned in an article by the paper.
Somewhere else in this thread, an example is given. An LLM is trained on all of Frank Miller's copyrighted material (he makes comic books). A user then comes along to the trained LLM, says "make a comic book that looks like Frank Miller's comic books", and sells the newly created comic book for profit. Should Frank Miller not get something?
Though that is different from saying Frank Miller was harmed. I guess if his sales dropped because people were buying GPT stuff instead that would be the case.
Whether or not this will reduce CO2 is yet to be seen. But it will almost certainly raise the price of beef and dairy products. Taxes on producers inevitably get passed on to consumers.
Could you explain which externality this is? I know they produce a lot of methane which is more of a GHG than CO2, but as far as I know, that methane is a part of the carbon cycle so it should be a net neutral contribution.
The problem isn't the cycle, it's the delta of total emissions in the cycle. By your definition, literally everything is part of the carbon cycle, as we are just putting the carbon of old plants in the air, which will slowly be consumed by plants. The problem is if we put all that carbon in the air all at once, we have problems.
The biomass of livestock is 14x larger than that of all other mammals apart from humans[1], so it makes sense that, even if their carbon cycle is short, it's still a massive amount of effectively permanent GHG in our atmosphere that wouldn't otherwise be there... about 15% of all emissions[2].
Not counting livestock as emissions because they form a decades long closed loop could be fine when we are carbon negative, but we are dealing with the very real problem of total emissions right now, not just unsustainable growth of emissions.
I think a big part of it is that we have so many more cattle than the earth could naturally support, and the number is only increasing as the world gets more developed. As it stands, even without any other sources of carbon emissions cattle would be enough to cause significant climate change on their own. [0]
I don't know if it's by gross weight or calories, but the number I've heard is that feeding plants to animals is 10-20x less effective than just having people eat the plants. Or mostly eat the plants. So, in the case of Denmark, where 50% of the surface area is used to grow food for pigs, we could instead use 3-5% of the surface area to grow food for people and come out at something resembling the same amount of food, at least if we're just talking "food needed to survive". And given some of the other talk we see on this site (or used to see a few years ago) about how indoor farming is an absolute necessity because we're running out of land due to rising populations, that seems quite significant.
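The rough arithmetic behind that, using the same back-of-the-envelope numbers (these are the figures quoted above, not measured data):

```python
# Back-of-the-envelope check of the claim above. All numbers are the
# rough figures from the comment, not measured data.
land_for_animal_feed = 0.50   # fraction of Denmark's surface growing pig feed
efficiency_loss = (10, 20)    # feeding plants to animals: ~10-20x less effective

# If people ate the plants directly, the same food output would need:
land_needed = [land_for_animal_feed / k for k in efficiency_loss]
print(land_needed)  # → [0.05, 0.025], i.e. roughly 2.5-5% of the surface area
```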
BTW, I am not a full-time vegan, nor interested in becoming one, but the average meat consumption in Denmark is, as far as I know, measured in the hundreds of grams per day. Maybe there's room for compromise?
Cows don't eat plants that people can derive appropriate nutrition from. They also don't generally use land that is appropriate for crops. Also, from what I understand, the emissions from the animals aren't significantly different from the seasonal die-off of the natural grasslands they graze on. Beyond this, most of the calculated water consumption "used" is rainwater on said grasslands.
Cows in America derive most of their calories from corn. And while most Montana cows generally live on land unsuitable for crops, Montana only has a small fraction of American cows.
And I believe that your comments are even less true in other prominent cattle producing countries than America.
Grass doesn't grow on hills, or soil with lots of stone/rocks that would be prohibitively expensive to turn into cropland? Not to mention that cropland using regenerative farming, or anything actually sustainable, should include grazing animal rotation.
I'm not a climate scientist; presumably the Danish government has consulted some of those, though. My understanding is that livestock farming, in particular beef and dairy cows, contributes significantly to the CO2e (carbon dioxide equivalent) emissions of the farming sector, which itself is a major overall contributor. The negative externality is this emission, which is not accounted for in the price of beef and dairy products.
Given a constant population of cows [1], the cows do not cause an increase in greenhouse gases over time.
This is because the methane from the cows has a half-life of between six and eight years [2]. Given a fixed population of cows, the amount of GHG going into the atmosphere is the same as the amount of GHG coming out of the atmosphere.
The problem with runaway climate change is oil. Given a fixed consumption of oil, the amount of GHG in the atmosphere increases over time.
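The difference can be sketched with a toy accumulation model (the half-life is the rough figure cited above; emission rates are arbitrary units, not real data):

```python
# Toy model: a constant methane source with exponential decay
# (half-life ~7 years) plateaus at an equilibrium stock, while a constant
# CO2 source (effectively no decay on this timescale) grows without bound.
# All numbers are illustrative, not measured emission rates.
import math

half_life = 7.0                   # years, rough atmospheric half-life of methane
r = math.exp(-math.log(2) / half_life)  # fraction surviving each year
emission = 1.0                    # arbitrary units emitted per year

methane_stock, co2_stock = 0.0, 0.0
for year in range(200):
    methane_stock = methane_stock * r + emission  # decay, then add this year's emissions
    co2_stock += emission                         # CO2 just accumulates here

# A constant source with decay converges to emission / (1 - r):
equilibrium = emission / (1 - r)
print(round(methane_stock, 2), round(equilibrium, 2))  # the two match
print(co2_stock)  # → 200.0: keeps growing linearly, year after year
```

That's the steady-state argument in a nutshell: constant herd, constant atmospheric methane; constant oil burning, ever-growing CO2.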
Perhaps some kind of financial incentive to curb, and eventually reverse, the artificially inflated population of cows, whose emissions (while part of the closed carbon cycle) increase the greenhouse effect of our atmosphere during the methane's lifetime, would be a simple and effective step toward increasing the odds we survive the next few centuries? One that is not at all at odds with also tackling our reliance on fossil fuels?
This is a factual matter. Why be snarky when you can instead look up the answer? The population of cows in the EU has been relatively constant, with a slight downward trend. [1]
Maybe I’m using the wrong term but my understanding is that the carbon comes from the feed, which itself pulls it from the atmosphere. Thus, it came from the air, and goes back into the air.
The fossil fuels did the same but on a grander scale and longer timeline where the carbon becomes sequestered. Carbon taxes on that make sense.
From what I understand, carbon comes from the soil as well as from the air, and it's supposed to be "stored" back into the soil by various means. But our agriculture, and human activity more generally, tends to accelerate the release of carbon from the soil and hinder the storing process. And that provokes a climate imbalance.
There was a good video on the relation between the carbon cycle and massive extinctions throughout history (1h long): https://www.youtube.com/watch?v=uxTO2w0fbB4
It has a short half-life (years) in the atmosphere though, while CO2 has a very long half-life (centuries or more). Methane in the atmosphere gets photochemically oxidized into CO2 and H2O, but more slowly than when it's combusted.
> I don't really know what you mean by it being part of carbon cycle
Plants breathe in CO2 from the atmosphere and bind the carbon in their structure. Cows eat the plants and turn some percentage of the bound carbon into cow meat; the rest they poop, fart, or breathe out. Eventually all that carbon ends up back in the atmosphere, where plants can bind it again. This is the carbon cycle. Thus the point is that the existence of cows does not increase the carbon in circulation. (As opposed to digging up coal or drilling for oil/gas, which liberates carbon that used to be in circulation but became bound in fossils.)
So we should just give up on trying to accurately price externalities until we’ve found a way to include all of them? (Hint: we never will, so that would mean never taking action)
No, but maybe don't start with one of the smaller sources of overall greenhouse gasses (it's like <3% in the U.S. for example, not sure about Denmark), especially when it affects nutrition and well being of your people.
> it's like <3% in the U.S. for example, not sure about Denmark
Sure, but TFA and this discussion is about Denmark, where "...agriculture is the country’s biggest source of emissions".
Further on in TFA, "The global food system is a huge contributor to the climate crisis, producing around a third of greenhouse gas emissions", of which, "Denmark is a major dairy and pork exporter".
Even per the EPA agriculture is ~10% for the US [0]. And you say "That's all agriculture", I say "Why do you think the US grows so much shite-tasting feed corn? Some for ethanol, lots for pigs, cows, and chicken".
Sure, but you are only looking at one potential impact of this. I can think of at least 3 benefits that seem pretty clear without doing a deeper analysis.
1. Adding an appropriate carbon tax for gassy cows reduces or eliminates the externalization of the environmental impact of the cows, and provides justification for spending on the increased cost of feeds that drive down the amount of methane produced.
2. Driving a justification for increased spending on those feeds, especially novel feeds, drives investment in new aquaculture techniques that will create additional jobs and ensures that capital that might otherwise be held as profits actually circulates through the economy (a key component that is actually required for capitalism to work).
3. The increased costs of beef and pork will drive lower consumption of animal-based proteins. That lower consumption should be a factor in driving better health outcomes longer term (note I said lower, not elimination of consumption). This is a net benefit for any nation that provides health care services to its citizens (heart disease and cardiovascular illness are among the leading causes of death in Denmark, like most developed nations).
Cow farts don't cause climate change. Meanwhile ruminants are essential to ecosystems, have existed in the hundreds of millions for tens of millions of years and provide tons of positive externalities that aren't subsidized. Nevermind that they contribute to food security and health. Ruminants are also better for the land than millions of acres of GMO monocrops drenched in Roundup.
Ruminants, yes, corn/grain fed livestock, no. Sadly the days where most of your meat production is coming from cows on a family ranch grazing on acres of grassland is long since gone and most of our beef is coming from cows fed on those "millions of acres of GMO monocrops drenched in Roundup".
I come from a long line of Montana ranchers, I'm in no hurry to see beef production disappear, and it certainly won't be disappearing from my plate anytime soon, but it's important to acknowledge that the ranches of today look nothing like what my parents grew up on.
I also suspect that were you to draw a Venn diagram of people who want less meat consumption, and people who want less monocrops drenched in roundup, you'd discover you've drawn a circle. These viewpoints are not mutually exclusive.
The biomass of livestock is 14x larger than that of all other mammals combined[1] (not counting humans). You have to be really intentional with the way you phrase things to point out that, yes, wild ruminants are essential to ecosystems.
Nobody is talking about wild ruminants though; we're talking about the livestock with 14x the biomass of all other mammals combined that are creating an obscene amount of emissions. And just FYI, it's the burps that are the main problem, not the farts.
> The biomass of livestock is 14x larger than that of all other mammals combined[1] (not counting humans). You have to be really intentional with the way you phrase things to point out that, yes, wild ruminants are essential to ecosystems.
Livestock have supplanted the biomass of wild ruminants and ungulates. As an example, North America had at least 60 million wild bison for millennia before it had 90 million cows. And that's not even accounting for the megafauna that existed prior to the mass extinctions of the Pleistocene (mammoths, giant bison, ground sloths, tapirs, steppe bison, saiga antelopes, giant muskox, wooly rhinos, etc.). What do you think occupied the Great Plains before they mowed it down for corn fields?
The world has a billion cows (some estimates put it as high as 1.5 billion), and we expect that to grow significantly as China and India become more developed. I honestly don't think anyone would be worried about cattle if the world cattle population were 90 million. They would account for somewhere below 1% of emissions at that point. It's literally a factor of more than 10x.
It's 90 million in the USA; that's not disinformation, that's a fact. What I'm underscoring by pointing to the 60 million wild bison in the mid-1800s USA is that there hasn't been a significant change in global non-human mammal biomass.
Eurasia alone had 200+ million wooly mammoths during the ice age (note that wooly mammoths have 10x more mass than cows), and there were hundreds of millions more megafauna with similar digestive systems for tens of millions of years.
At no point in the last 50 million years did any number of those species trigger large scale climate change. It probably would have been welcome in the midst of the ice age, to be honest.
Nobody is saying that cattle are the sole cause of climate change. Suggesting that is a non sequitur.
They are simply another artificial source of GHGs that is contributing to climate change. That's why your statements are disinformation: they try to equate rapid, artificial emissions with the natural GHGs from species that arise very slowly, allowing a balance to be maintained (generally, yes, in a cycle).
The point isn't that cows are a problem. It's that we've created a whole bunch of cows, very rapidly, without any corresponding plant life to offset the excess emissions they produce. While extremely unlikely, this could absolutely happen in a natural system, and it could still cause climate change if it did.
The problem is total GHGs in the atmosphere right now, to which livestock is a significant contributor.
> Nobody is saying that cattle are the sole cause of climate change.
I'm asserting they have zero effect on climate change, they have merely supplanted wild biomass that existed on a much broader scale for tens of millions of years and there is no evidence that an abundance of mammalian digestion has ever caused climate change in the 50+ million years that mammals have dominated life on earth.
> The point isn't that cows are a problem. It's that we've created a whole bunch of cows, very rapidly, without any corresponding plant life to offset the excess emissions they produce.
The only thing we've done is supplant wild mammals with domesticated mammals. In the absence of agriculture or even humans, mammals already dominated the planet.
As I referenced, wooly mammoths on one continent alone had a higher biomass than all the cows alive on the planet today. That is a completely extinct species, and there's hundreds of more extinct species where that came from:
You'll have to explain to me how it was that the Pleistocene featured such a high biomass of ungulates without a corresponding increase in temperatures.
>I'm asserting they have zero effect on climate change
This argument makes zero sense.
We know that the cattle produce methane via digestion. We know that methane is released into the atmosphere. We know that methane is a greenhouse gas. Thus, we know that these cattle, cattle that would otherwise not exist, are contributing to climate change.
This is a trivially demonstrable argument. The fact that you doubt it means that you are, at best, deeply confused about the relationship between GHG emissions and climate change in general.
It makes perfect sense: the livestock methane has merely supplanted the wildlife methane that existed for 50 million years. In 50 million years of geological analysis on climate change, we have not documented one case where biological methane has triggered large scale climate change.
> Thus, we know that these cattle, cattle that would otherwise not exist, are contributing to climate change.
In the absence of cattle and especially in the absence of humans, other ruminants will naturally propagate, as they have a number of symbiotic relationships with various plant and animal species.
Ruminants have roamed the earth for 50+ million years, and they have been widely propagated in the hundreds of millions to billions of total global population for that entire time.
The Great Plains were filled with 60+ million American bison, a species which trends larger than cows themselves, but which is still so close genetically to cows that they can still breed together. Deer, antelope, elk, moose, sheep, goats, etc also have similar digestive systems and also existed in larger numbers in the wild prior to modern human expansion.
Who said they cause climate change? Is the impact of domesticated cattle a measurable impact on global methane production? Yes. This has been measured, observed, and accepted science since at least 1995[1]. Does methane have an observable, measurable effect on climate? Yes[2], and agriculture is a major driver of anthropogenic climate change[3]. If you want to argue it doesn't or isn't, bring data, because there are too many articles, research papers, and studies arguing that it does for your unbacked opinion to be accepted.
Does this mean that cow farts (or burps) are the cause or driver of climate change? No. Is it something that we can meaningfully reduce the impact of? Well, probably, based on some of the resources linked to in other comments.
Does it mean that solving this for cattle is going to solve climate change? No, but incremental progress helps (insert the 1.01^365 ≈ 37.8 meme).
You know what doesn't help? Clearly ignorant reductive comments that conflate a contributing factor with the entire problem while ignoring the preponderance of evidence contrary to your claims.
The problem with the assertion that cattle methane has had a meaningful impact is that it's not measured against any historical control. We have a roughly 50-60 million year period to compare against where mammals have been the dominant kingdom of animals on planet earth [1]. Their digestive systems have not meaningfully changed. Livestock have merely supplanted biomass that would otherwise be wild biomass, producing methane just the same (though perhaps distributed over a more diverse spectrum of species).
Domesticated cattle occupy land that was occupied by wild ruminants long before it was fenced in. In the absence of monocrop agriculture and human dwellings, hundreds of millions of more acres would be occupied by ruminant mammals. The Great Plains, where we today grow massive amounts of corn, wheat and soybeans, were occupied by massive quantities of bison, elk, and deer. And before the mass extinctions of the Late Pleistocene, the earth was massively occupied by wooly mammoths, giant ground sloths, giant bison, muskox, shrub ox, stag moose, stout legged llamas, etc etc [2]. The Eurasian biomass of wooly mammoths alone was more than the global biomass of all domesticated cattle alive today [3] (assuming 200 million mammoths 50K YA in Eurasia and 1 billion global cattle population today, with mammoths averaging 10x more mass than cows).
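For what it's worth, the biomass comparison works out as follows under the stated assumptions (all of these population and mass figures are the assumptions above, not measurements):

```python
# The biomass comparison, made explicit. Population figures and the 10x
# mass ratio are the comment's stated assumptions, not measured data.
mammoths = 200e6          # assumed Eurasian wooly mammoth population ~50K years ago
mammoth_vs_cow_mass = 10  # assumed mass ratio, mammoth : cow
cattle_today = 1e9        # rough global cattle population

mammoth_biomass_in_cow_units = mammoths * mammoth_vs_cow_mass
print(mammoth_biomass_in_cow_units / cattle_today)  # → 2.0, i.e. 2x today's cattle biomass
```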
It might be the case that human activity actually reduced the amount of methane produced globally by four-legged ungulates (although surely we've more than made up for the difference from mining and drilling).
EDIT: On further review, I found a study that attempts to estimate pre-European-settlement U.S. methane production attributable to ruminants, and estimates it at 86% of what modern-day livestock and wild ruminants produce, underscoring my point (livestock ruminants merely supplanted wild ruminants):
That would be an acceptable outcome. Currently red meat is priced lower than it should be. If we want a market based solution to climate change we need to accurately price goods based on the damage they do. We have been discounting externalities for too long, and it's catching up with us.
> Taxes on producers inevitably get passed on to consumers.
And just like any tax on consumption, it would have an outsized effect on lower income consumers. That’s not necessarily wrong or unintended, but it’s the reality of any program like this.
If this were CA, there would be some cockamamie “low income meat consumer tax credit” proposed alongside. And then even if it’s added, it’ll get removed after a few years so just the tax remains.
And those priced-out people are voting for right-wing parties promising to cancel those taxes. Those right-wing parties might not actually do that, but people on the left should not be surprised why the EU is leaning more and more to the right.
Agreed. It's well-established that protein from greens isn't as bioavailable as protein from meat. Hard not to worry about negatively impacting future generations of lower-income families.
It will also undoubtedly put many farmers out of business and force more consolidation within the industry toward the largest players, who can afford to deal with the extra taxes.
They will import the meat and dairy products that are no longer competitive to produce in Denmark. This kind of tax works better if implemented uniformly across the EU, but then they'll import from outside the EU. Ultimately it's a way of telling some industries to pack their things and go.
> Taxes on producers inevitably get passed on to consumers.
Which raises the cost of living in your country.
Which reduces the quality of life of your citizens
And reduces the competitiveness of your exports on the global markets.
I'm all for protecting the global environment, but capitalism puts concrete boots on the people of any country that spends money on it while other countries do not.
A global binding agreement, backed by sanctions, is the only approach that will work.
Isn’t it true that the only thing LLMs do is “hallucinate”?
The only way to know if it did “hallucinate” is to already know the correct answer. If you can make a system that knows when an answer is right or not, you no longer need the LLM!
Hallucination implies a failure of an otherwise sound mind. What current LLMs do is better described as bullshitting. As the bullshitting improves, it happens to be correct a greater and greater percentage of the time.
Sometimes when I am narrating a story, I don't care that much about trivial details but focus on the connections between those details. Is there an LLM counterpart to such behaviour? In that case, one could say I was bullshitting on the trivial details.
It has nothing to do with the ratio and everything to do with intent. Bullshitting is what we call it when you just spin a story with no care for the truth, making up stuff that sounds plausible. That is what LLMs do today, and what they will always do as long as we don't train them to care about the truth.
You can have a generative model that cares about the truth when it generates responses; it's just that current LLMs don't.
You could program a concept of truth into them, or maybe punish them for making mistakes instead of just rewarding them for replicating text. Nobody knows how to do that in a way that gets intelligent results today, but we know how to code things that output or check truths in other contexts; Wolfram Alpha, for example, is capable of solving tons of things and isn't wrong.
> (or any concepts at all).
Nobody here said that; that is your interpretation. Not everyone who is skeptical of current LLM architectures' future potential as AGI thinks that computers are unable to solve these things. Most here who argue against LLMs don't think the problems are unsolvable, just not solvable by the current style of LLMs.
> You can program a concept of truth into them, ...
The question was, how you do that?
> Nobody here said that, that is your interpretation.
What is my interpretation?
I don't think the problems are unsolvable, but we don't know how to solve them now. Thinking that we can "just program the truth into them" shows a lack of understanding of the magnitude of the problem.
Personally, I'm convinced that we'll never reach any kind of AGI with LLMs. They lack any kind of model of the world to reason about, and they lack the concept of reasoning itself.
And I answered: we don't know how to do that, which is why we currently don't.
> Personally, I'm convinced that we'll never reach any kind of AGI with LLMs. They lack any kind of model of the world to reason about, and they lack the concept of reasoning itself.
Well, for some definition of LLM we probably could. But probably not the way they are architected today. There is nothing stopping us from adding different things to a large language model's training steps to enable new reasoning.
> What is my interpretation?
Well, I read your post as being on the other side. I believe it is possible to make a model that can reason about truthiness, but I don't think current style LLMs will lead there. I don't know exactly what will take us there, but I wouldn't rule out an alternate way to train LLMs that looks more like how we teach students in school.
Keywords like "epistemology" in the prompt help. ChatGPT generally outperforms humans substantially in epistemology in my experience, and it seems to "understand" the concept much more clearly and deeply, and without aversion (lack of an ego or sense of self, values, goals, desires, etc.?).
> It has nothing to do with the ratio and everything to do with intent. Bullshitting is what we call it when you just spin a story with no care for the truth, making up stuff that sounds plausible
Do you people hear yourselves? You're discussing the state of mind of a pseudo-RNG...
An ML model's intent is its reward function. It strives to maximize reward, just like a human does. There is nothing strange about this.
Humans are much more complex than these models, so they have many more concepts, which is why we need psychology. But some core aspects work the same in ML and in human thinking. In those cases it is helpful to use the same terminology for humans and machine learning models, because that helps transfer understanding from one domain to the other.
Does every thread about this topic have to have someone quibbling about the word “hallucination”, which is already an established term of art with a well understood meaning? It’s getting exhausting.
The term hallucination is a fundamental misunderstanding of how LLMs work, and continuing to use it will ultimately result in a confused picture of what AI and AGI are and what is "actually happening" under the hood.
Wanting to use accurate language isn't exhausting, it's a requirement if you want to think about and discuss problems clearly.
"Arguing about semantics" implies that there is no real difference between calling something A vs. calling it B.
I don't think that's the case here: there is a very real difference between describing something with a model that implies one (false) thing vs. a model that doesn't have that flaw.
If you don't find that convincing, then consider this: by taking the time to properly define things at the beginning, you'll save yourself a ton of time later on down the line – as you don't need to untangle the mess that resulted from being sloppy with definitions at the start.
This is all a long way of saying that aiming to clarify your thoughts is not the same as arguing pointlessly over definitions.
"Computer" used to mean the job done by a human being. We chose to use the meaning to refer to machines that did similar tasks. Nobody quibbles about it any more.
Words can mean more than one thing. And sometimes the new meaning is significantly different but once everyone accepts it, there's no confusion.
You're arguing that we shouldn't accept the new meaning - not that "it doesn't mean that" (because that's not how language works).
I think it's fine - we'll get used to it and it's close enough as a metaphor to work.
I'd be willing to bet that people did quibble about what "computer" meant at the time the meaning was transitioning.
It feels like you're assuming that we're already 60 years past re-defining "hallucination" and the consensus is established, but the fact that people are quibbling about it right now is a sign that the definition is currently in transition/ has not reached consensus.
What value is there in trying to shut down the consensus-seeking discussion that gave us "computer"? The same logic could have been used to argue that "computers" should actually be called "calculators", and why are people still trying to call them "computers"?
you stole a term which means something else in an established domain and now assert that the ship has sailed, whereas a perfectly valid term in both domains exists. don't be a lazy smartass.
That's actually what the paper is about. I don't know why they didn't use that in the title.
> Here we develop new methods grounded in statistics, proposing entropy-based uncertainty estimators for LLMs to detect a subset of hallucinations—confabulations—which are arbitrary and incorrect generations.
If there's any forum which can influence a more correct name for a concept, it's this one, so please excuse me while I try to point out that contemporary LLMs confabulate; "hallucinating" should be reserved for more capable models.
It’s well understood in the field. It’s not well understood by laymen. This is not a problem that people working in the field need to address in their literature.
> We need systems that try to be coherent, not systems that try to be unequivocally right, which wouldn't be possible.
The fact that it isn't possible to be right about 100% of things doesn't mean that you shouldn't try to be right.
Humans generally try to be right; these models don't. That is a massive difference you can't ignore. The fact that humans often fail to be right doesn't mean that these models shouldn't even try to be right.
By their nature, the models don’t ‘try’ to do anything at all—they’re just weights applied during inference, and the semantic features that are most prevalent in the training set will be most likely to be asserted as truth.
They are trained to predict the next word so that it matches the text they have seen; that's what I'd call what they "try" to do here. A chess AI tries to win because that is what it was encouraged to do during training; current LLMs try to predict the next word because that is what they are trained to do. There is nothing wrong with using that word.
This is an accurate usage of "try": ML models at their core try to maximize a score, so whatever that score represents is what they try to do. And there is no concept of truth in LLM training, just sequences of words; they have no score for true or false.
Edit: Humans are punished as kids for being wrong all throughout school and in most homes; that makes humans try to be right. That is very different from these models, which are just rewarded for mimicking regardless of whether it is right or wrong.
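The point about the score being all the model "tries" to maximize can be made concrete with a toy sketch. This is purely illustrative (hypothetical two-word vocabulary and made-up probabilities, not any real model): the standard next-token cross-entropy loss only rewards matching whatever token the training text actually contained, so truth never appears anywhere in the objective.

```python
import math

def next_token_loss(probs, observed):
    """Standard cross-entropy next-token loss: -log p(observed token)."""
    return -math.log(probs[observed])

# Model's guess for the blank in "The capital of France is ___"
# (hypothetical numbers for illustration)
probs = {"Paris": 0.7, "London": 0.3}

# Whether the training sentence was true or false, the loss only pushes
# the model toward whatever the text said -- there is no "truth" term:
loss_if_text_said_paris = next_token_loss(probs, "Paris")
loss_if_text_said_london = next_token_loss(probs, "London")
```

If the corpus happened to contain the false sentence, the same objective would just as happily train the model to reproduce it; lower loss means "closer to the text", not "closer to the truth".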
> That is very different from these models that are just rewarded for mimicking regardless if it is right or wrong
That's not a totally accurate characterization. The base models are just trained to predict plausible text, but then the models are fine-tuned on instruct or chat training data that encourages a certain "attitude" and correctness. It's far from perfect, but an attempt is certainly made to train them to be right.
They are trained to replicate text semantically and then given a lot of correct statements to replicate; that is very different from being trained to be correct. It makes them more useful and less incorrect, but they still don't have a concept of correctness trained into them.
Exactly. If massive data poisoning happened, would the AI be able to know what the truth is when there is as much new false information as real information? It wouldn't be able to reason about it.
I think this assumption is wrong, and it's making it difficult for people to tackle this problem, because people do not, in general, produce writing with the goal of producing truthful statements. They try to score rhetorical points, they try to _appear smart_, they sometimes intentionally lie because it benefits them for so many reasons, etc. Almost all human writing contains a range of falsehoods, from unintentional misstatements of fact to out-and-out deceptions. Forget the politically-fraught topic of journalism and just look at the writing produced in the course of doing business -- everything from PR statements down to jira tickets is full of bullshit.
Any system that is capable of finding "hallucinations" or "confabulations" in AI-generated text in general should also be capable of finding them in human-produced text, which is probably an unsolvable problem.
I do think that, since the models do have some internal representation of certainty about facts, the smaller problem of finding potentially incorrect statements in their own produced text, based on what they know about the world, _is_ possible, though.
The answer is no, otherwise this paper couldn't exist. Just because you can't draw a hard category boundary doesn't mean "hallucination" isn't a coherent concept.
(The OP is referring to one of the foundational concepts relating to the entropy of a model of a distribution of things -- it's not the terminology I would use, but the "you have to know everything, and then the model wouldn't really be useful" objection is why I didn't end up reading the paper after skimming a bit to see whether they addressed it.
This is why things in this arena are a hard problem. It's extremely difficult to actually know the entropy of certain meanings of words, phrases, etc. without a comical amount of computation.
This is also why a lot of the interpretability methods people use these days have some difficult and effectively permanent challenges inherent to them. Not that they're useless, but I personally feel they are dangerous if used without knowledge of the class of side effects that comes with them.)
The idea behind this research is to generate the answer a few times; if the results are semantically very different from each other, then they are probably wrong.
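That procedure can be sketched in a few lines. This is a minimal illustration, not the paper's actual code: the real method clusters sampled answers using a bidirectional-entailment model, for which a trivial case-insensitive string match stands in here.

```python
import math

def semantic_entropy(answers, same_meaning):
    """Greedily cluster sampled answers by meaning, then return the
    entropy of the cluster distribution. High entropy means the samples
    disagree semantically -- the signal treated as likely confabulation."""
    clusters = []
    for a in answers:
        for c in clusters:
            if same_meaning(a, c[0]):
                c.append(a)
                break
        else:
            clusters.append([a])
    n = len(answers)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)

# Stand-in equivalence check (the real method uses an entailment model):
same = lambda a, b: a.strip().lower() == b.strip().lower()

# Samples that agree give entropy 0; scattered samples give high entropy.
consistent = ["Paris", "paris", "Paris", "Paris"]
scattered = ["Paris", "Rome", "Berlin", "Madrid"]
low = semantic_entropy(consistent, same)   # == 0.0
high = semantic_entropy(scattered, same)   # == log(4)
```

Clustering by meaning rather than by exact string is the key design choice: "Paris" and "paris" (or, with a real entailment model, "Paris" and "It's Paris") count as the same answer, so only genuine semantic disagreement raises the score.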
> Isn’t it true that the only thing that LLM’s do is “hallucinate”?
The Boolean answer to that is "yes".
But if Boolean logic were a good representation of reality, we would already have solved that AGI thing ages ago. In practice, your neural network is trained with a lot of samples that have some relation between themselves, and to the extent that those relations are predictable, the NN can be perfectly able to predict similar ones.
There's an entire discipline about testing NNs to see how well they predict things. It's the other side of the coin of training them.
Then we get to this "know the correct answer" part. If the answer to a question were predictable from the question's words, nobody would ask it. So yes, it's a defining property of NNs that they can't create answers for the kinds of questions people have been asking those LLMs.
However, they do have an internal Q&A database they were trained on. Except that the current architecture cannot know whether an answer comes from that database either. So it is possible to force them into giving useful answers, but currently they don't.
> Learn more about [the browser]

> Never hear about [the browser] again
Those links will do very different things.