
I strongly suspect Google tried really, really hard here to overcome the criticism it got when previous image recognition models said that black people looked like gorillas. I am not really sure what I would want out of an image generation system, but I think Google's system probably went too far in trying to incorporate diversity in image generation.


Surely there is a middle ground.

"Generate a scene of a group of friends enjoying lunch in the park." -> Totally expect racial and gender diversity in the output.

"Generate a scene of 17th century kings of Scotland playing golf." -> The result should not be a bunch of black men and Asian women dressed up as Scottish kings, it should be a bunch of white guys.


> Surely there is a middle ground. "Generate a scene of a group of friends enjoying lunch in the park." -> Totally expect racial and gender diversity in the output.

Do we expect this because diverse groups are realistically most common, or because we wish that they were? For example, only some 10% of marriages are interracial, but commercials on TV would lead you to believe it's 30% or higher. The goal for commercials, of course, is to appeal to a wide audience without alienating anyone, not to reflect real-world stats.

What’s the goal for an image generator or a search engine? Depends who is using it and for what, so you can’t ever make everyone happy with one system unless you expose lots of control surface toggles. Those toggles could help users “own” output more, but generally companies wouldn’t want to expose them because it could shed light on proprietary backends, or just take away the magic from interacting with the electric oracles.
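To make the toggle idea concrete, here's a minimal sketch of what such a control surface could look like; every name in it (GenerationRequest, DemographicsMode, the modes themselves) is hypothetical, not any vendor's actual API:

    # Hypothetical user-facing toggles for an image generator -- a sketch, not a real API.
    from dataclasses import dataclass
    from enum import Enum

    class DemographicsMode(Enum):
        AS_PROMPTED = "as_prompted"   # depict only what the prompt specifies
        LOCAL_STATS = "local_stats"   # approximate real demographics for a locale
        BALANCED = "balanced"         # deliberately varied output

    @dataclass
    class GenerationRequest:
        prompt: str
        demographics: DemographicsMode = DemographicsMode.AS_PROMPTED
        locale: str | None = None     # e.g. "ja-JP", consulted by LOCAL_STATS

    req = GenerationRequest(
        prompt="a group of friends enjoying lunch in the park",
        demographics=DemographicsMode.BALANCED,
    )

Part of the reluctance may be that shipping something like LOCAL_STATS means publicly owning a demographics table that someone, somewhere, will object to.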


Also, these tools are used world-wide and "diversity" means different things in different places. Somehow it's always only the US ideal of diversity that gets shipped abroad.


Yeah, someone else mentioned Tokyo which is not going to have as much variety among park visitors as NYC. But then again neither will Colorado (or almost anywhere else!) be as diverse. Some genius at corporate is probably scheming about making image generation as location-sensitive as search is, ostensibly to provide utility but really to perpetuate echo chambers and search bubbles. I wish computing in general would move back towards user-controlled rather than guess-what-I-mean and the resulting politicization, but it seems that ship has sailed.


US companies systematically push US cultural beliefs and expectations. People in the US probably don't notice it any more, but it's pretty obvious to those of us on the receiving end of US cultural domination.

This fact is an unavoidable consequence of the socioeconomic realities of the world, but it obviously clashes with these companies’ public statements and positions.


Yes, but it's especially cynical in this case because the belief they're pushing is that diversity matters, that biases need to be overcome, and that all people need to be represented.

Claiming all of that but then shoving your own biases down the rest of the world's throat while not representing their people in any way is especially cynical in my opinion. It undermines the whole thing.


This is the exact same mindset that invented the word “Latinx”. Compress most of an entire hemisphere of ethnic and cultural diversity down to the vague concept of “Latino”, notice that Spanish is a gendered language so the word “Latino” is also gendered, completely forget that you already have the gender neutral English word “Latin”, invent a new word that virtually none of the people to whom it applies actually identifies with, and then shamelessly use it all the time.


Hypocrisy is an especially destructive kind of betrayal, which is why the crooked cop or the pedo priest is so disappointing. It would be nice if companies would merely exploit us without all the extra insult of telling us they are changing the world / it's for our own good / it's what we asked for, etc.


And most women are friends with mostly women, and most men are friends with mostly men.


19% of new marriages in 2019 (and likely to rise): https://en.m.wikipedia.org/wiki/Interracial_marriage_in_the_...

Plus, it's still a recent change: Loving v. Virginia (which legalized interracial marriage across the US) was decided in 1967.


You can see how this gets challenging, though, right?

If you train your model to prioritize real photos (as they're often more accurate representations than artistic ones), you might wind up with Denzel Washington as the archetype: https://en.wikipedia.org/wiki/The_Tragedy_of_Macbeth_(2021_f....

There's a vast gap between human understanding and what LLMs "understand".


If they actually want it to work as intelligently as possible, they'll begin taking these complaints into consideration and building in a wisdom-curating feature that people can contribute to.

This much is obvious, but they seem to be satisfied with theory over practicality.

Anyway I'm just ranting b/c they haven't paid me.

How about an off-the-wall algorithm to estimate how much each scraped input turns out to influence the bigger picture, as a way to work towards satisfying the copyright question?
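To sketch that off-the-wall idea: one naive, purely illustrative way to split credit is similarity in a shared embedding space, normalized into shares. Nothing here reflects how any vendor actually attributes outputs, and real influence estimation (e.g. influence functions) is far more involved:

    # Toy attribution: credit each training item in proportion to its
    # cosine similarity to the generated output's embedding.
    import numpy as np

    def attribution_shares(output_emb: np.ndarray, train_embs: np.ndarray) -> np.ndarray:
        """Per-training-item share of credit for one output (shares sum to ~1)."""
        sims = train_embs @ output_emb / (
            np.linalg.norm(train_embs, axis=1) * np.linalg.norm(output_emb) + 1e-9
        )
        sims = np.clip(sims, 0.0, None)      # ignore anti-correlated items
        return sims / (sims.sum() + 1e-9)    # shares could drive royalty splits

    rng = np.random.default_rng(0)
    shares = attribution_shares(rng.normal(size=64), rng.normal(size=(1000, 64)))
    print(shares.sum())  # ~1.0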


An LLM-style system designed to understand Wikipedia relevance and citation criteria and apply them might be a start.

Not that Wikipedia is perfect and controversy-free, but it's certainly a more sophisticated approach than the current system prompts.


Then who in this black box private company is the Oracle of infinite wisdom and truth!? Who are you putting in charge? Can I get a vote?


> If you train your model to prioritize real photos

I thought that was the answer to the big bugbear about disinformation and fake news, but now we have to censor reality to combat "bias".


I mean, now you need to train the AI to recognise the bias in the training data.


The focus for alignment is avoiding bad PR, specifically the kind of headlines written by major media houses like the NYT, WSJ, and WaPo. You could imagine headlines like "Google's AI produced a non-diverse output on occasions" when a researcher or journalist is trying too hard to get the model to produce that. The hit on Google is far bigger than on, say, Midjourney or even OpenAI, at least so far (I suspect future models will be more nerfed than they are now).

For the cases you mentioned, those were presumably the initial examples. It gets tricky during red teaming, where they internally try out extreme prompts and then align the model for any kind of prompt with suspect output. You train the model first, then figure out the issues, and align the model using "correct" examples to fix those issues. Either they went to extreme levels doing that, or they did not re-test the initial, correct prompts after alignment.
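That workflow, as a stub-level sketch (every function below is hypothetical scaffolding, not a real training API), makes the failure mode visible: the loop only ever re-checks the extreme prompts, so a regression on the ordinary ones goes unnoticed:

    # Train -> red-team -> re-align loop; note what never gets re-tested.
    def red_team(model, prompts):
        """Return the prompts whose outputs look suspect (stub)."""
        return [p for p in prompts if model(p) == "suspect"]

    def finetune(model, bad_prompts):
        """Pretend to align the model on corrected examples (stub)."""
        return lambda p: "ok" if p in bad_prompts else model(p)

    def align(model, extreme_prompts):
        issues = red_team(model, extreme_prompts)
        while issues:
            model = finetune(model, issues)
            issues = red_team(model, extreme_prompts)
        return model  # the original, ordinary prompts are never re-checked

    base = lambda p: "suspect" if "extreme" in p else "ok"
    aligned = align(base, ["extreme prompt 1", "extreme prompt 2"])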


There's no "middle" in the field of decompressing a short phrase into a visual scene (or program or book or whatever). There are countless private, implicit assumptions that users take for granted yet expect to see in the output, and vendors currently fear that their brand will be on the hook for the AI making a bad bet about those assumptions.

So for your first example, you totally expect racial and gender diversity in the output because you're assuming a realistic, contemporary, cosmopolitan, bourgeois setting -- either because you live in one or because you anticipate that the provider will default to one. The food will probably look Western, the friends will probably be young adults who look to have professional or service jobs and wear generic contemporary commercial fashion, the flora in the park will be broadly northern-climate, etc.

Most people around the world don't live in an environment anything like that, so nominal accuracy can't be what you're looking for. What you want, but don't say, is a scene that feels familiar to you and matches what you see as the de facto cultural ideal of contemporary Western society.

And conveniently, because a lot of the training data is already biased towards that society and the AI vendors know that the people who live in that society will be their most loyal customers and most dangerous critics right now, it's natural for them to put a thumb on the scale (through training, hidden prompts, etc) that gets the model to assume an innocuous Western-media-palatable middle ground -- so it delivers the racially and gender diverse middle class picnic in a generic US city park.

But then in your second example, you're implicitly asking for something historically accurate without actually saying that accuracy is what's become important for you in this new prompt. So the same thumb that biased your first prompt towards a globally-rare-but-customer-palatable contemporary, cosmopolitan, Western culture suddenly makes your new prompt produce something surreal and absurd.

There's no "middle" there because the problem is really in the unstated assumptions that we all carry into how we use these tools. It's more effective for them to make the default output Western-media-palatable and historical or cultural accuracy the exception that needs more explicit prompting.

If they're lucky, they may keep grinding on new training techniques and prompts that get more assumptions "right" by the people that matter to their success while still being inoffensive, but it's no simple "surely a middle ground" problem.


> "Generate a scene of 17th century kings of Scotland playing golf." -> The result should not be a bunch of black men and Asian women dressed up as Scottish kings, it should be a bunch of white guys.

It works in Bing, at least:

https://www.bing.com/images/create/a-picture-of-some-17th-ce...


I don't know that this sheds light on anything but I was curious...

a picture of some 21st century scottish kings playing golf (all white)

https://www.bing.com/images/create/a-picture-of-some-21st-ce...

a picture of some 22nd century scottish kings playing golf (all white)

https://www.bing.com/images/create/a-picture-of-some-22nd-ce...

a picture of some 23rd century scottish kings playing golf (all white)

https://www.bing.com/images/create/a-picture-of-some-23rd-ce...

a picture of some contemporary scottish people playing golf (all white men and women)

https://www.bing.com/images/create/a-picture-of-some-contemp...

https://www.bing.com/images/create/a-picture-of-some-contemp...

a picture of futuristic scottish people playing golf in the future (all white men and women, with the emergence of the first diversity in Scotland in millennia! Male and female post-human golfers. Hummmpph!)

https://www.bing.com/images/create/a-picture-of-futuristic-s...

https://www.bing.com/images/create/a-picture-of-futuristic-s...

Inductive learning is inherently a bias/perspective-absorbing algorithm. But tuning in a default bias towards diversity for contemporary, futuristic, and time-agnostic settings seems like a sensible thing to do. People can explicitly override the sensible defaults as necessary, e.g. for Nazi zombie android apocalypses, or the royalty of a future Earth run by Chinese overlords (Chung Kuo), etc.


> People can explicitly override the sensible defaults as necessary

They cannot, actually. If you look at some of the examples in the Twitter thread and other threads linked from it, Gemini will mostly straight up refuse requests like e.g. "chinese male", and give you a lecture on why you're holding it wrong.


Two weeks ago I tried the following prompts, and I was very surprised by the "diversity" of my dark age soldiers:

https://www.bing.com/images/create/selfy-of-a-group-of-dark-...

https://www.bing.com/images/create/selfy-of-a-group-of-dark-...


Diversity is cool, but who gets to decide what's diverse?


Good point. People of European descent have more diversity in hair color, hair texture and eye color than any other race. That’s because a lot of those traits are recessive and are only expressed in isolated gene pools (European peoples are an isolated gene pool in this sense).


Isn't this like the exact opposite of the conclusions of the HapMap project?


I'm really disappointed that nth-century seems to have no effect at all. I'm expecting Kilts in Space.


It’s a perfect illustration of the way these models work. They are fundamentally incapable of original creation and imagination, they can only regurgitate what they have already been fed.


That they can do more than simply recall is easily demonstrated.

Simply ask a GPT to write a pleading to the Supreme Court, in the prose of Dr. Seuss, for constitutional recognition that the environment is a commonly owned inheritance, so that any citizen can sue any polluter.

Likewise, images of knights in space demonstrate the same kind of creativity.

Being able to combine previously uncorrelated/unrelated topics is an important type of creativity. And GPT-4 does this all the time. It would be interesting to list the types of creativity and rate GPT on each one.

So it is not that these models are not creative. It is just that their creative abilities are not universal yet.

Similarly for the depth of their logic. They often reason, but their reasoning depth is limited.

And they often incorporate relevant facts without explicit mention, but not always. Etc.


It works if you say "futuristic looking".

a picture of some futuristic-looking 23rd century scottish kings playing golf:

https://www.bing.com/images/create/a-picture-of-some-futuris...


Why would you expect anything you didn't specify in the output of the first prompt? If there are friends, lunch, and a park: it did what you asked.

Piling a bunch of neurotic expectations about it being a Benetton ad on top of that is absurd. When you can trivially add as much content to the description as you want, and get what you ask for, it does not matter what the default happens to be.


> "Generate a scene of a group of friends enjoying lunch in the park." -> Totally expect racial and gender diversity in the output.

I'd err on the side of "not unexpected". A group of friends in a park in Tokyo is probably not very diverse, but it's not outside the realm of possibility. Only white men were golf-playing Scottish kings, though, if we're talking strictly about reflecting reality properly.


I'm quite reminded of this episode: https://www.eurogamer.net/kingdom-come-deliverance-review (black representation in a video game about 15th century Bohemia; it was quite the controversy)


It feels like it wouldn't even be that hard to incorporate into LLM instructions (aside from using up tokens), by way of a flowchart like "if no specific historical or societal context is given in the instructions, assume idealized situation X; otherwise, use historical or projected demographic data to do Y, and include a brief explanatory note on demographics if the result would be unexpected for the user". (That last part is for situations with genuine but unexpected diversity; for example, historical cowboys skewed much more towards non-white people than pop culture would have one believe.)
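A toy version of that flowchart, with the genuinely hard parts stubbed out (the context classifier and the demographics lookup below are placeholders I invented):

    # Flowchart sketch: idealized defaults unless a historical/societal
    # context is detected, in which case match the data and explain.
    def augment_prompt(prompt: str) -> str:
        context = detect_context(prompt)
        if context is None:
            return prompt + " Depict a varied, contemporary group of people."
        note = demographics_for(context)
        return (f"{prompt} Match the demographics of {context}: {note}. "
                "Add a brief note if this differs from what the user may expect.")

    def detect_context(prompt: str) -> str | None:
        # stub: real context detection is itself a hard ML problem
        return "17th century Scotland" if "17th century" in prompt else None

    def demographics_for(context: str) -> str:
        return "overwhelmingly white, male nobility"  # stub lookup

    print(augment_prompt("17th century kings of Scotland playing golf"))

The unexpected hurdle is probably that detect_context is itself a fraught classification problem: it has to decide, reliably, which prompts are claims about the real world.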

Of course, now that I've said "it seems obvious" I'm wondering what unexpected technical hurdles there are here that I haven't thought of.


>"Generate a scene of 17th century kings of Scotland playing golf." -> The result should not be a bunch of black men and Asian women dressed up as Scottish kings, it should be a bunch of white guys.

Is a black man in the role of a Scottish king a bigger error than other errors in such an image, like incorrect dress details or a landscape with the wrong hills? I'd venture a guess that only our racially charged mentality of today considers that a big error, and maybe in a generation or two an incorrect landscape or dress detail will be considered a much larger error than a mismatched race.


As soon as you have them playing an anachronistic sport you should expect other anachronistic imagery to creep in, to be fair.


https://en.wikipedia.org/wiki/Golf

> The modern game of golf originated in 15th century Scotland.


Oh fair enough then.


> anachronistic sport

Scottish kings absolutely played golf.


Judging by the way it words some of the responses to those queries, they "fixed" it by forcibly injecting something like "diverse image showcasing a variety of ethnicities and genders" in all prompts that are classified as "people".
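If that's right, the mechanism could be as blunt as this speculative sketch -- inferred purely from the wording of the outputs, not from anything Google has published:

    # Guessed-at prompt rewriting: append a fixed diversity instruction to
    # any prompt classified as depicting people.
    DIVERSITY_SUFFIX = ", diverse image showcasing a variety of ethnicities and genders"

    def rewrite(prompt: str) -> str:
        return prompt + DIVERSITY_SUFFIX if mentions_people(prompt) else prompt

    def mentions_people(prompt: str) -> bool:
        # stub classifier standing in for whatever they actually use
        return any(w in prompt.lower() for w in ("person", "people", "king", "friends"))

    print(rewrite("17th century kings of Scotland playing golf"))

That would also explain the historical failures: the suffix gets appended whether or not the prompt pins down an era.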


They have now added a strong bias towards generating black people. Some have prompted it to generate a picture of a German WW2 soldier, and now there are many pictures of black people in Nazi uniforms floating around.

I think their strategy to "enhance" outcomes is very misdirected.

The most widely used base models for fine-tuning are those that are not censored, and I think you have to construct a problem to find one here. Of course AI won't generate a perfect world, but this will probably only get better with time, once users are able to adapt models to their liking.


> ...when users are able to adapt models to their liking.

Therein lies the rub, as it were, because the large providers of AI models are working hard to ensure legislation that wouldn't allow people access to uncensored models in the name of "safety." And "safety" in this case includes the notion that models may not push the "correct" world-view enough.


I remember checking like a year ago and they still had the word "gorilla" blacklisted (i.e. it never returns anything even if you have gorilla images).


Gotta love such a high-quality fix. When your ultra-high-tech, state-of-the-art algorithm learns racist patterns, just blocklist the word and move on. Don't worry about why it learned such patterns in the first place.
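For anyone who hasn't seen this kind of patch, it can be as crude as a post-hoc filter over the classifier's output -- a hypothetical sketch, with the blocklist contents guessed from the reporting:

    # Run the classifier as usual, then silently drop blocklisted labels.
    BLOCKLIST = {"gorilla", "chimpanzee", "monkey"}

    def safe_labels(predictions: list[tuple[str, float]]) -> list[tuple[str, float]]:
        """Filter (label, confidence) pairs against the blocklist."""
        return [(label, p) for label, p in predictions if label.lower() not in BLOCKLIST]

    print(safe_labels([("gorilla", 0.91), ("primate", 0.77)]))  # [('primate', 0.77)]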


Humans do look like gorillas. We're related. It's natural that an imperfect program that deals with images will mistake the two.

Humans, unfortunately, are offended if you imply they look like gorillas.

What's a good fix? Human sensitivity is arbitrary, so the fix is going to tend to be arbitrary too.


A good fix would, in my opinion, involve understanding how the algorithm actually categorizes images and why it misrecognized gorillas and humans.

If the algorithm doesn't work well, they have problems to solve.


But this is not an algorithm. It's a trained neural network which is practically a black box. The best they can do is train it on different data sets, but that's impractical.


That's exactly the problem I was trying to reference. The algorithms and data models are black boxes -- we don't know what they learned or why they learned it. That setup can't be intentionally fixed, and more importantly, we wouldn't know if it was fixed, because we can only validate input/output pairs.


It's too costly to potentially make that mistake again. So the solution guarantees it will never happen again.


You do understand that this has nothing to do with humans in general, right? This isn't AI recognizing some evolutionary pattern and drawing comparisons between humans and primates -- it's racist content that specifically targets black people that is present in the training data.


I don't know nearly enough about the inner workings of their algorithm to make that assumption.

The internet is surely full of racist photos that could teach the algorithm. The algorithm could also have bugs that miscategorize the data.

The real problem is that those building and managing the algorithm don't fully know how it works or, more importantly, what it has learned. If they did, the algorithm would be fixed rather than patched with a term blocklist.


Nope. This is due to a past controversy about image search: https://www.nytimes.com/2023/05/22/technology/ai-photo-label...


Where can I learn about this?


[flagged]


Do we have enough info to say that decisively?

Ideally we would see the training data, though it's probably reasonable to assume a random collection of internet content includes racist imagery. My understanding, though, is that the algorithm and the model of the data learned are still a black box that people can't parse and understand.

How would we know for sure racist output is due to the racist input, rather than a side effect of some part of the training or querying algorithms?


It's not unavoidable, but it would cost more to produce high quality training data.


Yes, somewhat unavoidable.


As well as that, I suspect the major AI companies are fearful of generating images of real people - presumably not wanting to be involved with people generating fake images of "Donald Trump rescuing wildfire victims" or "Donald Trump fighting cops".

Their efforts to add diversity would have been a lot more subtle if, when you asked for images of "British Politician" the images were recognisably Rishi Sunak, Liz Truss, Kwasi Kwarteng, Boris Johnson, Theresa May, and Tony Blair.

That would provide diversity while also being firmly grounded in reality.

The current attempts at being diverse while simultaneously trying not to resemble any real person seem to produce some wild results.


My takeaway from all of this is that alignment tech is currently quite primitive and relies on very heavy-handed band-aids.


I think that's a bit overly charitable.

Would it not be reasonable to also draw the conclusion that the notion of alignment itself is flawed?


We're honestly just seeing generative algorithms failing at diversity initiatives as badly as humans do.

Forcing diversity into a system is an extremely tough, if not impossible, challenge. Initiatives have to be driven by goals and metrics, meaning we have to boil diversity down to a specific list of quantitative metrics. Things will always be missed when our best tool for tackling a moral or noble goal is boiling a complex spectrum of qualitative data down to a subset of measurable numbers.
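Concretely, "diversity as a metric" usually ends up as something like normalized entropy over one labelled attribute -- a toy illustration of how much gets flattened away:

    # Normalized Shannon entropy of an attribute distribution:
    # 1.0 = perfectly uniform, 0.0 = all identical. Everything not captured
    # by the chosen attribute labels is invisible to the metric.
    import math
    from collections import Counter

    def diversity_score(attributes: list[str]) -> float:
        counts = Counter(attributes)
        n = len(attributes)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        max_entropy = math.log2(len(counts)) if len(counts) > 1 else 1.0
        return entropy / max_entropy

    print(diversity_score(["a", "a", "b", "c"]))  # ~0.95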


Remind yourself that we're discussing censorship, misinformation, and the inability to define or source truth, and that we're concerned on day one about the results of image generation being controlled by a single for-profit entity with incentives that focus solely on business and not humanity...

Where do we go from here? Things will magically get better on their own? Businesses will align with humanity and morals, not their investors?

This is the tip of the iceberg of concerns, and it's treated as a bug in the code, not as a problem with trusting private companies to define truth.


The ridiculous degree of PC alignment in corporate models is the thing that's going to let open source win. Few people use Bing/DALL-E, but if OpenAI had made DALL-E more available and hadn't put ridiculous guardrails on it, Stable Diffusion would be a footnote at this point. Instead, DALL-E is a joke, and people who make art use Stable Diffusion, with casuals who just want some pretty-looking pictures using Midjourney.


Don’t count out Adobe Firefly. I wouldn’t be surprised if it’s used more than all the other image gen models combined.


That might be true, but if you're using Firefly as a juiced-up content-aware fill in Photoshop, I'm not sure it's apples to apples.


No, ignoring laws and stealing data to widen your castle's moat is the win. Compute isn't a problem open source can solve. I can't DirtyPCBs an A100.

Arguing that open source is the answer is an agenda for making your competition spin its wheels.


You're on a thread about people lambasting big-money AI for being garbage and producing results inferior to OSS tools you can run on consumer GPUs; tell me again how unbeatable Google and the other big tech players are.


I've been part of the advertising and marketing world for a lot of these companies for a decade plus; I've helped them sell bullshit. I've also been there from the start of the AI journey; I've downloaded and tried local models of every promise and variety.

To say they're better than the compute that OpenAI or Google are throwing at the problem is just plain wrong.

I left the ad industry the moment I realised my skills and talents are better used informing people than lying to them.

This thread is not at all comparing the ethical issues of AI with local anything. You're conflating your solution with another problem.


Is a $5 can opener better than a $2000 telescope at opening cans? Yes. Is Stable Diffusion better at producing finished art, by virtue of not being closed off and DEI'd to oblivion so that it can actually be incorporated into workflows? Emphatically yes.

It doesn't matter how fancy your engineering is and how much money you have if you're too stupid to build the right product.

As for this being written nonsense: that's the sort of thing someone would say when they can't find an easy way to win an argument and are bitter about the fact.


[flagged]


I understood it perfectly.


> Where do we go from here?

Open source models and training sets -- so basically the "secret sauce" minus the hardware. I don't see it happening voluntarily.


I see it as not unlikely that there'll be a campaign to stigmatize, if not outright ban, open source models on the grounds of "safety". I'm quite surprised at how relatively unimpeded the distribution of image generation models has been so far.


What you're predicting has already started. Two weeks ago Geoffrey Hinton gave a speech advocating banning open source AI models (see e.g.: https://thelogic.co/news/ai-cant-be-slowed-down-hinton-says-... ).

I'm surprised there wasn't an HN thread about it at the time.


This is already happening, actually, although the focus so far has been on the possibility of their use for CSAM:

https://www.theguardian.com/society/2023/sep/12/paedophiles-...


Absolutely it won't. We've armed the issue with a supersonic jet engine and we're assuming if we build a slingshot out of pop sticks we'll somehow catch up and knock it off course.


I can't predict the future, but there is precedent. Models, weights, and datasets are the keys to the kingdom, like operating system kernels, databases, and libraries used to be. At some point, enough people decided to re-invent and release these things or their functionality to all that it became a self-sustaining community and, eventually, transformative to daily life. On the other hand, there may be enough people in power who came from and understand that community to make sure it never happens again.


No, compute is the key to the kingdom. The rest are assets and ammunition. You out-compute your enemy, you out-compute your competition. That's the race. The data is part of the problem, not the root.

These companies are siloing the world's resources: GPU, finance, information. Those combined are the weapon. You make your competition starve in the dust.

These companies are pure evil pushing an agenda of pure evil. OpenAI is closed. Google is Google. We're like, ok, there you go! Take it all. No accountability, no transparency, we trust you.


Compute lets you fuck up a lot when trying to build a model, but you need data to do anything worth fucking up in the first place, and if you have 20% of the compute but you fuck up 1/5th as much you're doing fine.

Meta/OpenAI/Google can fuck up a lot because of all their compute, but ultimately we learn from that as the scientists doing the research at those companies would instantly bail if they couldn't publish papers on their techniques to show how clever they are.


I never said each of these exists in a vacuum. It is the combination of all of them that is the danger. This isn't democratic. This is companies now toying with governmental ideologies.



