@lars this whole thing is super interesting. It sounds like the strange behavior shows up more in diffusion models (DALL-E) than in language models (GPT-4). It reminds me of that paper which found that LLMs only learn logic in a single direction: if none of the training data contained "elephant not in a room", the model wouldn't know what to do with it and would instead pattern-match to "elephant in a room".