A low number of bytes per image doesn’t by itself rule out copying of the original data. There are examples of individual images being “compressed” (lossily) by Stable Diffusion, meaning the model can reproduce them closely. Those images were specifically sought out, but I think they do show that overfitting is an issue, even if the model is too small to overfit on every image.