
urusan

@urusan@fosstodon.org

Java developer by day, Julia developer by night.

Amateur philosopher

Sometimes funny...

Working Dad

Controversial things about me:
Everyone: transhumanist, into AI (art)
Right-wing: polyamorous (married), agnostic atheist, leftist, working class consciousness
Leftist: corporate drone by day, loyal citizen of the US (but a serious reformer), former libertarian

I hope you can look past all that, though; we people need to stick together.

Lives with: Wife, T (son), and A (daughter).


urusan, to random

Me: You know what's for dinner tonight? Chicken noodle soup! It's so yummy!
Kid: I don't want that! I won't taste it!

urusan, to random

Kid: (Jumps in and tries to click on an ad in a mobile game my wife is playing)
Wife: (Stops him) This is why you aren't allowed to play this game yet. You click on ads.
Kid: What's an "ad"?
Me: It's an active attempt by someone to attack your mind.

urusan, to random

If we met an alien species, would any of our units of measurement be the same?

urusan, to random

Me: I wonder if [Kid] will remember when we denied him that airplane toy. He still remembers it days later. I still remember when my mom refused to buy me Zone 66 on clearance for $5, and I never found it available again.
Me: Man, that was such a great game. I'd love to play it again. I played the **** out of the shareware version.
Me: ...
Me: Hmmm...wait...it is totally within my capabilities now to create a new game just like Zone 66...

Also,
https://en.m.wikipedia.org/wiki/Zone_66

urusan, to random

I read a really great book a while back: The Victorian Internet. It covers the history of the telegraph up through the time that the telephone started to seriously displace it.

The transition from telegraph to telephone is interesting: the telephone basically grew out of seemingly unrelated developments in multiplex telegraphy.

AI is to computers as the telephone is to the telegraph.

urusan, to random

Is it possible for something to "go viral" on Fedi?

urusan, to random

I watched 2 videos today that gave me a shift in perspective (specifically in mathematics).
Vector multiplication: https://youtu.be/htYh-Tq7ZBI
A different way to view the Riemann Hypothesis:
https://youtu.be/dwe4-OiRw7M

urusan, to random

I just read the DALL-E 3 paper, and the TL;DR is that humans are not good at creating good training data.
https://cdn.openai.com/papers/dall-e-3.pdf

They started over, using a small amount of extra high-quality human-generated data to bootstrap an image captioner that produces the training data they actually want. Then they trained DALL-E 3 on that data, and it made a huge difference.
https://openai.com/dall-e-3

I'm sure open models will follow suit soon; this isn't a difficult improvement to implement.
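
For a concrete sense of what "bootstrap a captioner, then recaption everything" looks like, here's a minimal sketch in Python. It uses an off-the-shelf BLIP captioner from Hugging Face as a stand-in for OpenAI's much stronger fine-tuned one, and the dataset path is a placeholder:

```python
# Sketch: recaption a folder of images with a pretrained captioner,
# standing in for DALL-E 3's bespoke, human-bootstrapped captioner.
from pathlib import Path

from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

for img_path in Path("dataset/images").glob("*.png"):  # placeholder folder
    image = Image.open(img_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=50)
    caption = processor.decode(out[0], skip_special_tokens=True)
    # The synthetic caption replaces whatever noisy alt-text came with the image.
    img_path.with_suffix(".txt").write_text(caption)
```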

urusan, to random

Realistic detail left out of pretty much all post-apocalyptic stories where the survivors hid out in underground bunkers for at least one generation:
Everyone wearing glasses!

urusan, to random

I decided to open up a dedicated AI art account over at aipub.social https://aipub.social/@urusan

In the future I'll post anything I generate, and discuss technical aspects like AI training and art technique, over there. So if you want to see my generations or hear about technique, be sure to follow that account.

My fosstodon account will still be my main account; I'm not moving or anything like that.

Also, I will still talk about AI here, especially AI-related philosophy or musings.

urusan, to random

I've been playing around with training LoRAs on 2-image datasets.

Ironically, they aren't as good as 1-image datasets. The second image conflicts with the first, which causes substantial AI artifacts in certain parts of the generations. In my initial examples it's mostly the faces and hands, but artifacts can appear elsewhere too.

These issues can be overcome, mostly on the user side, but it seems like it's generally more useful to have either 1 image or many.

urusan, to random

Wife: Instagram is holding my friends hostage.
Me: Yup, that's their whole business model.

urusan, to random

While techno-solutionism is a problem, don't write off the importance of technology either.

For instance, my mother is alive today due to a cancer drug that was approved just months before she started being treated with it.

At one point they told her to go home and get her affairs in order, but then her oncologist got the information they needed and changed her medication just in time.

Technology makes a real difference in people's lives.

urusan, to random

Saying that we're all "just working on documents" doesn't really capture the scope and importance of these documents.

urusan, to random

A new way to develop antidotes!
https://youtu.be/O4AUJfITYPk

Also, specifically, antidotes to death cap mushrooms and box jellyfish stings.

urusan, to random

Now that I understand captions better, I've been getting some amazing results with single-image-dataset LoRAs.

The essential idea is that you can use captions and LoRA training to isolate just what you want out of a specific image for use elsewhere.

The result is FAR more flexible than you might expect, though: you can use something that's only approximate.

As an example, I trained 2 LoRAs using these images:

A very simple black and white picture of a hexagon, in the same position as the ball. For training, the instance was "hexagon" and the captions were: geometric shape, black and white, simple white background
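
To make the setup concrete, here's a rough sketch of how a 1-image caption dataset like the hexagon one can be laid out, assuming the kohya-style "<repeats>_<instance> <class>" folder convention; all names and paths are illustrative:

```python
# Sketch: lay out a 1-image kohya-style training set with a caption file.
from pathlib import Path
import shutil

# 20 repeats, instance "hexagon", class "shape" (the folder name encodes all three)
dataset = Path("train/20_hexagon shape")
dataset.mkdir(parents=True, exist_ok=True)

shutil.copy("hexagon.png", dataset / "hexagon.png")  # the single training image

# The caption lists everything we want subtracted out of the instance token.
(dataset / "hexagon.txt").write_text(
    "geometric shape, black and white, simple white background"
)
```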

urusan, to random

I was investigating where the tagging data for AI is coming from, and it appears that it's already reached a sort of "human escape velocity".

Technologies like CLIP and its successors are good enough to do the job on a massive scale. This is what LAION did, and that dataset is what fed into Stable Diffusion.

I guess this also explains the reluctance to use more recent data to train AI, since dataset attacks would be harder to defend against if there's relatively little human intervention in the process.
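
As a rough illustration of that kind of automated tagging, here's a minimal zero-shot sketch with CLIP via Hugging Face transformers; the candidate tags, image path, and threshold are all made up for illustration:

```python
# Sketch: zero-shot tag an image by scoring candidate labels with CLIP.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

candidate_tags = ["a photo of a dog", "a photo of a cat", "a landscape painting"]
image = Image.open("example.jpg")  # placeholder input

inputs = processor(text=candidate_tags, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]

# Keep any tag the model is reasonably confident about (threshold is arbitrary).
print([t for t, p in zip(candidate_tags, probs.tolist()) if p > 0.3])
```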

urusan,

This also lines up with my own experiences with the automatic taggers. They work great, and at least for common things they're almost always accurate.

The misses don't cause too many issues, especially when you go to a large scale. You don't need perfection to get good enough; you just need good enough to get good enough.

This can then be further leveraged to bootstrap more accurate models. It's not a completely automated process, but the human work is a small component.

urusan, to random

Me: Hey there Pinebook Pro. It's been a long time. How are you doing?
Pinebook Pro: Pretty good, though I can't seem to find the Internet, I don't see the Wi-Fi I normally connect to.
Me: Oh, let me figure that out.
Pinebook Pro: By the way, it's August 19, 2022, right?

urusan, to random

It seems that Kohya SS's class parameter is ignored when you have captions.

I'm also pretty confident that classes essentially work like captions, though the math that implements them must be different, as the results are ever so slightly different. (Subjectively, the class version gives better results, but they're not different enough to write home about.)

urusan,

My current mental model is that a caption (or class) subtracts whatever concept it encodes from the instance/LoRA, and learns the version of itself present in the training data.

Most of the remainder is invested in the instance token, with a little bleeding out across the board.

This explains why ignoring captions entirely works so well most of the time: instead of splitting out the essential elements of the training data, it puts everything into the instance token.
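
Written as a loose equation (my own notation, not from any paper), the idea is that the image's content gets partitioned across the tokens you train with:

```latex
% Loose sketch of the mental model above; notation is mine, not from a paper.
\[
  C_{\text{image}} \;\approx\;
  \underbrace{E_{\text{instance}}}_{\text{new residual the LoRA learns}}
  \;+\; \sum_{i} \underbrace{E_{\text{caption}_i}}_{\text{concepts the base model already encodes}}
\]
% With no captions, the whole left-hand side gets crammed into the instance token.
```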

urusan,

One thing is for sure: if you use a caption (or class) and you want that thing at full strength, you need to add it back in when you prompt.

A useful trick if you don't want any class is to just use the instance as the class. It's equivalent to not having one at all (which you can also do by putting a space in for the class).

That said, I've found this totally undirected learning to work pretty slowly.

urusan,

The single-image LoRA thing is working pretty well at this point. With the correct strength and (negative) prompting, it gives good results from just one image.

Essentially, strengths between 0.3 and 0.6 work best; anything over 0.5 needs negative prompts to smooth out the artifacts that arise, and strengths up to 0.9 can still work.

Obviously it's super biased toward that one image and/or elements within it, but you can get it to cooperate through your choice of prompts.
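
Here's roughly what that looks like at inference time, sketched with Hugging Face diffusers; the base model, LoRA path, and prompts are placeholders, and the 0.45 scale sits in the 0.3-0.6 sweet spot above:

```python
# Sketch: apply a single-image LoRA at moderate strength with a negative prompt.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("path/to/hexagon_lora")  # placeholder LoRA weights

image = pipe(
    prompt="hexagon, geometric shape, poster design",
    negative_prompt="blurry, deformed, disfigured, artifacts",
    cross_attention_kwargs={"scale": 0.45},  # LoRA strength in the 0.3-0.6 range
).images[0]
image.save("out.png")
```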

urusan,

When you're making a larger LoRA model, try to avoid captions you don't have enough training data to support (e.g. you have hundreds of examples overall but just one example of that caption).

If you've only got one image that requires a specific caption, then captioning it inadvertently creates a 1-image training set for that caption, and prompting for that caption will be massively biased toward that one example.
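
A quick way to catch this before training is to count tag frequencies across the caption files; a minimal sketch, with the dataset root as a placeholder:

```python
# Sketch: flag caption tags that appear only once across a training set.
from collections import Counter
from pathlib import Path

counts = Counter()
for txt in Path("train").rglob("*.txt"):  # placeholder dataset root
    for tag in txt.read_text().split(","):
        counts[tag.strip()] += 1

# A singleton tag is a de facto 1-image training set for that concept.
for tag, n in counts.items():
    if n == 1:
        print(f"only one example of {tag!r} -- consider dropping this caption")
```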

ZachWeinersmith, to random

Where do you draw the line between AI you like and dislike? My sense is there's a serious disdain for e.g. generative art or text, but not for other machine-learning stuff, like plant ID apps or speech to text software.

urusan,

@ZachWeinersmith I think the dividing line for most people is when it starts having serious implications for their daily life.
