
urusan

@urusan@fosstodon.org

Java developer by day, Julia developer by night.

Amateur philosopher

Sometimes funny...

Working Dad

Controversial things about me:
Everyone: transhumanist, into AI (art)
Right-wing: polyamorous (married), agnostic atheist, leftist, working class consciousness
Leftist: corporate drone by day, loyal citizen of the US (but a serious reformer), former libertarian

I hope you can look past all that though; we people need to stick together

Lives with: Wife, T (son), and A (daughter).


urusan, to random

If you're a fairly ordinary user of stable diffusion but are interested in doing some training, you can set aside most of the advice you'll hear about model training, because a lot of it is intended for people heavily involved in model development and concerns:

  1. preventing concept bleeding so you can use your model more flexibly
  2. providing a nice out of the box experience

By ignoring those concerns you can get good results with extremely little work, time, and compute power

urusan,

I did a bunch of experiments on small training sets to get a better idea of what works well, and here are some key things I found:

  • The chosen class matters a lot because it determines what the AI will pick out of your images to learn and what it "knows about" already
  • Choosing your training model is also important because each one imparts a different style on the end result, but you won't be stuck with it
  • You don't need many training images; 1 is fine for well-known classes
urusan,
  • Both captions and regularization images are regularization mechanisms. Their main effect is to make your generated images more like the baseline and less like your training data, but in specific ways. (Note that I haven't experimented too heavily with regularization images at the moment, just captions.)
  • Because captions regularize your training, they also slow down the learning process. Thus you'll need more steps and more epochs to get good results with extensive captions.
urusan,

Oh, I should have mentioned that these experiments were done in kohya_ss, training standard Loras on SD 1.5 and models from that lineage.

Loras are massively less expensive to train in terms of time and VRAM, so unless you have a crazy rig or are doing serious model development they seem like the right tool. At the very least they fit the kind of thing I'm describing really well.
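
For concreteness, the kohya_ss setup I'm describing is basically just a folder naming convention: your training images go in a subfolder whose name encodes the repeat count, the instance token, and the class. A rough sketch (the instance token and filenames are placeholders, and the exact layout can vary by kohya_ss version):

    training/
      img/
        16_myinstance car/     <- 16 repeats, instance "myinstance", class "car"
          photo001.png
      reg/
        1_car/                 <- optional regularization images (I skip these)

As I understand it, the trainer derives the automatic "instance, class" caption from that folder name, which is part of why the class choice matters so much.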

urusan,

In any case, it turns out that the fastest way to get good results is to train your Lora using no captions or regularization images. This produces a model that generates results similar to your training data in an extremely short time frame. I was seeing decent results after 7-8 epochs of 16 repeats, and it was usually overbaked by 20 epochs (this process takes about a minute on my machine for 1 image).
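
To put numbers on that (assuming a batch size of 1): 1 image × 16 repeats = 16 steps per epoch, so 7-8 epochs is only about 112-128 training steps total, and the point where it's usually overbaked (20 epochs) is still just ~320 steps.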

urusan,

Technically, the system automatically adds a caption of "instance, class" to all your images. You'll generally want to keep that if you decide to put in additional specific captions.

I also played around with captioning with:
"instance, class"
"instance"
"class"
""
and the only one that was substantially different was just "class" alone. I haven't done enough to understand it fully, but it seemed to tie the result to the class more tightly and produce more variety.
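
For reference, if you do want per-image captions, kohya_ss reads them from plain text files sitting next to each image with the same base name (the filenames and caption text here are just placeholders, and you may need to set the caption extension in your config):

    img/16_myinstance car/
      photo001.png
      photo001.txt     <- e.g. "myinstance, car, parked in a driveway"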

urusan,

Ok, so this is all very cheap and easy. You just throw some images into a folder and train on it. Why isn't this more common?

The issue is that if you use your Lora with weight 1.0 and quality/negative prompts, then you'll usually get an artifact-y, deformed mess.

That said, you just need to adjust the weight and apply the tools in the normal AI art toolbox to fix these issues, and you'll have good results.

If you want really good results, then you should also do Hires fix.

urusan,

I found that for single-image training sets, going to a weight of 0.6 on a good model with good prompting was usually enough to get decent results, and with Hires fix and some cherry-picking you could get really great results

In particular, at 0.6 the base model is usually starting to have substantial influence again, so it's going to generate something inspired by whatever you trained rather than replicating the original

You can adjust the weight depending on what you want
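
As a concrete example of what adjusting the weight looks like (this assumes AUTOMATIC1111-style prompt syntax; the Lora name and prompt are placeholders):

    a photo of a car driving down a highway, best quality <lora:my_car_lora:0.6>
    Negative prompt: low quality, deformed, blurry

Push the 0.6 toward 1.0 and you get closer to the training data (artifacts and all); pull it lower and the base model takes over more and more.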

urusan,

I'm currently training on some larger datasets without any captioning or regularization, and the initial results are even better than I expected. It's just taking a long time to crank through all the data.

It really is as easy as throwing a bunch of images of the desired instance into a folder and training on them.

The effort required to properly caption a large dataset and the extra training time/compute are prohibitive for a non-obsessed individual. This approach avoids that.

urusan,

I've had good luck chaining my Loras together with other, better made Loras (and of course, models).

I've also got two of my models to work together successfully, but the heavy concept bleeding means they probably aren't quite as composable as a well designed one. It's just fine for making the one thing though.

There are also many other higher-level AI art tools to fix specific problems, like using ControlNet or prompt segmentation to control how things are laid out.
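
For anyone doing this outside a web UI, here's a rough sketch of chaining Loras with the diffusers library. This isn't literally my workflow; the model id, file paths, adapter names, and weights are placeholders, and the exact API depends on your diffusers version:

    import torch
    from diffusers import StableDiffusionPipeline

    # Load an SD 1.5 lineage base model (placeholder model id).
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Load two Loras under separate adapter names, then weight them together.
    pipe.load_lora_weights("loras", weight_name="my_car_lora.safetensors", adapter_name="car")
    pipe.load_lora_weights("loras", weight_name="some_style_lora.safetensors", adapter_name="style")
    pipe.set_adapters(["car", "style"], adapter_weights=[0.6, 0.8])

    image = pipe(
        "a photo of a car driving down a highway, best quality",
        negative_prompt="low quality, deformed, blurry",
        num_inference_steps=30,
    ).images[0]
    image.save("out.png")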

urusan,

One of the most important caveats here though is "well known class".

This technique is basically leveraging the strength of the base model to get good results.

I made an example with an F-14 fighter jet, and SD 1.5 is so bad at drawing airplanes that the results were a complete mess. It was what you might have expected this approach to yield, with all the weird little details of the original image blown up to a million and everything deformed and AI artifact-y.

urusan,

Examples of the messed up F-14 and the original training image.

Keep in mind this is an example of a failure mode (as an early example, I was doing the full captioning, and I also used the base SD 1.5 model for this one).

I'll post an example of a success in a minute.

A fighter jet flying at high altitude. It's a real mess of artifacts, though not as bad as the others since the strength isn't as high.
A conjoined fighter jet flying at high altitude. It's a real mess of artifacts.
A real photo of an F-14 flying at high altitude over some mountains and clouds. This is the original image.

urusan,

I trained a model on my VW ID.4 as an example.

First of all, here's what the model I was using (Absolute Reality) does with the same prompt (including "VW" and "car" in it) without the Lora

The interior of a VW car. There are inexplicably 2 steering wheels.
A brown VW bug in a parking lot in front of a manicured lawn and garden sculptures.
A dark green VW car of unknown model driving on a highway with the treeline in the distance.

urusan, to random

What's the base 12 equivalent of percent?

urusan, to random

In the book The Mythical Man Month, there's a section about Brooks's ideal team structure, which he calls the surgical team.

What was memorable to me was that it was so much a product of its time. As written, it was very clearly meant for programming a 1970s mainframe. There were 13 workers on the team, nearly half of whom were secretaries.

A modern team fulfilling the same role has 1-3 workers, with all the other roles replaced by software such as version control and compilers.

urusan,

The remaining roles today are the surgeon (development lead), the co-pilot (other developers), and the administrator (which doesn't need to be dedicated).

Depending on the scale of the project the entire team can be implemented by one developer nowadays, assuming proper support systems are in place

Larger projects can use a co-pilot or a few more developers in a similar role

Especially large projects can be split up further, but that was true back in the day as well and is a different problem

urusan,

I think the wisdom of this old model that still applies today is:

  1. Developers need the right support systems to be truly effective
  2. The dynamic between the lead developer and co-pilot is extremely valuable (on a larger project).

Number 2 is the basis for pair programming, since every pair is a modern surgical team.

Personally I think it's more valuable for larger projects which would benefit from 2+ developers anyway. Design and code review can have a similar effect if done right.

urusan, to random

How different from telnet is the modern web anyhow?

urusan, to random

For anyone wondering about my odd posting pattern today, I was at the DMV this morning and as soon as I was called I went from having all the time in the world to being super busy.

strypey, to random

I'm writing a blog piece about the Digital Markets Act in the EU and legislation of the same name in the UK, as well as other laws in play or on the horizon to regulate digital services.

So what does the fediverse think of the ?

urusan,

@strypey @icedquinn @deflarerOfClouds Interesting side note: while reading up on the Berne Convention and the UCC (https://en.m.wikipedia.org/wiki/Universal_Copyright_Convention), it was mentioned in the UCC article that they were worried that nations would renounce the Berne Convention and join the UCC, so they made a special rule in the UCC to penalize that.

However, this makes it much clearer that we aren't stuck in these treaties. The benefit to staying is international enforcement.

urusan,

@strypey @icedquinn @deflarerOfClouds Total elimination of copyright is especially easy. Just renounce the relevant treaties at the cost of immediately losing copyright internationally (and possibly pissing off other countries, which might not cooperate in other areas in retribution).

Reform is harder but it's just a matter of putting a new treaty in place before renouncing the old one. The new treaty doesn't have to mention the old treaty at all.

urusan, to random

What do you wish you had more time to pursue?

urusan, to random

As a former libertarian and a developer, I was very much the target audience for cryptocurrency for many years. I actually started following it when I was in college in 2009.

I never actually got involved in it besides reading, thinking, and talking about it though, which in retrospect I'm glad for...not for money reasons (considering the timeline, I probably would have made a bunch of money), but because I would be complicit in a bunch of stuff I now strongly disagree with.

urusan,

There are some interesting technical ideas in the cryptocurrency/blockchain space for sure, but at the end of the day you really have to accept the ideology to justify why it's all worth it.

There are huge downsides apparent to all:

  • Huge negative environmental impact
  • Ubiquitous scams
  • Wealth concentration
  • Artificial scarcity
  • Etc.

It's also telling that the proposed solutions to these issues were generally not followed through on, at least while there was serious money at stake.
