That's the paper that introduced the #Transformer architecture, dispensing with recurrence and convolutions entirely to achieve much faster training and better results on machine translation tasks.
Any good sources on what the outputs of the attention blocks in a transformer represent? I expected that for "The bank of the plane took it around the savings bank on the bank of the river", the vectors corresponding to "bank" would diverge -- "rotation things/money things/rivery things" -- but AFAICT that doesn't clearly happen. Here are the dot products of the normalized vectors (i.e. cosine similarities) among themselves after the embedding layer and after attention block 5: #ML #Transformer
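For anyone who wants to poke at this themselves, here's roughly how such a probe can be set up; my exact setup isn't shown above, so the bert-base-uncased model below is just an illustrative stand-in:

```python
# Probe per-layer representations of the repeated token "bank".
# The model name is purely illustrative, not necessarily the one used above.
import torch
from transformers import AutoModel, AutoTokenizer

name = "bert-base-uncased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_hidden_states=True)

text = ("The bank of the plane took it around the savings bank "
        "on the bank of the river")
enc = tok(text, return_tensors="pt")
with torch.no_grad():
    out = model(**enc)

# hidden_states[0] is the embedding-layer output; hidden_states[k] is after block k.
tokens = tok.convert_ids_to_tokens(enc["input_ids"][0])
bank_positions = [i for i, t in enumerate(tokens) if t == "bank"]

for layer in (0, 5):
    h = out.hidden_states[layer][0]                      # (seq_len, hidden_dim)
    vecs = torch.nn.functional.normalize(h[bank_positions], dim=-1)
    print(f"layer {layer}:\n{vecs @ vecs.T}")            # cosine similarities
```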
Quite interesting but confusing, as I come from #backpropagation DL.
If I understood it correctly, the authors focus on showing how and why biological neural networks would benefit from being energy-based models doing predictive coding, rather than feedforward networks trained with backpropagation.
It took me a while to find where they explain how to optimize a ConvNet in PyTorch as an EB model, but they do: there is an algorithm and formulae. I'm still curious how long and how stable training is, and whether it all generalizes to typical computer vision architectures (ResNets, MobileNets, ViTs, ...).
Code is also #opensource at https://github.com/YuhangSong/Prospective-Configuration
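To fix the idea for myself -- this is *not* their prospective configuration algorithm (see the repo for that), just the textbook predictive-coding / energy-based loop it builds on: activities are relaxed to minimize a stack of prediction errors, and only then are the weights updated locally.

```python
# Generic predictive-coding sketch (NOT the paper's prospective configuration
# algorithm). Energy = sum of squared prediction errors between each layer's
# activity and the prediction from the layer below. Inference relaxes the
# hidden activities on this energy; learning then uses a local update.
import torch

torch.manual_seed(0)
dims = [10, 32, 32, 5]                                  # input, 2 hidden, output
Ws = [torch.randn(o, i) * 0.1 for i, o in zip(dims[:-1], dims[1:])]

def energy(acts):
    # E = sum_l || x_l - W_l x_{l-1} ||^2
    return sum(((acts[l + 1] - Ws[l] @ acts[l]) ** 2).sum() for l in range(len(Ws)))

x_in = torch.randn(dims[0])
target = torch.randn(dims[-1])

# Initialise activities with a feedforward pass, then clamp input and output.
acts = [x_in]
for W in Ws:
    acts.append(W @ acts[-1])
acts[-1] = target.clone()
for a in acts[1:-1]:
    a.requires_grad_(True)

# Inference phase: relax the hidden activities on the energy.
opt = torch.optim.SGD(acts[1:-1], lr=0.05)
for _ in range(50):
    opt.zero_grad()
    energy(acts).backward()
    opt.step()

# Learning phase: local weight update from the settled prediction errors.
lr_w = 0.01
with torch.no_grad():
    for l, W in enumerate(Ws):
        err = acts[l + 1] - W @ acts[l]
        W += lr_w * torch.outer(err, acts[l])
```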
I would like to sit at my laptop for a few hours and try to see and understand it better, but I think in the next few days I will move on to Modern #HopfieldNetworks. These too are energy-based, and their energy function is minimised by an update rule that matches the #transformer 's dot-product attention.
I think I got what attention does in Transformers, so I'm quite curious to see in what sense it's equivalent to consolidating/retrieving patterns in a Dense Associative Memory. In general, I think we're treating memory wrong with our deep neural networks. I see most of them as sensory processing, a shortcut to "reasoning" without any surrogate for short- or long-term memory, but I can see how some current features may serve similar purposes...
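For reference, the retrieval step I mean is just a softmax over dot products with the stored patterns (the update rule from "Hopfield Networks is All You Need"), which is exactly why it lines up with dot-product attention:

```python
# Modern Hopfield retrieval: new_state = softmax(beta * state @ patterns^T) @ patterns.
# With separate queries/keys/values this is scaled dot-product attention
# (where beta plays the role of 1/sqrt(d_k)).
import torch

def hopfield_retrieve(state, patterns, beta=8.0, steps=1):
    for _ in range(steps):
        state = torch.softmax(beta * state @ patterns.T, dim=-1) @ patterns
    return state

patterns = torch.randn(16, 64)                # stored patterns (the "memory")
query = patterns[3] + 0.3 * torch.randn(64)   # noisy cue
retrieved = hopfield_retrieve(query, patterns)
print(torch.cosine_similarity(retrieved, patterns[3], dim=0))  # ~1 if retrieval works
```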
Wild to see people on a friend's Facebook post throwing an absolute FIT over Hasbro "reissuing" HasLab stuff today when it's pretty clear from how fast they sold out that they were just clearing out a few spares. Unicron sold out in less than a minute. Victory Saber was gone in two. I'd be surprised if they had more than a dozen of either of them. But no, Hasbro LIED and REISSUED them. 😑
More efficient inference for #LLMs: #RecycleGPT: An Autoregressive Language Model with Recyclable Module
It trains a small student #RNN which takes the #Transformer decoder's hidden state and the embedding of its output token as input, and produces the next hidden state (which can be projected and sampled to produce the next output token).
It is not trained the way an RNN usually is, which would be inefficient because of the token-wise sequential dependencies; at training time it conditions on the hidden states the transformer has already produced for the whole sequence, so it can be trained efficiently in parallel.
At inference the two are interleaved so that the small student network produces every other output token cheaply, without significant quality degradation.
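Roughly how I read the recyclable module -- the names and sizes below are mine, not the paper's:

```python
# My rough reading of the recyclable module (names/sizes are mine, not the
# paper's): a small network maps (current hidden state, embedding of the token
# just sampled) -> an approximation of the next hidden state, so every other
# token can be decoded without a full Transformer pass.
import torch
import torch.nn as nn

class RecycleModule(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, hidden, tok_emb):
        # Predict the next decoder hidden state from the current one and the
        # embedding of the token that was just sampled.
        return self.net(torch.cat([hidden, tok_emb], dim=-1))

# Training: teacher-forced on hidden states the Transformer already produced
# for the whole sequence, so there is no sequential bottleneck.
# Inference (interleaved decoding), in pseudocode:
#   h_t    = transformer_step(context)        # full pass, sample token t
#   h_t+1  = recycle(h_t, embed(token_t))     # cheap pass, sample token t+1
#   ... then back to a full pass, and so on.
```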
Improvement suggestions from me:
This might benefit from adding routing that decides, at every token, whether to use the student model or the full model, based on another small model that predicts the quality degradation (rough sketch below).
The student model doesn't actually need to be small, either: it just needs to be cheaper at inference than the transformer, so it can be large enough to be competitive in quality while avoiding the quadratic cost over the sequence length.
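Roughly what I have in mind for the routing part -- entirely hypothetical, not from the paper:

```python
# Hypothetical per-token router (my suggestion above, not the paper's method):
# a tiny head predicts how much quality the recycled state would lose at this
# position; only if the predicted loss is small do we skip the full pass.
import torch
import torch.nn as nn

class RecycleRouter(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.score = nn.Linear(d_model, 1)

    def forward(self, hidden, threshold=0.1):
        # Predicted degradation, e.g. trained to regress the divergence between
        # the full model's and the recycled model's next-token distributions.
        predicted_loss = torch.sigmoid(self.score(hidden))
        return predicted_loss < threshold      # True -> take the cheap recycled path
```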
How come #transformer models aren't made to go back and change their answer as they work? If you ask a human to write something, they will very rarely just spit out an entire document word for word and be done. Most human work involves revising your own output as you go. If you prompt an #LLM to do this, you get a better result, so why not build the model to do this from the get-go?
People working on #LLM #Transformer models are talking about making models deeper and/or longer.
Adding depth to a Transformer tends to bring more capabilities while increasing the model's inference time only linearly. The main requirement for that kind of scaling is the amount of good-quality training data available.
Making the models "longer", that is, increasing the context length, makes them more useful in situations that require more context. Currently GPT-4 supports a maximum context length of 32k tokens, which is more than enough for many valuable use cases. I have so far gotten by with GPT-3.5's 4,096-token context using some clever optimization methods.
Some use cases, such as maintaining huge existing codebases, would benefit from even larger context lengths, and larger contexts would also let companies use in-context learning instead of fine-tuning to customize the model for a specific use.
We can also make systems which selectively put only currently needed stuff into the context.
Scaling up the context length is more difficult than making the models deeper because of the Transformer's self-attention layers. Each output token attends to all of the tokens before it, so the cost per token grows linearly with context length and the cost of the whole sequence grows quadratically.
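The quadratic part is easy to see in the computation itself: the attention score matrix has one entry per (query, key) pair, so it grows as n² with context length n. A minimal sketch:

```python
# Why context length scales badly: causal self-attention builds an n x n score
# matrix (one entry per query/key pair), so memory and compute grow as n^2.
import torch

def causal_attention(q, k, v):
    n, d = q.shape
    scores = q @ k.T / d ** 0.5               # (n, n)  <- the quadratic part
    mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v  # (n, d)

n, d = 4096, 64
q = k = v = torch.randn(n, d)
out = causal_attention(q, k, v)               # the (n, n) scores dominate: 4096^2 entries
```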
There are methods to mitigate this scaling difficulty: for example attention-free Transformers, RNN-style models, memory tokens, state space models such as Hungry Hungry Hippos, and the Hyena model, which uses Fourier transforms and long convolutions in clever ways to grow the receptive field.
It seems like much of the self-attention layer's computational capacity is actually wasted, at least in inference (see the Hyena paper), so in principle there is a lot of algorithmic room for improvement. However, it is a common theme in deep neural networks that apparently wasted capacity is actually needed to train effectively, even if it isn't needed for the final inference.
It is still an open question how much more efficient we can make large Transformer-type models, but the work has barely even started.
[R] Paving the way to efficient architectures: StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers (www.reddit.com)
[R] VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation (www.reddit.com)
Arxiv...