IIRC, DETR generates a sequence of queries to predict object boxes. I think this paradigm can be applied to such models; "Think before you locate" could be a new path to explore.
I know we are moving away from Reddit. However, if I don't link, I feel like we may miss out on good threads on r/MachineLearning. Moreover, the authors don't only post arXiv links; they also post other stuff such as summaries, key points, etc. (e.g. this).
So can I at least put them in the posts instead of posting in a comment?
The idea is similar to BLIP-2. Both papers use learnable tokens as queries for a transformer decoder, which attends over the vision feature space based on those trainable queries and the prompt.
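To make the mechanism concrete, here's a minimal numpy sketch of one cross-attention step where learnable query tokens attend over frozen vision features (the dimensions are illustrative assumptions, not the papers' exact configs; a real Q-Former also stacks self-attention, FFNs, and multiple heads):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Illustrative sizes (assumptions, not the papers' exact values).
d = 64            # embedding dimension
num_queries = 32  # learnable query tokens
num_patches = 257 # tokens from a frozen image encoder

rng = np.random.default_rng(0)
queries = rng.normal(size=(num_queries, d))  # learnable, trained end-to-end
vision = rng.normal(size=(num_patches, d))   # frozen vision features

# One cross-attention step: each query token attends over all vision tokens.
attn = softmax(queries @ vision.T / np.sqrt(d))  # (num_queries, num_patches)
out = attn @ vision                              # (num_queries, d)

print(out.shape)  # a fixed-size set of query outputs, regardless of image size
```

The point is that the number of query tokens, not the number of image patches, fixes the size of what gets passed downstream, which is what lets the prompt and queries jointly pull out just the relevant visual information.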