Comments

KingsmanVince, to machinelearning in PaLI-3 Vision Language Models: Smaller, Faster, Stronger

Indeed, it would be great if the authors did so. I personally found some unofficial implementations:

KingsmanVince, to machinelearning in Think before you speak: Training Language Models With Pause Tokens

IIRC, DETR generates a sequence to predict boxes for objects. I think this paradigm could be applied to such models; "think before you locate" could be a new path to explore.
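
As a rough illustration of that combination, here is a minimal sketch assuming a Pix2Seq-style decoder that emits quantized box tokens, with pause tokens prepended so the model can "think" before locating. Everything here (`PAUSE_ID`, `N_PAUSE`, `quantize_box`, the vocabulary layout) is a hypothetical assumption, not taken from either paper:

```python
# Hypothetical sketch: Pix2Seq-style box-token generation plus
# pause tokens ("think before you locate"). Names and sizes are
# illustrative, not taken from either paper.
import torch
import torch.nn as nn

VOCAB = 2000      # e.g. 1000 coordinate bins + class ids + specials
PAUSE_ID = 1999   # reserved id for the pause token
N_PAUSE = 8       # number of "thinking" steps before locating

def quantize_box(box, n_bins=1000):
    """Map a normalized (cx, cy, w, h) box in [0, 1] to token ids."""
    return [min(int(v * n_bins), n_bins - 1) for v in box]

class PauseBoxDecoder(nn.Module):
    def __init__(self, d_model=256, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, prefix_ids, image_feats):
        # Prepend pause tokens so the model can "think" before it
        # emits coordinate/class tokens.
        b = prefix_ids.size(0)
        pauses = torch.full((b, N_PAUSE), PAUSE_ID, dtype=torch.long,
                            device=prefix_ids.device)
        ids = torch.cat([pauses, prefix_ids], dim=1)
        x = self.embed(ids)
        mask = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        out = self.decoder(x, image_feats, tgt_mask=mask.to(x.device))
        return self.head(out)  # next-token logits over the box vocabulary
```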

KingsmanVince, to machinelearning in Unifying Cross-Lingual and Cross-Modal Modeling Towards Weakly Supervised Multilingual Vision-Language Pre-training
KingsmanVince, to machinelearning in MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
KingsmanVince, to machinelearning in Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
KingsmanVince, to machinelearning in Retentive Network: A Successor to Transformer for Large Language Models

Unofficial implementation: https://github.com/Jamie-Stirling/RetNet

KingsmanVince, (edited) to machinelearning in NeurIPS 2023 Machine Unlearning Challenge
KingsmanVince, to machinelearning in GitHub - mazzzystar/Queryable: Run CLIP on iPhone to Search Photos.
KingsmanVince, to machinelearning in Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing

I will follow it, then.

KingsmanVince, to machinelearning in Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
KingsmanVince, to machinelearning in Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing

I know we are moving away from Reddit. However, if I don't link to it, I feel like we may miss out on good threads from r/machinelearning. Moreover, the authors don't post only arXiv links; they also post other stuff such as summaries, key points, etc. (e.g. this).

So can I at least put them in the posts instead of posting them in a comment?

KingsmanVince, to machinelearning in VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

The idea is similar to BLIP-2's. Both papers use learnable tokens as queries for a transformer decoder: the decoder queries the vision feature space, conditioned on the learnable queries and the prompt.
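
For the mechanics, here is a minimal sketch of that shared idea, assuming a frozen vision encoder. It is not the exact BLIP-2 Q-Former or the VisionLLM decoder; all names and shapes are illustrative:

```python
# Minimal sketch of the shared idea: a fixed set of learnable query
# tokens cross-attends over frozen vision features, optionally
# conditioned on prompt embeddings. Not the exact BLIP-2 Q-Former or
# VisionLLM decoder; all names and shapes are illustrative.
import torch
import torch.nn as nn

class LearnableQueryDecoder(nn.Module):
    def __init__(self, n_queries=32, d_model=256, n_layers=2):
        super().__init__()
        # Learnable query tokens, shared across all inputs.
        self.queries = nn.Parameter(torch.randn(1, n_queries, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=8,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)

    def forward(self, vision_feats, prompt_embeds=None):
        # vision_feats: (B, N_patches, d_model) from a frozen encoder
        b = vision_feats.size(0)
        q = self.queries.expand(b, -1, -1)
        if prompt_embeds is not None:
            # Condition the queries on the prompt by concatenation.
            q = torch.cat([q, prompt_embeds], dim=1)
        # Cross-attention: the queries attend to the vision features.
        return self.decoder(q, vision_feats)

# Usage (hypothetical): out = LearnableQueryDecoder()(frozen_vit(imgs))
```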

KingsmanVince, (edited) to machinelearning in Machine Learning Beginner Info/Resources

I also want to share some resources.
For PyTorch,

For TPU,
