A Docker image can grow quickly during the build. I became more mindful of image size when I started deploying with GitHub Actions: the bigger the image, the longer the run time and the higher the runtime cost.
This is when you should consider using a multi-stage build.
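As a rough sketch of what a multi-stage build can look like (not taken from the post): the first stage installs the dependencies on the full image, and the final stage copies only the resulting virtual environment onto a slim image. The image tags, the `builder` stage name, `requirements.txt`, and `main.py` are all assumptions for illustration. The Dockerfile is held in a Python string so a script can write it out.

```python
from pathlib import Path

# Hypothetical two-stage Dockerfile: build on the full image, ship the slim one.
DOCKERFILE = """\
# Stage 1: install dependencies into a virtual environment
FROM python:3.11 AS builder
WORKDIR /app
COPY requirements.txt .
RUN python -m venv /opt/venv && \\
    /opt/venv/bin/pip install --no-cache-dir -r requirements.txt

# Stage 2: copy only the virtual environment into a smaller base image
FROM python:3.11-slim
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
COPY . .
CMD ["python", "main.py"]
"""


def count_stages(dockerfile: str) -> int:
    """Count build stages, i.e. the number of FROM instructions."""
    return sum(
        1
        for line in dockerfile.splitlines()
        if line.strip().upper().startswith("FROM ")
    )


def write_dockerfile(target_dir: str) -> Path:
    """Write the sketch to <target_dir>/Dockerfile and return its path."""
    path = Path(target_dir) / "Dockerfile"
    path.write_text(DOCKERFILE)
    return path


print(count_stages(DOCKERFILE))
```

Only the last stage ends up in the final image, so build tools and caches from the `builder` stage do not add to its size.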
(1/2) MLflow for Machine Learning Development
The MLflow for Machine Learning Development course by Manuel Gil provides a great introduction to the MLflow Python library. The course focuses on the MLflow core functionality and workflow and covers the following topics:
- Setting up MLflow
- Creating and working with experiments
- Logging metadata (parameters, scores, etc.)
- Model registry
- Model tuning
- MLflow project demo
(1/3) I created a step-by-step tutorial for launching and customizing the RStudio server in a container using the Rocker RStudio image 🐳 and the run command 👇🏼
Setting up and running RStudio inside a containerized environment is easier than it seems, thanks to the Rocker project. This tutorial mainly focuses on the docker run command.
(3/3) This is the first tutorial in a series. The next ones will cover:
➡️ Formalizing the run command with Docker Compose
➡️ Customizing the Rocker image with additional requirements
➡️ Creating a template
➡️ Mounting databases (e.g., Postgres)
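For reference, a minimal sketch of the kind of docker run command the RStudio tutorial is about, assembled as an argument list in Python. It assumes the `rocker/rstudio` image, which serves RStudio on port 8787 and reads the login password from the `PASSWORD` environment variable. The function name and the placeholder password are hypothetical, and the tutorial's own command may use different options.

```python
import shlex
from typing import List


def rstudio_run_command(
    password: str,
    host_port: int = 8787,
    image: str = "rocker/rstudio",
) -> List[str]:
    """Build a `docker run` argument list for an RStudio Server container."""
    return [
        "docker", "run",
        "--rm",                        # remove the container when it stops
        "-ti",                         # interactive session with a TTY
        "-e", f"PASSWORD={password}",  # login password for the `rstudio` user
        "-p", f"{host_port}:8787",     # publish RStudio's port on the host
        image,
    ]


cmd = rstudio_run_command("change-me")
print(shlex.join(cmd))
```

Keeping the command as a list makes it easy to hand to `subprocess.run(cmd)` later, and the same settings map one-to-one onto a Docker Compose file.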
(1/4) Setting A Dockerized Python Environment - The Hard Way
I created a (relatively) short tutorial about setting up a dockerized 🐳 Python environment on the command line (CLI). Generally, I don't recommend that anyone set up their Python development workflow via the CLI. There are better tools for working with Python and Docker, such as VS Code with the Dev Containers extension. 🧵👇🏼
(2/4) Rather, this is a great way to learn the core Docker commands and functionality. The goal here is to see how you can take an official Python image and customize it to your needs.
(3/4) This includes the following:
- Running the Python base image in interactive mode on the CLI
- Creating a Dockerfile and adding new components to the base image
- Installing Python libraries and tools such as a CLI text editor
- Exposing a bash terminal and enabling the editing of Python scripts
- Using volumes to switch the container from an ephemeral mode to a persistent one
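The interactive-mode and volume steps above can be sketched as a single docker run invocation, built here as a Python argument list. The `python:3.11` tag, the `/workspace` mount point, and the function name are assumptions for illustration and are not the tutorial's exact command.

```python
import shlex
from pathlib import Path
from typing import List


def python_dev_run_command(
    project_dir: str,
    image: str = "python:3.11",
    workdir: str = "/workspace",
) -> List[str]:
    """Build a `docker run` argument list that opens bash with a bind mount."""
    host_path = Path(project_dir).resolve()  # bind mounts need an absolute path
    return [
        "docker", "run",
        "--rm",                            # the container itself stays ephemeral
        "-it",                             # interactive terminal
        "-v", f"{host_path}:{workdir}",    # files under workdir persist on the host
        "-w", workdir,                     # start the shell in the mounted folder
        image,
        "bash",
    ]


print(shlex.join(python_dev_run_command(".")))
```

Scripts edited inside the container under the mounted folder remain on the host after the container exits, which is what makes the setup persistent.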
My thinking here is that @huggingface is an acquisition target for #NVIDIA because they don't have an #MLOps platform offering - I also wonder where #GitLab sits in all this too ...
TIL https://www.jailbreakchat.com/ is a website that collects prompt injection attacks for LLMs, i.e. getting the language model to do stuff that is not allowed by inserting malicious prompts.
🧑‍💻 New video! Walk through the "whole game" of #MLOps with #rstats:
- Data prep with #tidyverse
- Model training & eval with #tidymodels
- Deployment with #vetiver in #Docker 🐳 on @huggingface 🤗
- Monitoring with #pins
RAG From Scratch - LangChain Tutorial 🦜👇🏼
RAG From Scratch is a crash course by Lance Martin from LangChain that focuses on the foundations of Retrieval Augmented Generation (RAG). The tutorial covers indexing, retrieval, and generation from scratch.
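To make the three stages concrete, here is a toy, standard-library-only sketch of indexing, retrieval, and prompt construction. It is not the course's code: the course works with LangChain, real embeddings, and an LLM, while this sketch swaps in word-count vectors with cosine similarity and stops at building the prompt that would be sent to a model. The sample documents and all function names are made up.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs: List[str]) -> List[Tuple[str, Counter]]:
    """Indexing: store each document next to its vector."""
    return [(doc, embed(doc)) for doc in docs]


def retrieve(index: List[Tuple[str, Counter]], question: str, k: int = 1) -> List[str]:
    """Retrieval: return the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(question: str, context: List[str]) -> str:
    """Generation input: the prompt an LLM would receive."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


docs = [
    "MLflow tracks experiments, parameters, and metrics.",
    "Docker volumes persist data outside the container.",
    "Polars is a DataFrame library written in Rust.",
]
index = build_index(docs)
question = "Which library is written in Rust?"
top = retrieve(index, question)
print(build_prompt(question, top))
```

In a real pipeline, `embed` would call an embedding model, the index would live in a vector store, and the prompt would be passed to an LLM to generate the answer.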
I spent my Sunday morning reading Distributed Machine Learning Patterns by Yuan Tang. The book, as its name implies, focuses on machine learning at scale using tools such as TensorFlow, Kubernetes, Argo, etc. It covers the following topics:
- Handling large datasets
- Approaches for training ML models with distributed machines
- ML workflow and operation design
- Building and deploying ML pipelines
If you want to get started with Google's Gemini API, here is an introductory course by Ania Kubow. This 1-hour course covers the following topics:
- Getting started with the API (setting up access, API key, etc.)
- Tokenization
- Reviewing the Gemini models
- Creating a chatbot using the API
Managing the Complete Machine Learning Lifecycle with MLflow is a three-hour introductory MLflow workshop by Jules S. Damji. The course is for beginners, and it covers the core functionality of MLflow:
- Tracking
- Projects
- Models
- Registry
- UI
freeCodeCamp released a new course today on building RAG from scratch with LangChain. The course, which is by Lance Martin from LangChain, focuses on the foundations of Retrieval Augmented Generation (RAG).
Production Monitoring & Automations of LLM with LangSmith 🦜👇🏼
LangChain released a crash course for LangSmith, their DevOps platform for deploying LLM applications into production. The course covers topics such as:
- LLM application monitoring
- Setting up automations
- Performance monitoring
Getting started with the Dev Containers extension 👇🏼
The Dev Containers extension is the main reason I moved to VS Code, as it provides native and seamless integration with Docker 🐳. I started working on a series of tutorials focusing on the VS Code Dev Containers extension. The first tutorial in the series focuses on getting started with the Dev Containers extension.
(1/2) I recently shared a few posts about Rust 🦀 and my intention to leverage it for data science applications. Multiple people asked if Rust is a substitute for R or Python, and the short answer (in my opinion) is no. I see Rust as a complementary or supporting language that could make languages like R and Python faster.
Polars 🐻‍❄️ is one example of a Python library that uses Rust on the backend. 🧵👇🏼