@ramblingsteve Airflow is used to orchestrate the pipeline (data automation, model refresh), and MLflow is used for model experiments and tracking model performance.
(1/2) I recently shared a few posts about Rust and my intention to leverage it for data science applications. Multiple people asked if Rust is a substitute for R or Python, and the short answer (in my opinion) is no. I see Rust as a complementary or supporting language that could make languages like R and Python faster.
Polars is one example of a Python library that uses Rust on the backend.
In the past few months, I created a bunch of Docker tutorials covering various topics, from a fun setup of a Python environment on the CLI to advanced topics such as multi-stage builds. I organized all the tutorials under one folder, and I plan to keep adding related tutorials to it.
Currently on my Docker tutorial TODO list:
➡️ Docker ENTRYPOINT vs CMD
➡️ Docker multi-architecture build
Getting started with the Dev Containers extension
The Dev Containers extension is the main reason I moved to VS Code, as it provides a native and seamless integration of Docker. I have started working on a sequence of tutorials focusing on the VS Code Dev Containers extension. The first tutorial in the sequence focuses on getting started with the Dev Containers extension.
The size of a Docker image can grow quickly during the build. I became more mindful of the image size when I started to deploy on GitHub Actions. The bigger the image, the longer the run time and the higher the runtime cost.
This is when you should consider using a multi-stage build.
(3/4) When should you use a multi-stage build?
You should consider moving to a multi-stage build when the dependencies required for the build are no longer needed after the build is completed. A classic example is building a binary application. It is also effective when setting up a dockerized Python environment using a virtual environment.
FreeCodeCamp released a new course today on building RAG from scratch with LangChain. The course, by Lance Martin from LangChain, focuses on the foundations of Retrieval Augmented Generation (RAG).
Production Monitoring & Automations of LLM with LangSmith
LangChain released a crash course for LangSmith, their DevOps platform for deploying LLM applications into production. The course covers topics such as:
✅ Monitoring LLM applications
✅ Setting up automations
✅ Performance monitoring
(1/2) Setting A Dockerized Python Environment: The Elegant Way
A few weeks ago, I created a short tutorial about setting up a dockerized Python environment via the CLI, or the hard way. The second tutorial on this topic provides a more elegant and robust approach for setting up a dockerized Python development environment with VS Code and the Dev Containers extension.
(2/2) The Dev Containers extension is the main reason that I started to use VS Code. It provides a native integration of Docker and makes working with containers seamless. The tutorial focuses on setting up and customizing the Python environment with the devcontainer.json and Dockerfile files.
(1/2) I created the second tutorial in the series on running RStudio inside a container. This tutorial focuses on formalizing the run command from the first tutorial with Docker Compose using the Rocker RStudio image.
Setting and running RStudio inside a containerized environment is easier than it seems, thanks to the Rocker project.
(2/2) This is the second tutorial in a sequence:
✅ Launching an RStudio server inside a container with the docker run command
✅ Formalizing the run command with Docker Compose
➡️ Customizing the Rocker image with additional requirements
➡️ Creating a template
➡️ Mounting databases (e.g., Postgres)
RAG From Scratch - LangChain Tutorial
RAG From Scratch is a crash course by Lance Martin from LangChain focusing on the foundations of Retrieval Augmented Generation (RAG). This tutorial covers indexing, retrieval, and generation from scratch.
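To make the three stages concrete, here is a minimal sketch in plain Python. It is not taken from the course and does not use LangChain: the function names, the sample documents, and the bag-of-words similarity are all assumptions for illustration. A real setup would use embeddings and a vector store, and would send the final prompt to an LLM.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Indexing: keep each document next to its term-frequency vector."""
    return [(doc, Counter(tokenize(doc))) for doc in documents]


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(index, question, k=1):
    """Retrieval: return the k documents most similar to the question."""
    query_vec = Counter(tokenize(question))
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(question, context_docs):
    """Generation input: combine the retrieved context with the question."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical sample documents, used only for this illustration.
docs = [
    "Airflow orchestrates data pipelines and model refresh jobs.",
    "MLflow tracks experiments and model performance.",
    "Multi-stage builds reduce the size of Docker images.",
]
index = build_index(docs)
question = "Which tool tracks experiments?"
top_docs = retrieve(index, question, k=1)
prompt = build_prompt(question, top_docs)
print(prompt)
```

The generation step is left as a prompt string on purpose, so the sketch runs without an API key. Passing `prompt` to any chat model would complete the loop.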
(1/2) MLflow for Machine Learning Development
The MLflow for Machine Learning Development course by Manuel Gil provides a great introduction to the MLflow Python library. The course focuses on the MLflow core functionality and workflow and covers the following topics:
✅ Setting up MLflow
✅ Creating and working with experiments
✅ Logging metadata (parameters, score, etc.)
✅ Model registry
✅ Model tuning
✅ MLflow project demo
I spent my Sunday morning reading Distributed Machine Learning Patterns by Yuan Tang. The book, as its name implies, focuses on machine learning at scale using tools such as TensorFlow, Kubernetes, Argo, etc. It covers the following topics:
✅ Handling large datasets
✅ Approaches for training ML models with distributed machines
✅ ML workflow and operation design
✅ Building and deploying ML pipelines
If you want to get started with Google's Gemini API, here is an introductory course by Ania Kubow. This 1-hour course covers the following topics:
✅ Getting started with the API (setting access, API key, etc.)
✅ Tokenization
✅ Reviewing the Gemini models
✅ Creating a chatbot using the API