Why use a venv?
A virtual environment isolates Python packages for a specific project.
It’s like a self-contained sandbox so one project’s libraries (say pandas==2.2) don’t break another that needs pandas==1.5.
You’ll use it for:
- Data pipelines (Airflow, Pandas, NumPy)
- ETL scripts
- Flask / API projects (TransferDepot, etc.)
- Anything containerized or air-gapped
⚙️ 1. Create a venv
# Create a folder for your project
mkdir ~/projects/data_playday && cd ~/projects/data_playday
# Create a virtual environment inside it
python3 -m venv venv
This creates a directory venv/ containing its own Python interpreter (a copy of, or symlink to, the base one) plus a private site-packages directory.
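To see what actually lands on disk, the same step can be driven from Python's standard venv module. This is a minimal sketch, not a replacement for the command above: the helper name create_and_inspect is made up for illustration, the environment is built in a throwaway temp directory, and with_pip=False is an assumption chosen so the sketch runs offline (the command-line form installs pip by default).

```python
import os
import tempfile
import venv
from pathlib import Path


def create_and_inspect(env_dir: Path) -> dict:
    """Create a venv at env_dir and report the key paths it contains.

    Hypothetical helper for illustration only.
    """
    # with_pip=False keeps the sketch fast and offline-friendly;
    # `python3 -m venv venv` bootstraps pip by default.
    venv.create(env_dir, with_pip=False)

    # POSIX layouts use bin/, Windows uses Scripts/.
    bin_dir = env_dir / ("Scripts" if os.name == "nt" else "bin")
    config = env_dir / "pyvenv.cfg"
    return {
        "config": config,
        "bin_dir": bin_dir,
        "has_config": config.is_file(),
        "has_bin_dir": bin_dir.is_dir(),
    }


with tempfile.TemporaryDirectory() as tmp:
    info = create_and_inspect(Path(tmp) / "venv")
    print("pyvenv.cfg present:", info["has_config"])
    print("bin directory present:", info["has_bin_dir"])
```

The pyvenv.cfg file is what marks the directory as a virtual environment and records which base Python it was created from.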
🚀 2. Activate it
source venv/bin/activate
Now your shell prompt will usually show (venv) — meaning you’re “inside” it.
Everything you pip install from now on stays local to this environment.
Check it:
which python
which pip
→ both should point into ~/projects/data_playday/venv/bin/ (venv/bin/python and venv/bin/pip)
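The same check can be made from inside Python, which is handy in scripts that should refuse to run against the system interpreter. A minimal sketch, assuming the environment was created with the standard venv module; the function name in_virtualenv is illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside a venv:", in_virtualenv())
```

Run it once with the venv activated and once after deactivate to see the two values differ.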
📦 3. Install packages (safe + isolated)
pip install pandas sqlalchemy jinja2
If you’re in an air-gapped setup:
- Download wheels elsewhere (on a connected machine with the same OS and Python version, so the wheels match):
  pip download pandas -d /mnt/repo/packages
- Then install offline:
  pip install --no-index --find-links=/mnt/repo/packages pandas
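Before carrying a package directory across the air gap, it can help to list which wheels it actually contains. The sketch below relies only on the standard wheel file-name layout (name-version-python-abi-platform.whl); the helper name list_wheels and the placeholder file names in the demo are made up for illustration, and source archives (.tar.gz) are deliberately ignored.

```python
import tempfile
from pathlib import Path


def list_wheels(wheel_dir: str) -> list[tuple[str, str]]:
    """Return sorted (distribution, version) pairs for *.whl files in wheel_dir."""
    found = []
    for wheel in Path(wheel_dir).glob("*.whl"):
        # Wheel names follow name-version(-build)-python-abi-platform.whl,
        # so the first two dash-separated fields are name and version.
        parts = wheel.stem.split("-")
        if len(parts) >= 5:
            found.append((parts[0], parts[1]))
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    # Empty placeholder files named like wheels, purely for illustration.
    for fname in (
        "pandas-2.2.0-cp311-cp311-manylinux_2_17_x86_64.whl",
        "numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.whl",
        "notes.txt",
    ):
        (Path(tmp) / fname).touch()
    wheels = list_wheels(tmp)
    for name, version in wheels:
        print(f"{name}=={version}")
```

In real use you would point list_wheels at the download directory, for example /mnt/repo/packages, and compare the result with what the project needs.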
📜 4. Save dependencies
When your project works, freeze the package list:
pip freeze > requirements.txt
Later you (or someone else) can recreate the exact setup with:
pip install -r requirements.txt
🧹 5. Deactivate and switch between envs
deactivate # exits current venv
Then, for a new project:
cd ~/projects/new_data_thing
python3 -m venv venv
source venv/bin/activate
🧠 6. (Optional) Use a venv manager
For multiple data projects, use one of:
- direnv → auto-activates your venv when you cd into a directory
- virtualenvwrapper → keeps all venvs under ~/.virtualenvs
- Poetry / Pipenv → handle dependencies + venvs together
❤️ Quick mental model
You want to...          You do...
Keep a project clean    python3 -m venv venv
Work inside it          source venv/bin/activate
Add packages            pip install pandas
Save them               pip freeze > requirements.txt
Rebuild later           pip install -r requirements.txt
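For a sense of what the "Save them" step records, the pinned name==version lines can be approximated from inside the active environment with the standard library. This is only a rough sketch: pip freeze applies extra rules of its own (for example around editable installs), so its output may differ, and the function name freeze_lines is illustrative.

```python
from importlib.metadata import distributions


def freeze_lines() -> list[str]:
    """Return sorted 'name==version' lines for installed distributions."""
    lines = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken or missing metadata
            lines.add(f"{name}=={dist.version}")
    # Case-insensitive sort, similar in spirit to a requirements file.
    return sorted(lines, key=str.lower)


for line in freeze_lines():
    print(line)
```

Run inside a fresh venv, the list should be short; run against the system Python, it usually shows why isolation is worth the effort.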