I embarked on my Python programming journey in 2010, during my freshman year. Over time the work has grown from simple scripts to a sprawling stack covering geospatial analysis, deep learning, LLM-powered agents, and production web APIs. With each passing year the toolkit gets refined — libraries that once felt exotic become daily drivers, and new ones earn their place only after proving their worth across multiple projects.

This post is a living record of what I actually reach for. Not a curated “top 10” list, but an honest survey of the packages that show up in my environment.yml files again and again.

Setup

I manage all Python environments with Miniforge, the community-maintained conda distribution from the conda-forge project. It is lightweight, defaults to the vastly richer conda-forge channel, and lets me pin exact environments per project through environment.yml files that I commit alongside the code.

Python has three package managers worth knowing: pip, conda, and mamba.

                              pip    conda    mamba
Library manager               ✔️      ✔️       ✔️
Virtual environment manager           ✔️       ✔️
Dependency conflict solver            ✔️       ✔️
Official PyPI support         ✔️
Scientific / binary packages          ✔️       ✔️
Parallel solver                                ✔️

In practice: conda/mamba for environments and compiled dependencies (GDAL, PyTorch CUDA builds, HDF5), pip for anything that only lives on PyPI.

Install Miniforge on Linux:

curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh

Then install mamba for faster solves (recent Miniforge releases already ship mamba in the base environment, so this step may be unnecessary):

conda install -n base mamba -c conda-forge

Create and activate a project environment:

mamba create -n myenv python=3.11
conda activate myenv

Why environments matter

Every project in ~/dev lives in its own conda environment. A geospatial project might pin gdal=3.8, while a PyTorch project needs CUDA 12.x binaries — these cannot coexist in a single global install. Committing environment.yml also means any collaborator (or a future me) can reproduce the exact setup with one command:

mamba env create -f environment.yml
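
For reference, a minimal environment.yml might look something like this (the project name and package list are illustrative):

name: habitat-mapping        # hypothetical project name
channels:
  - conda-forge
dependencies:
  - python=3.11
  - geopandas
  - rasterio
  - pip
  - pip:
      - ultralytics          # PyPI-only packages go under the pip key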

The toolkit

Core scientific stack

These appear in virtually every project.

NumPy is the foundation — n-dimensional arrays, broadcasting, and the data model that everything else builds on. You rarely call it alone, but nothing works without it.
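
A toy example of broadcasting, the feature I lean on most (numbers are made up):

import numpy as np

temps = np.array([[12.1, 13.4, 11.9],
                  [14.0, 15.2, 13.8]])   # (2, 3) grid of sensor readings
bias = np.array([0.5, -0.2, 0.1])        # per-column calibration offsets
corrected = temps - bias                 # broadcast across rows, no explicit loop
print(corrected.mean(axis=0))            # column means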

SciPy adds the algorithms: integration, interpolation, signal processing, statistics, sparse matrices, and spatial data structures. I reach for scipy.stats, scipy.optimize, and scipy.spatial constantly.
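
A quick illustration of the kind of thing I mean (synthetic data):

import numpy as np
from scipy import optimize, stats

a = np.random.normal(10, 2, 200)
b = np.random.normal(11, 2, 200)
t, p = stats.ttest_ind(a, b)                                 # two-sample t-test

res = optimize.minimize(lambda x: (x[0] - 3) ** 2 + 1, x0=[0.0])  # toy minimisation
print(p, res.x)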

Pandas handles tabular data. DataFrames are how field data, CSV exports, database query results, and API responses all end up in memory. The groupby / apply / merge pattern covers 80% of data wrangling.
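
A sketch of that pattern on made-up survey data:

import pandas as pd

obs = pd.DataFrame({
    "site": ["A", "A", "B", "B"],
    "species": ["oak", "pine", "oak", "oak"],
    "count": [12, 5, 7, 3],
})
sites = pd.DataFrame({"site": ["A", "B"], "region": ["north", "south"]})

summary = (
    obs.merge(sites, on="site")                  # join site metadata
       .groupby(["region", "species"])["count"]  # aggregate per region and species
       .sum()
       .reset_index()
)
print(summary)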

Matplotlib and Seaborn cover publication-quality static plots. Seaborn’s statistical plot types (boxplot, violinplot, pairplot) save a lot of boilerplate; Matplotlib gives the fine-grained control needed for figures that go into papers.
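
For instance, a violin plot is one line in Seaborn (this uses the bundled penguins demo dataset, which Seaborn downloads on first use):

import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset("penguins")             # small demo dataset
sns.violinplot(data=df, x="species", y="body_mass_g")
plt.title("Body mass by species")
plt.savefig("body_mass.png", dpi=300)         # Matplotlib handles figure export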

Tqdm wraps any iterable in a progress bar. Indispensable when a loop takes more than a few seconds.
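
Usage is a one-line change:

import time
from tqdm import tqdm

for _ in tqdm(range(100), desc="Processing"):
    time.sleep(0.01)   # stand-in for real work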

mamba install numpy scipy pandas matplotlib seaborn tqdm

Geospatial stack

A large part of my work involves spatial data — remote sensing products, habitat maps, drone surveys, climate model output.

GeoPandas extends Pandas with a geometry column, making it possible to do spatial joins, reprojections, and area calculations using a DataFrame interface. It is the entry point for most vector data workflows.
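
A typical vector workflow might look like this (file paths are placeholders):

import geopandas as gpd

plots = gpd.read_file("plots.gpkg")            # hypothetical survey plots
zones = gpd.read_file("zones.gpkg")            # hypothetical habitat zones

plots = plots.to_crs(zones.crs)                          # reproject to a common CRS
joined = gpd.sjoin(plots, zones, predicate="within")     # spatial join
zones["area_ha"] = zones.to_crs(epsg=6933).area / 10_000 # area in an equal-area projection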

Rasterio and GDAL handle raster data: reading GeoTIFF and NetCDF files, reprojecting, windowed reads for large datasets. GDAL is the underlying engine for most geospatial I/O across languages.
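
A sketch of a windowed read with Rasterio (the GeoTIFF path is a placeholder):

import rasterio
from rasterio.windows import Window

with rasterio.open("scene.tif") as src:
    print(src.crs, src.res, src.count)                 # metadata without loading pixels
    window = Window(col_off=0, row_off=0, width=1024, height=1024)
    block = src.read(1, window=window)                 # read one band, one tile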

Shapely provides the geometric primitives (polygons, lines, points) and the operations on them (union, intersection, buffer, distance). GeoPandas uses it internally but I often need it directly.
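
Directly, that looks like:

from shapely.geometry import Point, Polygon

nest = Point(2.0, 3.0)
reserve = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])

print(reserve.contains(nest))        # True
buffer_zone = nest.buffer(1.5)       # 1.5-unit radius around the point
overlap = buffer_zone.intersection(reserve)
print(overlap.area)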

xarray is indispensable for multidimensional scientific datasets — especially NetCDF climate model output and satellite products with (time, lat, lon) or (time, level, lat, lon) dimensions. It brings Pandas-like labelled indexing to arrays, and integrates well with cfgrib for ECMWF GRIB files.
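
A sketch with a hypothetical ERA5 temperature file (variable and dimension names follow the usual ERA5 conventions):

import xarray as xr

ds = xr.open_dataset("era5_t2m.nc")                       # placeholder NetCDF file
summer = ds["t2m"].sel(time=slice("2020-06", "2020-08"))  # label-based time selection
monthly = summer.resample(time="1MS").mean()              # monthly means
regional = monthly.sel(latitude=slice(60, 40),
                       longitude=slice(0, 20)).mean(dim=["latitude", "longitude"])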

Cartopy produces publication-quality maps with proper projections. For interactive web maps I reach for Folium or hand the GeoJSON off to a Leaflet frontend.
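
A minimal Cartopy map, as a sketch:

import matplotlib.pyplot as plt
import cartopy.crs as ccrs

ax = plt.axes(projection=ccrs.Robinson())
ax.coastlines()
ax.gridlines()
ax.scatter([12.5, -70.6], [41.9, -33.4], transform=ccrs.PlateCarree())  # lon/lat points
plt.savefig("map.png", dpi=300)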

mamba install geopandas rasterio gdal shapely xarray cartopy folium

Computer vision and image processing

Several projects involve microscopy images, UAV imagery, and camera trap footage.

OpenCV covers the low-level operations: colour space conversion, morphological transforms, contour detection, optical flow. It is fast, battle-tested, and has bindings for almost every language.
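
A typical low-level pipeline, sketched (the image path is a placeholder):

import cv2

img = cv2.imread("frame.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # Otsu threshold
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(len(contours), "objects found")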

Pillow and scikit-image handle higher-level image tasks: format conversion, filtering, feature detection, region properties. I use scikit-image when I need something that feels more NumPy-native.
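
A scikit-image sketch for segmenting and measuring regions (placeholder image path):

from skimage import io, filters, measure

img = io.imread("cells.png", as_gray=True)
mask = img > filters.threshold_otsu(img)          # global threshold
labels = measure.label(mask)                      # connected components
props = measure.regionprops_table(labels, properties=("label", "area", "centroid"))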

PyTorch with TorchVision is the framework I use for training and running deep learning models. The Ultralytics package wraps YOLO models with a clean API for object detection and instance segmentation, which comes up frequently in ecological monitoring applications. For cell and nucleus segmentation in microscopy, Cellpose has become the default.
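
Running a pretrained YOLO detector through Ultralytics is short; a sketch (weights file and image path are illustrative, and the weights are downloaded on first use):

from ultralytics import YOLO

model = YOLO("yolov8n.pt")                 # pretrained nano model
results = model("camera_trap.jpg")         # placeholder image path
for r in results:
    print(r.boxes.xyxy, r.boxes.cls)       # detection boxes and class indices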

mamba install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
pip install ultralytics cellpose opencv-python scikit-image

LLM and RAG

The last two years have added a new layer to the stack.

LangChain is my go-to orchestration framework for LLM-powered applications. It handles prompt templates, chains, tool use, and memory, and it integrates cleanly with local and hosted models.

Ollama runs models locally — Llama, Mistral, Phi, Gemma — which matters when data cannot leave the machine. For hosted inference I use the OpenAI API.
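
A minimal sketch wiring a LangChain prompt to a local Ollama model via the LCEL pipe syntax; import paths and the model name depend on the installed versions, and an Ollama server is assumed to be running:

from langchain_core.prompts import ChatPromptTemplate
from langchain_community.chat_models import ChatOllama

prompt = ChatPromptTemplate.from_template(
    "Summarise the following field note in one sentence:\n\n{note}"
)
llm = ChatOllama(model="llama3")            # local model served by Ollama
chain = prompt | llm                        # LCEL: compose prompt and model

reply = chain.invoke({"note": "Three curlews observed at the northern mudflat at dawn."})
print(reply.content)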

ChromaDB is a lightweight vector database for retrieval-augmented generation (RAG). Embeddings are generated with Sentence Transformers from HuggingFace and stored in Chroma, making it possible to build document Q&A systems that run entirely offline.
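
A sketch of the offline retrieval loop (collection name and documents are made up):

import chromadb
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")      # small general-purpose embedder
client = chromadb.PersistentClient(path="./chroma")    # on-disk vector store
notes = client.get_or_create_collection("field_notes")

texts = ["Curlew count up 12% at site A.", "Water temperature logger failed at site B."]
notes.add(ids=["n1", "n2"], documents=texts,
          embeddings=encoder.encode(texts).tolist())

hits = notes.query(query_embeddings=encoder.encode(["bird numbers"]).tolist(), n_results=1)
print(hits["documents"])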

pip install langchain langchain-community chromadb sentence-transformers openai

Web and API development

Many projects end up as services — data APIs, processing backends, interactive dashboards.

FastAPI is my Python web framework of choice. It generates OpenAPI docs automatically, validates request/response bodies via Pydantic, and is fast enough for production workloads. Paired with Uvicorn as the ASGI server, it handles async endpoints naturally.
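
A minimal endpoint with request validation, as a sketch (saved as a hypothetical main.py):

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Observation(BaseModel):
    site: str
    species: str
    count: int

@app.post("/observations")
async def create_observation(obs: Observation) -> dict:
    # validation has already happened by the time we get here
    return {"stored": obs.model_dump()}

# run with: uvicorn main:app --reload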

Pydantic (v2) does data validation and settings management. I use BaseModel for API schemas and BaseSettings with python-dotenv to load configuration from .env files without touching the codebase.
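
A sketch of how I set that up (field names are illustrative):

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    mongo_uri: str = "mongodb://localhost:27017"
    api_key: str = ""          # overridden by API_KEY in .env or the environment

settings = Settings()
print(settings.mongo_uri)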

Motor is the async MongoDB driver. MongoDB works well for document-heavy applications — nested ecological records, agent memory stores, API response caches — where a rigid relational schema would require too many migrations.
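
A minimal async round trip with Motor, assuming a local MongoDB instance (database and collection names are illustrative):

import asyncio
from motor.motor_asyncio import AsyncIOMotorClient

async def main():
    client = AsyncIOMotorClient("mongodb://localhost:27017")
    records = client["ecology"]["observations"]
    await records.insert_one({"site": "A", "species": "curlew", "count": 3})
    doc = await records.find_one({"site": "A"})
    print(doc)

asyncio.run(main())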

Streamlit is the quickest path from a Python script to a shareable interactive dashboard, useful for internal tools and exploratory data apps that do not need a full frontend.
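
A toy dashboard is only a few lines (saved as a hypothetical app.py):

import numpy as np
import pandas as pd
import streamlit as st

st.title("Survey explorer")
n = st.slider("Number of points", 10, 500, 100)
df = pd.DataFrame({"x": np.arange(n), "y": np.random.randn(n).cumsum()})
st.line_chart(df, x="x", y="y")
# run with: streamlit run app.py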

pip install fastapi "uvicorn[standard]" pydantic pydantic-settings motor python-dotenv streamlit

Code quality

A consistent codebase lowers the cognitive load when context-switching between projects.

Ruff is a fast Python linter and formatter (written in Rust) that replaces Flake8, isort, and most of what Black does — in one tool. I configure it once in pyproject.toml and let the VS Code extension apply it on save.

MyPy for gradual static typing on projects that grow large enough to benefit from it. Not every script needs type annotations, but FastAPI models and anything with a public API should be typed.

Pytest with pytest-asyncio for testing. Writing tests for FastAPI endpoints and async database operations is straightforward with these two together.
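
A sketch of an async test against the hypothetical main.py app from the FastAPI section, assuming httpx is also installed:

import pytest
from httpx import ASGITransport, AsyncClient

from main import app

@pytest.mark.asyncio
async def test_create_observation():
    transport = ASGITransport(app=app)                  # call the app in-process, no server needed
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        resp = await client.post("/observations",
                                 json={"site": "A", "species": "curlew", "count": 3})
    assert resp.status_code == 200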

pip install ruff mypy pytest pytest-asyncio

Editor: Visual Studio Code

All of the above runs inside Visual Studio Code. The Python extension (Microsoft), Pylance for type inference, the Ruff extension for linting, and the Jupyter extension for notebooks cover the workflow. The built-in terminal, Git panel, and Dev Containers extension for Docker-based environments mean I rarely need to leave the editor.


The toolkit has grown considerably since the early Python days. What has stayed constant is the philosophy: use the best tool for the problem, keep environments isolated, and document dependencies so the work is reproducible. The specific packages will keep changing; the practices around them are what actually matter.

Originally posted July 27, 2025 — updated May 2026