Python Setup

Using UV for Python management

UV is a modern, fast Python package installer and resolver written in Rust. It's designed to be a drop-in replacement for pip and pip-tools, with significantly faster dependency resolution and installation.

Installing UV

# On Linux/macOS
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows (PowerShell)
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Via pip (if you already have Python)
pip install uv
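
After installing, it can help to confirm that the uv executable is actually on your PATH. The following is a minimal sketch using only the standard library; the helper names find_uv and uv_version are made up for illustration and are not part of UV itself.

```python
import shutil
import subprocess


def find_uv():
    """Return the full path to the uv executable, or None if it is not on PATH."""
    return shutil.which("uv")


def uv_version(uv_path):
    """Run `uv --version` and return its output as a stripped string."""
    result = subprocess.run(
        [uv_path, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


path = find_uv()
if path is None:
    # A fresh install often needs a new shell session before PATH is updated.
    print("uv not found on PATH; open a new shell or re-run the installer")
else:
    print(f"{path}: {uv_version(path)}")
```

If the script reports that uv is missing right after installation, opening a new terminal is usually the first thing to try.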

Basic UV Usage

# Install packages
uv pip install numpy pandas

UV vs Traditional Tools

  • Speed: Astral, UV's developer, advertises it as 10-100x faster than pip for resolving and installing packages
  • Lock files: Built-in support for lock files with uv.lock
  • Resolution: More reliable dependency resolution
  • Compatibility: The uv pip interface is a drop-in replacement for most pip commands

Package Management and Dependencies

Use pyproject.toml for managing project dependencies. This file allows you to specify your project's metadata and dependencies in a standardized way as defined by PEP 621.

Modern Dependency Management with UV

Instead of manually editing requirements files, use uv add to add dependencies to your pyproject.toml:

# Add core dependencies
uv add numpy pandas matplotlib scikit-learn

# Add development dependencies
uv add --dev jupyter pytest black

# Add optional dependencies for specific features
uv add --optional ir "pyterrier>=0.9.0" "python-terrier>=0.4.0"

Example pyproject.toml

Use uv init to create a pyproject.toml file with the basic project structure. After you add dependencies with uv add, it should look something like this:

[project]
name = "arc-seminar-project"
version = "0.1.0"
description = "Research project for ARC seminar"
authors = [{name = "Your Name", email = "your.email@gatech.edu"}]
dependencies = [
    "numpy>=1.24.0",        # https://numpy.org/
    "pandas>=2.0.0",        # https://pandas.pydata.org/
    "matplotlib>=3.7.0",    # https://matplotlib.org/
    "scikit-learn>=1.3.0",  # https://scikit-learn.org/
    "torch>=2.0.0",         # https://pytorch.org/
    "transformers>=4.30.0", # https://huggingface.co/transformers/
]

Installing Dependencies

# Install base dependencies (the dev group created by uv add --dev is included by default)
uv sync

# Install without development dependencies
uv sync --no-dev

# Install with optional IR dependencies
uv sync --extra ir

# Install everything
uv sync --all-extras
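
After syncing, you may want to confirm which packages actually ended up in the environment. The sketch below uses the standard importlib.metadata module to look up installed versions; installed_versions is a hypothetical helper, and the package list is only an example taken from this guide.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in the current environment.
            found[name] = None
    return found


report = installed_versions(["numpy", "pandas", "matplotlib", "scikit-learn"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

Run it with the project environment active (or through uv run) so that it inspects the right interpreter.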

Virtual Environments

Use uv venv to create and manage virtual environments easily. This will create a .venv directory in your project folder, which isolates your Python environment.

uv venv
source .venv/bin/activate  # Linux/macOS
.venv\Scripts\activate     # Windows
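
A quick way to check that activation worked is to ask the interpreter itself. This is a minimal sketch: in a venv-style environment, sys.prefix points at the environment while sys.base_prefix points at the base Python installation. The function name in_virtualenv is made up for illustration.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix is the environment directory, while
    # sys.base_prefix still refers to the base Python installation.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("virtual environment active:", in_virtualenv())
```

If this prints False after activation, the shell is probably still resolving a different python executable first.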

Essential Libraries

Core Data Science Stack

Package       Description
numpy         Fundamental package for numerical computation in Python.
pandas        Data manipulation and analysis library, providing data structures such as DataFrames.
matplotlib    Plotting library for creating static, animated, and interactive visualizations in Python.
scikit-learn  Machine learning library for Python, providing simple and efficient tools for data mining and data analysis.
scipy         Library for scientific and technical computing, building on NumPy.

Machine Learning and Deep Learning

Package       Description
torch         PyTorch library for deep learning, providing tensor computations and neural network capabilities.
transformers  Hugging Face library for working with transformer models and datasets, particularly in NLP.
datasets      Hugging Face library for accessing and processing datasets.
tokenizers    Fast tokenizers for NLP preprocessing.

Information Retrieval

Package                Description
pyterrier              Python framework for information retrieval experimentation and research.
pyserini               Lucene-based toolkit for reproducible information retrieval research.
faiss-cpu              Facebook AI Similarity Search library for efficient similarity search and clustering.
sentence-transformers  Library for sentence, text, and image embeddings using transformer models.

Workflow and Pipeline Management

Package  Description
luigi    Workflow management system for building complex data pipelines.
wandb    Weights & Biases for experiment tracking and model management.

Development and Productivity

Package  Description
jupyter  Interactive computing environment for notebooks.
tqdm     Progress bars for Python loops and iterables.
rich     Library for rich text and beautiful formatting in the terminal.