Python Setup
Using UV for Python management
UV is a modern, fast Python package installer and resolver written in Rust. It's designed to be a drop-in replacement for pip and pip-tools, with significantly faster dependency resolution and installation.
Installing UV
# On Linux/macOS
curl -LsSf https://astral.sh/uv/install.sh | sh
# On Windows (PowerShell)
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
# Via pip (if you already have Python)
pip install uv
Basic UV Usage
# Install packages
uv pip install numpy pandas
UV vs Traditional Tools
- Speed: UV is reported by its maintainers to be 10-100x faster than pip for dependency resolution
- Lock files: Built-in support for lock files with uv.lock
- Resolution: More reliable dependency resolution
- Compatibility: Drop-in replacement for most pip commands
Package Management and Dependencies
Use pyproject.toml for managing project dependencies. This file lets you specify your project's metadata and dependencies in a standardized way, as defined by PEP 621.
Modern Dependency Management with UV
Instead of manually editing requirements files, use uv add to add dependencies to your pyproject.toml:
# Add core dependencies
uv add numpy pandas matplotlib scikit-learn
# Add development dependencies
uv add --dev jupyter pytest black
# Add optional dependencies for specific features
uv add --optional ir "pyterrier>=0.9.0" "python-terrier>=0.4.0"
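Packages in an optional group such as ir are only present when that extra has been installed, so code that depends on them may want to check for them at runtime. A minimal sketch using only the standard library; the helper name has_module is hypothetical, and pyterrier is assumed to be the import name provided by the ir extra:

```python
import importlib.util


def has_module(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    # find_spec locates the module without importing it.
    return importlib.util.find_spec(module_name) is not None


# "pyterrier" is assumed to be the import name installed by the "ir" extra.
if has_module("pyterrier"):
    print("IR features enabled")
else:
    print("IR features disabled; install them with: uv sync --extra ir")
```

Checking before importing lets the rest of the project run when only the base dependencies are installed.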
Example pyproject.toml
Use uv init to create a pyproject.toml file with the necessary structure.
It should look something like this:
[project]
name = "arc-seminar-project"
version = "0.1.0"
description = "Research project for ARC seminar"
authors = [{name = "Your Name", email = "your.email@gatech.edu"}]
dependencies = [
    "numpy>=1.24.0", # https://numpy.org/
    "pandas>=2.0.0", # https://pandas.pydata.org/
    "matplotlib>=3.7.0", # https://matplotlib.org/
    "scikit-learn>=1.3.0", # https://scikit-learn.org/
    "torch>=2.0.0", # https://pytorch.org/
    "transformers>=4.30.0", # https://huggingface.co/transformers/
]
Installing Dependencies
# Install base dependencies
uv sync
# Install with optional IR dependencies
uv sync --extra ir
# Development dependencies added with uv add --dev are installed by default (dev is a dependency group, not an extra)
# To skip them
uv sync --no-dev
# Install everything
uv sync --all-extras
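After syncing, it can be useful to confirm which versions ended up in the environment. A small sketch using importlib.metadata from the standard library; the helper name installed_versions is hypothetical, and the package names are taken from the examples in this guide:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_versions(package_names: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: dict[str, Optional[str]] = {}
    for name in package_names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Report on a few of the packages used in this guide.
for name, installed in installed_versions(["numpy", "pandas", "scikit-learn"]).items():
    print(f"{name}: {installed or 'not installed'}")
```

Run it with the project environment active (or via uv run) so that it inspects the environment uv manages.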
Virtual Environments
Use uv venv to create and manage virtual environments easily. This will create a .venv directory in your project folder, which isolates your Python environment.
uv venv
source .venv/bin/activate # Linux/macOS
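To confirm from inside Python that the virtual environment is the one in use, the interpreter's prefixes can be compared. This is a minimal sketch; the function name in_virtual_environment is hypothetical:

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # In a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtual_environment())
print("interpreter:", sys.executable)
```

When the environment is active, the interpreter path printed should sit under the project's .venv directory.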
Essential Libraries
Core Data Science Stack
Package | Description |
---|---|
numpy | Fundamental package for numerical computations in Python. |
pandas | Data manipulation and analysis library, providing data structures like DataFrames. |
matplotlib | Plotting library for creating static, animated, and interactive visualizations in Python. |
scikit-learn | Machine learning library for Python, providing simple and efficient tools for data mining. |
scipy | Library for scientific and technical computing, building on NumPy. |
Machine Learning and Deep Learning
Package | Description |
---|---|
torch | PyTorch library for deep learning, providing tensor computations and neural network capabilities. |
transformers | Hugging Face library for working with transformer models and datasets, particularly in NLP. |
datasets | Hugging Face library for accessing and processing datasets. |
tokenizers | Fast tokenizers for NLP preprocessing. |
Information Retrieval and Search
Package | Description |
---|---|
pyterrier | Python framework for information retrieval experimentation and research. |
pyserini | Lucene-based toolkit for reproducible information retrieval research. |
faiss-cpu | Facebook AI Similarity Search library for efficient similarity search and clustering. |
sentence-transformers | Library for sentence, text and image embeddings using transformer models. |
Workflow and Pipeline Management
Package | Description |
---|---|
luigi | Workflow management system for building complex data pipelines. |
wandb | Weights & Biases for experiment tracking and model management. |
Development and Productivity
Package | Description |
---|---|
jupyter | Interactive computing environment for notebooks. |
tqdm | Progress bars for Python loops and iterables. |
rich | Library for rich text and beautiful formatting in the terminal. |