Billy D. 100ba21eba
updates to adrs and fixing to reflect go refactor.
2026-02-23 06:14:30 -05:00

# 📐 Coding Conventions
> **Patterns, practices, and folder structure conventions for DaviesTechLabs repositories**
## Repository Conventions
### homelab-k8s2 (Infrastructure)
```
kubernetes/
├── apps/                          # Application deployments
│   └── {namespace}/               # One folder per namespace
│       └── {app}/                 # One folder per application
│           ├── app/               # Kubernetes manifests
│           │   ├── kustomization.yaml
│           │   ├── helmrelease.yaml   # OR individual manifests
│           │   └── ...
│           └── ks.yaml            # Flux Kustomization
├── components/                    # Reusable Kustomize components
└── flux/                          # Flux system configuration
```
**Naming Conventions:**
- Namespaces: lowercase with hyphens (`ai-ml`, `cert-manager`)
- Apps: lowercase with hyphens (`chat-handler`, `voice-assistant`)
- Secrets: `{app}-{type}` (e.g., `milvus-credentials`)
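
The naming rules above could be checked in a script or pre-commit hook. The following is a minimal sketch only: the function names `is_valid_name` and `secret_name` are hypothetical, and the 63-character limit is an assumption borrowed from the Kubernetes DNS label rule rather than something this document specifies.

```python
import re

# Assumption: "lowercase with hyphens" means lowercase alphanumeric words
# joined by single hyphens, with no leading or trailing hyphen.
_NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

# Assumption: reuse the Kubernetes DNS label length limit.
MAX_NAME_LENGTH = 63


def is_valid_name(name: str) -> bool:
    """Return True if name is lowercase, hyphen-separated, and short enough."""
    return len(name) <= MAX_NAME_LENGTH and _NAME_RE.fullmatch(name) is not None


def secret_name(app: str, secret_type: str) -> str:
    """Build a secret name following the {app}-{type} convention."""
    for part in (app, secret_type):
        if not is_valid_name(part):
            raise ValueError(f"invalid name component: {part!r}")
    return f"{app}-{secret_type}"
```

For example, `secret_name("milvus", "credentials")` yields `milvus-credentials`, while `is_valid_name("AI_ML")` is rejected.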
### AI/ML Repos (git.daviestechlabs.io/daviestechlabs)
```
handler-base/ # Shared Go module for all NATS handlers
├── clients/ # HTTP clients (LLM, STT, TTS, embeddings, reranker)
├── config/ # Env-based configuration (struct tags)
├── gen/messagespb/ # Generated protobuf stubs
├── handler/ # Typed NATS message handler with OTel + health wiring
├── health/ # HTTP health + readiness server
├── messages/ # Type aliases from generated protobuf stubs
├── natsutil/ # NATS publish/request with protobuf encoding
├── proto/messages/v1/ # .proto schema source
├── go.mod
└── buf.yaml # buf protobuf toolchain config
chat-handler/ # Text chat service (Go)
voice-assistant/ # Voice pipeline service (Go)
pipeline-bridge/ # Workflow engine bridge (Go)
stt-module/ # Speech-to-text bridge (Go)
tts-module/ # Text-to-speech bridge (Go)
├── main.go # Service entry point
├── main_test.go # Unit tests
├── e2e_test.go # End-to-end tests
├── go.mod # Go module (depends on handler-base)
├── Dockerfile # Distroless container (~20 MB)
└── renovate.json # Dependency update config
argo/ # Argo WorkflowTemplates
├── {workflow-name}.yaml
kubeflow/ # Kubeflow Pipelines
├── {pipeline}_pipeline.py
kuberay-images/ # GPU worker images
├── dockerfiles/
└── ray-serve/
```
---
## Python Conventions
### Package Management (ADR-0012)
Use **uv** for local development and **pip** in Docker for reproducibility:
```bash
# Install uv (one-time)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create virtual environment and install
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"
# Or use uv sync with lock file
uv sync
# Update lock file after changing pyproject.toml
uv lock
# Run tests
uv run pytest
```
### Code Formatting & Linting (Ruff)
All Python code must pass `ruff check` and `ruff format` before merge. Ruff is configured in each repo's `pyproject.toml`:
```toml
[tool.ruff]
line-length = 100
target-version = "py311"
[tool.ruff.lint]
select = ["E", "F", "W", "I", "UP", "B", "C4", "SIM"]
ignore = ["E501"] # Line length handled by formatter
[tool.ruff.format]
quote-style = "double"
```
**Required dev dependencies:**
```toml
[project.optional-dependencies]
dev = [
"pytest>=8.0.0",
"pytest-asyncio>=0.23.0",
"pytest-cov>=4.0.0", # For coverage in handler-base
"ruff>=0.1.0",
]
```
**Local workflow:**
```bash
# Check and auto-fix
uv run ruff check --fix .
# Format code
uv run ruff format .
# Verify before commit
uv run ruff check . && uv run ruff format --check .
```
**CI enforcement:** All repos run ruff in the lint job. Commits that fail linting will not pass CI.
**Kubeflow pipeline variables:** For Kubeflow DSL pipelines, terminal task assignments that appear unused should have `# noqa: F841` comments, as these define the DAG structure:
```python
# Step 6: Final step (defines DAG dependency)
tts_task = synthesize_speech(text=llm_task.output) # noqa: F841
```
### Project Structure
```go
// Go handler services use handler-base shared module
import (
	"git.daviestechlabs.io/daviestechlabs/handler-base/clients"
	"git.daviestechlabs.io/daviestechlabs/handler-base/config"
	"git.daviestechlabs.io/daviestechlabs/handler-base/handler"
	"git.daviestechlabs.io/daviestechlabs/handler-base/health"
	"git.daviestechlabs.io/daviestechlabs/handler-base/messages"
	"git.daviestechlabs.io/daviestechlabs/handler-base/natsutil"
)
```
```python
# Python remains for Ray Serve, Kubeflow pipelines, Gradio UIs
from dataclasses import dataclass

from nats.aio.msg import Msg


# Use async/await for I/O
async def handle_message(msg: Msg) -> None:
    ...


# Use dataclasses for structured data
@dataclass
class ChatRequest:
    user_id: str
    message: str
    enable_rag: bool = True
```
### Naming
| Element | Convention | Example |
|---------|------------|---------|
| Files | snake_case | `chat_handler.py` |
| Classes | PascalCase | `ChatHandler` |
| Functions | snake_case | `process_message` |
| Constants | UPPER_SNAKE | `NATS_URL` |
| Private | Leading underscore | `_internal_method` |
### Type Hints
```python
# Always use type hints (built-in generics; the ruff "UP" rules flag typing.List/Dict on py311)
from typing import Any


async def query_rag(
    query: str,
    collection: str = "knowledge_base",
    top_k: int = 5,
) -> list[dict[str, Any]]:
    ...
```
### Error Handling
```python
# Use specific exceptions
class RAGQueryError(Exception):
    """Raised when RAG query fails."""
    pass


# Log errors with context
import logging

logger = logging.getLogger(__name__)

try:
    result = await milvus.search(...)
except Exception as e:
    logger.error(f"RAG query failed: {e}", extra={"query": query})
    raise RAGQueryError(f"Failed to query collection {collection}") from e
```
### NATS Message Handling
All NATS handler services use Go with Protocol Buffers encoding (see [ADR-0061](decisions/0061-go-handler-refactor.md)):
```go
// Go NATS handler (production pattern)
func (h *Handler) handleMessage(msg *nats.Msg) {
	// Assumption: no parent context is passed in, so start from Background.
	ctx := context.Background()

	var req messages.ChatRequest
	if err := proto.Unmarshal(msg.Data, &req); err != nil {
		h.logger.Error("failed to unmarshal", "error", err)
		return
	}

	// Process
	result, err := h.process(ctx, &req)
	if err != nil {
		h.logger.Error("handler error", "error", err)
		msg.Nak()
		return
	}

	// Reply if request-reply pattern
	if msg.Reply != "" {
		data, err := proto.Marshal(result)
		if err != nil {
			h.logger.Error("failed to marshal response", "error", err)
			msg.Nak()
			return
		}
		msg.Respond(data)
	}
	msg.Ack()
}
```
> **Python NATS** is still used in Ray Serve `runtime_env` and Kubeflow pipeline components where needed, but all dedicated NATS handler services are Go.
---
## Kubernetes Manifest Conventions
### Labels
```yaml
metadata:
  labels:
    # Required
    app.kubernetes.io/name: chat-handler
    app.kubernetes.io/instance: chat-handler
    app.kubernetes.io/component: handler
    app.kubernetes.io/part-of: ai-platform
    # Optional
    app.kubernetes.io/version: "1.0.0"
    app.kubernetes.io/managed-by: flux
```
### Annotations
```yaml
metadata:
  annotations:
    # Reloader for config changes
    reloader.stakater.com/auto: "true"
    # Documentation
    description: "Handles chat messages via NATS"
```
### Resource Requests
```yaml
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

# GPU workloads
resources:
  limits:
    amd.com/gpu: 1 # AMD
    nvidia.com/gpu: 1 # NVIDIA
```
### Health Checks
```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 30
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```
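
Go handlers get these endpoints from the `health` package in handler-base. For the Python services that remain, the two probe paths could be served with the standard library alone. The sketch below is illustrative: `probe_status`, `ProbeHandler`, `make_probe_server`, and the module-level `ready` event are hypothetical names, and returning 503 while not ready is an assumption about the desired behaviour.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set this event once startup work (model load, connections) has finished.
ready = threading.Event()


def probe_status(path: str, is_ready: bool) -> int:
    """Map a probe path to an HTTP status code."""
    if path == "/health":
        return 200  # liveness: the process is up and answering HTTP
    if path == "/ready":
        return 200 if is_ready else 503  # readiness: hold traffic until ready
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    """Answer GET /health and GET /ready with an empty body."""

    def do_GET(self) -> None:
        self.send_response(probe_status(self.path, ready.is_set()))
        self.end_headers()

    def log_message(self, format: str, *args: object) -> None:
        pass  # keep periodic probe requests out of the service logs


def make_probe_server(port: int = 8080) -> HTTPServer:
    """Create (but do not start) a probe server on the given port."""
    return HTTPServer(("0.0.0.0", port), ProbeHandler)
```

A service would typically run `make_probe_server().serve_forever()` in a daemon thread and call `ready.set()` once it can accept work.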
---
## Flux/GitOps Conventions
### Kustomization Structure
```yaml
# ks.yaml - Flux Kustomization
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: &app chat-handler
  namespace: flux-system
spec:
  targetNamespace: ai-ml
  commonMetadata:
    labels:
      app.kubernetes.io/name: *app
  path: ./kubernetes/apps/ai-ml/chat-handler/app
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  wait: true
  interval: 30m
  retryInterval: 1m
  timeout: 5m
```
### HelmRelease Structure
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: milvus
spec:
  interval: 30m
  chart:
    spec:
      chart: milvus
      version: 4.x.x
      sourceRef:
        kind: HelmRepository
        name: milvus
        namespace: flux-system
  values:
    # Values here
```
### Secret References
```yaml
# Never hardcode secrets
env:
  - name: DATABASE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: password
```
---
## NATS Subject Conventions
### Hierarchy
```
ai.{domain}.{scope}.{action}
Examples:
ai.chat.user.{userId}.message # User chat message
ai.chat.response.{requestId} # Chat response
ai.voice.user.{userId}.request # Voice request
ai.pipeline.trigger # Pipeline trigger
```
### Wildcards
```
ai.chat.> # All chat events
ai.chat.user.*.message # All user messages
ai.*.response.{id} # Any response type
```
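
In NATS, `*` matches exactly one token and `>` matches one or more trailing tokens. The sketch below shows how a subject from the hierarchy above could be built and checked against a wildcard pattern in plain Python, for example in unit tests. `chat_message_subject` and `subject_matches` are hypothetical helpers, not part of any NATS client library.

```python
def chat_message_subject(user_id: str) -> str:
    """Build the user chat message subject: ai.chat.user.{userId}.message."""
    # Tokens must not contain separators, wildcards, or whitespace.
    if not user_id or any(ch in user_id for ch in ".*> \t\r\n"):
        raise ValueError(f"invalid subject token: {user_id!r}")
    return f"ai.chat.user.{user_id}.message"


def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if subject matches a NATS-style wildcard pattern."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, token in enumerate(p_tokens):
        if token == ">":
            # '>' is only valid as the last token and needs at least
            # one remaining subject token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False  # subject is shorter than the pattern
        if token != "*" and token != s_tokens[i]:
            return False  # literal token mismatch
    # Without '>', pattern and subject must have the same token count.
    return len(p_tokens) == len(s_tokens)
```

With these helpers, `subject_matches("ai.chat.user.*.message", chat_message_subject("42"))` holds, while `subject_matches("ai.chat.>", "ai.chat")` does not, because `>` requires at least one further token.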
---
## Git Conventions
### Commit Messages
```
type(scope): subject

body (optional)

footer (optional)
```
**Types:**
- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation
- `style`: Formatting
- `refactor`: Code restructuring
- `test`: Tests
- `chore`: Maintenance
**Examples:**
```
feat(chat-handler): add streaming response support
fix(voice): handle empty audio gracefully
docs(adr): add decision for MessagePack format
```
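
The header format could be enforced in a commit hook or CI step. A minimal sketch, assuming the scope is mandatory and limited to lowercase letters, digits, and hyphens (neither is stated above); `parse_commit_header` is a hypothetical name.

```python
import re

# Commit types listed in this document.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# Assumption: scope is required and uses lowercase letters, digits, hyphens.
_HEADER_RE = re.compile(r"(?P<type>[a-z]+)\((?P<scope>[a-z0-9-]+)\): (?P<subject>\S.*)")


def parse_commit_header(header: str) -> dict[str, str]:
    """Split a 'type(scope): subject' header into its parts or raise ValueError."""
    match = _HEADER_RE.fullmatch(header)
    if match is None:
        raise ValueError(f"header does not match 'type(scope): subject': {header!r}")
    parts = match.groupdict()
    if parts["type"] not in COMMIT_TYPES:
        raise ValueError(f"unknown commit type: {parts['type']!r}")
    return parts
```

Only the first line of the commit message should be passed in; the optional body and footer are not checked by this sketch.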
### Branch Naming
```
feature/short-description
fix/issue-number-description
docs/what-changed
```
---
## Configuration Conventions
### Environment Variables
```python
# Use pydantic-settings or similar
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # pydantic v2 style; replaces the deprecated inner `class Config`
    model_config = SettingsConfigDict(env_prefix="")  # No prefix

    nats_url: str = "nats://localhost:4222"
    vllm_url: str = "http://localhost:8000"
    milvus_host: str = "localhost"
    milvus_port: int = 19530
    log_level: str = "INFO"
```
### ConfigMaps
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ai-services-config
data:
  NATS_URL: "nats://nats.ai-ml.svc.cluster.local:4222"
  VLLM_URL: "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
  # ... other non-sensitive config
```
---
## Documentation Conventions
### ADR Format
See [decisions/0000-template.md](decisions/0000-template.md)
### Code Comments
```python
# Use docstrings for public functions
async def query_rag(query: str) -> list[dict]:
    """
    Query the RAG system for relevant documents.

    Args:
        query: The search query string

    Returns:
        List of document chunks with scores

    Raises:
        RAGQueryError: If the query fails
    """
    ...
```
### README Files
Each application should have a README with:
1. Purpose
2. Configuration
3. Deployment
4. Local development
5. API documentation (if applicable)
---
## Anti-Patterns to Avoid
| Don't | Do Instead |
|-------|------------|
| `kubectl apply` directly | Commit to Git, let Flux deploy |
| Hardcode secrets | Use External Secrets Operator |
| Use `latest` image tags | Pin to specific versions |
| Skip health checks | Always define liveness/readiness |
| Ignore resource limits | Set appropriate requests/limits |
| Use JSON for NATS messages | Use Protocol Buffers (see ADR-0061) |
| Write handler services in Python | Use Go with handler-base module (ADR-0061) |
| Synchronous I/O in handlers | Use goroutines / async patterns |
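
Some of these rules can be checked mechanically. As one example, the sketch below flags image references that are unpinned. `is_pinned_image` is a hypothetical helper, and treating any digest reference as pinned is an assumption; the registry names used in the usage note are placeholders.

```python
def is_pinned_image(image: str) -> bool:
    """Return True if the image has a digest or a tag other than 'latest'."""
    if "@" in image:
        return True  # digest references (name@sha256:...) are immutable
    # Only look for a tag in the final path segment, so a registry port
    # such as localhost:5000/app is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag means the runtime falls back to 'latest'
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in ("", "latest")
```

For instance, `is_pinned_image("registry.example.com/chat-handler:1.0.0")` passes, while `registry.example.com/chat-handler:latest` and `localhost:5000/chat-handler` are both reported as unpinned.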
---
## Related Documents
- [TECH-STACK.md](TECH-STACK.md) - Technologies used
- [ARCHITECTURE.md](ARCHITECTURE.md) - System design
- [decisions/](decisions/) - Why we made certain choices