# 📐 Coding Conventions

> **Patterns, practices, and folder structure conventions for DaviesTechLabs repositories**

## Repository Conventions

### homelab-k8s2 (Infrastructure)

```
kubernetes/
├── apps/                    # Application deployments
│   └── {namespace}/         # One folder per namespace
│       └── {app}/           # One folder per application
│           ├── app/         # Kubernetes manifests
│           │   ├── kustomization.yaml
│           │   ├── helmrelease.yaml   # OR individual manifests
│           │   └── ...
│           └── ks.yaml      # Flux Kustomization
├── components/              # Reusable Kustomize components
└── flux/                    # Flux system configuration
```
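
The layout above can be turned into a small path helper, for example in a scaffolding or lint script. This is a minimal sketch; the function name `app_layout` and the returned keys are illustrative assumptions, not part of the repository.

```python
from pathlib import PurePosixPath
from typing import Dict


def app_layout(namespace: str, app: str) -> Dict[str, PurePosixPath]:
    """Return the expected repository paths for one application (hypothetical helper)."""
    base = PurePosixPath("kubernetes/apps") / namespace / app
    return {
        "ks": base / "ks.yaml",                                 # Flux Kustomization
        "manifests": base / "app",                              # Kubernetes manifests
        "kustomization": base / "app" / "kustomization.yaml",
    }


if __name__ == "__main__":
    for role, path in app_layout("ai-ml", "chat-handler").items():
        print(f"{role}: {path}")
```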

**Naming Conventions:**
- Namespaces: lowercase with hyphens (`ai-ml`, `cert-manager`)
- Apps: lowercase with hyphens (`chat-handler`, `voice-assistant`)
- Secrets: `{app}-{type}` (e.g., `milvus-credentials`)
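
These naming rules are easy to check mechanically. The sketch below assumes "lowercase with hyphens" means alphanumeric segments joined by single hyphens; the helper names `is_valid_name` and `secret_name` are hypothetical.

```python
import re

# Assumed reading of "lowercase with hyphens": segments of a-z/0-9 joined by single hyphens.
KEBAB_CASE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def is_valid_name(name: str) -> bool:
    """Return True if a namespace or app name is lowercase-with-hyphens."""
    return bool(KEBAB_CASE.fullmatch(name))


def secret_name(app: str, secret_type: str) -> str:
    """Build a secret name following the {app}-{type} convention."""
    if not is_valid_name(app) or not is_valid_name(secret_type):
        raise ValueError(f"invalid name component: {app!r}, {secret_type!r}")
    return f"{app}-{secret_type}"
```

For example, `secret_name("milvus", "credentials")` yields `milvus-credentials`, while `is_valid_name("AI_ML")` is rejected.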

### llm-workflows (Orchestration)

```
workflows/                   # Kubernetes Deployments for NATS handlers
└── {handler}.yaml           # One file per handler

argo/                        # Argo WorkflowTemplates
└── {workflow-name}.yaml     # One file per workflow

pipelines/                   # Kubeflow Pipeline Python files
├── {pipeline}_pipeline.py   # Pipeline definition
└── kfp-sync-job.yaml        # Upload job

{handler}/                   # Python source code
├── __init__.py
├── {handler}.py             # Main entry point
├── requirements.txt
└── Dockerfile
```

---

## Python Conventions

### Core Patterns

```python
from dataclasses import dataclass

import msgpack
from nats.aio.msg import Msg

# Use async/await for I/O
async def handle_message(msg: Msg) -> None:
    ...

# Use dataclasses for structured data
@dataclass
class ChatRequest:
    user_id: str
    message: str
    enable_rag: bool = True

# Use msgpack for NATS messages
data = msgpack.packb({"key": "value"})
```

### Naming

| Element | Convention | Example |
|---------|------------|---------|
| Files | snake_case | `chat_handler.py` |
| Classes | PascalCase | `ChatHandler` |
| Functions | snake_case | `process_message` |
| Constants | UPPER_SNAKE | `NATS_URL` |
| Private | Leading underscore | `_internal_method` |

### Type Hints

```python
# Always use type hints
from typing import Optional, List, Dict, Any

async def query_rag(
    query: str,
    collection: str = "knowledge_base",
    top_k: int = 5,
) -> List[Dict[str, Any]]:
    ...
```

### Error Handling

```python
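import logging

logger = logging.getLogger(__name__)

# Use specific exceptions
class RAGQueryError(Exception):
    """Raised when RAG query fails."""

# Log errors with context, then re-raise as the specific exception.
# `search_collection` is an illustrative wrapper; `milvus` is the
# application's Milvus client, created elsewhere.
async def search_collection(query: str, collection: str):
    try:
        return await milvus.search(...)
    except Exception as e:
        logger.error(f"RAG query failed: {e}", extra={"query": query})
        raise RAGQueryError(f"Failed to query collection {collection}") from e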
```

### NATS Message Handling

```python
import logging

import msgpack
from nats.aio.msg import Msg

logger = logging.getLogger(__name__)

async def message_handler(msg: Msg) -> None:
    try:
        # Decode MessagePack
        data = msgpack.unpackb(msg.data, raw=False)

        # Process
        result = await process(data)

        # Reply if request-reply pattern
        if msg.reply:
            await msg.respond(msgpack.packb(result))

        # Acknowledge for JetStream
        await msg.ack()

    except Exception as e:
        logger.error(f"Handler error: {e}")
        # NAK for retry (JetStream)
        await msg.nak()
```

---

## Kubernetes Manifest Conventions

### Labels

```yaml
metadata:
  labels:
    # Required
    app.kubernetes.io/name: chat-handler
    app.kubernetes.io/instance: chat-handler
    app.kubernetes.io/component: handler
    app.kubernetes.io/part-of: ai-platform

    # Optional
    app.kubernetes.io/version: "1.0.0"
    app.kubernetes.io/managed-by: flux
```
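
When manifests are generated from code, the label set above can be built in one place. This is a sketch only: `standard_labels` is a hypothetical helper, and defaulting `instance` to the app name is an assumption based on the example, where both values are identical.

```python
from typing import Dict, Optional


def standard_labels(
    name: str,
    component: str,
    part_of: str,
    instance: Optional[str] = None,
    version: Optional[str] = None,
) -> Dict[str, str]:
    """Build the required labels, plus the optional version label when given."""
    labels = {
        "app.kubernetes.io/name": name,
        # Assumption: instance defaults to the app name, as in the example above.
        "app.kubernetes.io/instance": instance or name,
        "app.kubernetes.io/component": component,
        "app.kubernetes.io/part-of": part_of,
    }
    if version is not None:
        labels["app.kubernetes.io/version"] = version
    return labels
```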

### Annotations

```yaml
metadata:
  annotations:
    # Reloader for config changes
    reloader.stakater.com/auto: "true"

    # Documentation
    description: "Handles chat messages via NATS"
```

### Resource Requests

```yaml
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

# GPU workloads (separate example): set only the key matching the node's GPU vendor
resources:
  limits:
    amd.com/gpu: 1        # AMD
    nvidia.com/gpu: 1     # NVIDIA
```
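
A lint step can confirm that requests never exceed limits. The sketch below parses only the quantity forms used in this document (plain numbers, the `m` milli suffix, and binary suffixes such as `Mi`); it is not a full implementation of the Kubernetes quantity grammar, and both function names are hypothetical.

```python
from decimal import Decimal
from typing import Any, Dict

_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(value: str) -> Decimal:
    """Parse a subset of Kubernetes quantities: '2', '100m', '256Mi'."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if value.endswith(suffix):
            return Decimal(value[: -len(suffix)]) * factor
    if value.endswith("m"):
        return Decimal(value[:-1]) / 1000  # millicores -> cores
    return Decimal(value)


def requests_within_limits(resources: Dict[str, Dict[str, Any]]) -> bool:
    """Return True if every request that has a matching limit is <= that limit."""
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    return all(
        parse_quantity(str(requests[key])) <= parse_quantity(str(limits[key]))
        for key in requests
        if key in limits
    )
```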

### Health Checks

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 30

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```
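
The probes above need something to answer on `/health` and `/ready`. One possible shape, using only the standard library, is sketched here; the `HealthHandler` class, the `ready` flag, and `start_health_server` are assumptions for illustration, and a real handler might instead expose these routes from its existing web framework.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    # Hypothetical readiness flag; set it once dependencies (e.g. NATS) are connected.
    ready = threading.Event()

    def do_GET(self) -> None:
        if self.path == "/health":
            self._reply(200, b"ok")
        elif self.path == "/ready":
            if self.ready.is_set():
                self._reply(200, b"ready")
            else:
                self._reply(503, b"not ready")
        else:
            self._reply(404, b"not found")

    def _reply(self, status: int, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # Silence per-request logging from probe traffic.


def start_health_server(port: int = 8080) -> ThreadingHTTPServer:
    """Serve the probe endpoints from a daemon thread and return the server."""
    server = ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Until `HealthHandler.ready.set()` is called, `/ready` returns 503, so the pod stays out of Service endpoints while `/health` already reports the process as alive.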

---

## Flux/GitOps Conventions

### Kustomization Structure

```yaml
# ks.yaml - Flux Kustomization
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: &app chat-handler
  namespace: flux-system
spec:
  targetNamespace: ai-ml
  commonMetadata:
    labels:
      app.kubernetes.io/name: *app
  path: ./kubernetes/apps/ai-ml/chat-handler/app
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  wait: true
  interval: 30m
  retryInterval: 1m
  timeout: 5m
```

### HelmRelease Structure

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: milvus
spec:
  interval: 30m
  chart:
    spec:
      chart: milvus
      version: 4.x.x
      sourceRef:
        kind: HelmRepository
        name: milvus
        namespace: flux-system
  values:
    # Values here
```

### Secret References

```yaml
# Never hardcode secrets
env:
  - name: DATABASE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: password
```

---

## NATS Subject Conventions

### Hierarchy

```
ai.{domain}.{scope}.{action}

Examples:
ai.chat.user.{userId}.message      # User chat message
ai.chat.response.{requestId}       # Chat response
ai.voice.user.{userId}.request     # Voice request
ai.pipeline.trigger                # Pipeline trigger
```
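
Building subjects through small helpers keeps the hierarchy consistent and stops an ID containing `.` or a wildcard from silently changing the subject's shape. The function names below are hypothetical; only the subject strings come from the examples above.

```python
def _token(value: str) -> str:
    """Reject values that would break the subject hierarchy."""
    if not value or any(ch in value for ch in ".*> \t\r\n"):
        raise ValueError(f"invalid subject token: {value!r}")
    return value


def user_chat_subject(user_id: str) -> str:
    """Subject for a user chat message."""
    return f"ai.chat.user.{_token(user_id)}.message"


def chat_response_subject(request_id: str) -> str:
    """Subject for a chat response."""
    return f"ai.chat.response.{_token(request_id)}"


def user_voice_subject(user_id: str) -> str:
    """Subject for a voice request."""
    return f"ai.voice.user.{_token(user_id)}.request"
```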

### Wildcards

```
ai.chat.>                   # All chat events
ai.chat.user.*.message      # All user messages
ai.*.response.{id}          # Any response type
```
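
For unit tests it can help to check locally which subjects a subscription pattern would receive. The sketch below re-implements the standard NATS wildcard rules (`*` matches exactly one token, `>` matches one or more trailing tokens and must come last); `subject_matches` is a hypothetical helper, not part of the NATS client.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if `subject` would be delivered to a subscription on `pattern`."""
    pattern_tokens = pattern.split(".")
    subject_tokens = subject.split(".")
    for index, token in enumerate(pattern_tokens):
        if token == ">":
            # '>' must be the final token and needs at least one token to consume.
            return index == len(pattern_tokens) - 1 and len(subject_tokens) > index
        if index >= len(subject_tokens):
            return False
        if token != "*" and token != subject_tokens[index]:
            return False
    return len(pattern_tokens) == len(subject_tokens)
```

With this, `ai.chat.>` matches `ai.chat.user.42.message` but not the bare `ai.chat`, and `ai.chat.user.*.message` matches exactly one user ID token.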

---

## Git Conventions

### Commit Messages

```
type(scope): subject

body (optional)

footer (optional)
```

**Types:**
- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation
- `style`: Formatting
- `refactor`: Code restructuring
- `test`: Tests
- `chore`: Maintenance

**Examples:**
```
feat(chat-handler): add streaming response support
fix(voice): handle empty audio gracefully
docs(adr): add decision for MessagePack format
```
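
A commit-msg hook could enforce this format. The following is a minimal sketch that validates only the header line and treats the scope as required, as in the template above; the allowed scope characters and the name `parse_commit_header` are assumptions.

```python
import re
from typing import Dict, Optional

COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# Assumption: scopes are lowercase-with-hyphens, matching the app naming convention.
HEADER = re.compile(
    r"(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"\((?P<scope>[a-z0-9-]+)\): (?P<subject>\S.*)"
)


def parse_commit_header(message: str) -> Optional[Dict[str, str]]:
    """Return the type, scope, and subject of the first line, or None if invalid."""
    lines = message.splitlines()
    match = HEADER.fullmatch(lines[0]) if lines else None
    return match.groupdict() if match else None
```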

### Branch Naming

```
feature/short-description
fix/issue-number-description
docs/what-changed
```
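
Branch names can be checked the same way, for instance in CI. This sketch assumes the part after the slash is lowercase-with-hyphens; `is_valid_branch` is a hypothetical name.

```python
import re

# Prefixes come from the list above; the slug format is an assumption.
BRANCH = re.compile(r"(feature|fix|docs)/[a-z0-9]+(-[a-z0-9]+)*")


def is_valid_branch(name: str) -> bool:
    """Return True if the branch name follows the prefix/slug convention."""
    return bool(BRANCH.fullmatch(name))
```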

---

## Configuration Conventions

### Environment Variables

```python
# Use pydantic-settings or similar
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # pydantic-settings v2 style; replaces the deprecated inner `class Config`
    model_config = SettingsConfigDict(env_prefix="")  # No prefix

    nats_url: str = "nats://localhost:4222"
    vllm_url: str = "http://localhost:8000"
    milvus_host: str = "localhost"
    milvus_port: int = 19530
    log_level: str = "INFO"
```
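
Where pulling in pydantic is not wanted, the "or similar" option can be as small as a frozen dataclass that reads unprefixed environment variables. This is a stdlib-only sketch with the same fields and defaults; `EnvSettings` and `from_env` are hypothetical names, and unlike pydantic it only converts the one integer field explicitly.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class EnvSettings:
    """Stdlib-only stand-in for the settings class (hypothetical name)."""

    nats_url: str = "nats://localhost:4222"
    vllm_url: str = "http://localhost:8000"
    milvus_host: str = "localhost"
    milvus_port: int = 19530
    log_level: str = "INFO"

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "EnvSettings":
        """Read unprefixed, upper-case variables, falling back to the defaults."""
        return cls(
            nats_url=env.get("NATS_URL", cls.nats_url),
            vllm_url=env.get("VLLM_URL", cls.vllm_url),
            milvus_host=env.get("MILVUS_HOST", cls.milvus_host),
            milvus_port=int(env.get("MILVUS_PORT", cls.milvus_port)),
            log_level=env.get("LOG_LEVEL", cls.log_level),
        )
```

Passing a plain dict as `env` keeps the class easy to test without touching the real process environment.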

### ConfigMaps

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ai-services-config
data:
  NATS_URL: "nats://nats.ai-ml.svc.cluster.local:4222"
  VLLM_URL: "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
  # ... other non-sensitive config
```

---

## Documentation Conventions

### ADR Format

See [decisions/0000-template.md](decisions/0000-template.md)

### Code Comments

```python
from typing import Dict, List

# Use docstrings for public functions
async def query_rag(query: str) -> List[Dict]:
    """
    Query the RAG system for relevant documents.

    Args:
        query: The search query string

    Returns:
        List of document chunks with scores

    Raises:
        RAGQueryError: If the query fails
    """
    ...
```

### README Files

Each application should have a README with:
1. Purpose
2. Configuration
3. Deployment
4. Local development
5. API documentation (if applicable)

---

## Anti-Patterns to Avoid

| Don't | Do Instead |
|-------|------------|
| `kubectl apply` directly | Commit to Git, let Flux deploy |
| Hardcode secrets | Use External Secrets Operator |
| Use `latest` image tags | Pin to specific versions |
| Skip health checks | Always define liveness/readiness |
| Ignore resource limits | Set appropriate requests/limits |
| Use JSON for NATS messages | Use MessagePack (binary) |
| Synchronous I/O in handlers | Use async/await |
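
Some of these rules can be enforced automatically. As one example, the sketch below flags image references that resolve to `latest`, either explicitly or because the tag is missing; `is_pinned_image` is a hypothetical helper and the image names in the usage note are made up.

```python
def is_pinned_image(image: str) -> bool:
    """Return True if the image has a digest or an explicit tag other than 'latest'."""
    if "@sha256:" in image:
        return True
    # Look for the tag only after the last '/', so a registry port is not mistaken for one.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # No tag: the container runtime defaults to 'latest'.
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in ("", "latest")
```

For instance, `ghcr.io/example/chat-handler:1.0.0` passes, while `nginx`, `nginx:latest`, and `registry:5000/app` are all reported as unpinned.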

---

## Related Documents

- [TECH-STACK.md](TECH-STACK.md) - Technologies used
- [ARCHITECTURE.md](ARCHITECTURE.md) - System design
- [decisions/](decisions/) - Why we made certain choices