homelab-design/CODING-CONVENTIONS.md
Billy D. 100ba21eba
updates to adrs and fixing to reflect go refactor.
2026-02-23 06:14:30 -05:00

📐 Coding Conventions

Patterns, practices, and folder structure conventions for DaviesTechLabs repositories

Repository Conventions

homelab-k8s2 (Infrastructure)

kubernetes/
├── apps/                    # Application deployments
│   └── {namespace}/         # One folder per namespace
│       └── {app}/           # One folder per application
│           ├── app/         # Kubernetes manifests
│           │   ├── kustomization.yaml
│           │   ├── helmrelease.yaml   # OR individual manifests
│           │   └── ...
│           └── ks.yaml      # Flux Kustomization
├── components/              # Reusable Kustomize components
└── flux/                    # Flux system configuration

Naming Conventions:

  • Namespaces: lowercase with hyphens (ai-ml, cert-manager)
  • Apps: lowercase with hyphens (chat-handler, voice-assistant)
  • Secrets: {app}-{type} (e.g., milvus-credentials)
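
The naming rules above can be checked mechanically. The following is a minimal sketch, not code from the repositories: the helper names `is_valid_name` and `secret_name` are hypothetical, and the regex assumes "lowercase with hyphens" means lowercase alphanumeric words joined by single hyphens.

```python
import re

# Assumption: a valid name starts with a letter and uses single hyphens
# between lowercase alphanumeric words, e.g. "ai-ml" or "cert-manager".
_NAME_RE = re.compile(r"[a-z][a-z0-9]*(-[a-z0-9]+)*")


def is_valid_name(name: str) -> bool:
    """Return True if a namespace or app name follows the convention."""
    return _NAME_RE.fullmatch(name) is not None


def secret_name(app: str, kind: str) -> str:
    """Build a secret name in the {app}-{type} form."""
    if not (is_valid_name(app) and is_valid_name(kind)):
        raise ValueError(f"invalid name parts: {app!r}, {kind!r}")
    return f"{app}-{kind}"
```

For example, `secret_name("milvus", "credentials")` yields `milvus-credentials`, while `is_valid_name("Chat_Handler")` is rejected.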

AI/ML Repos (git.daviestechlabs.io/daviestechlabs)

handler-base/                # Shared Go module for all NATS handlers
├── clients/                 #   HTTP clients (LLM, STT, TTS, embeddings, reranker)
├── config/                  #   Env-based configuration (struct tags)
├── gen/messagespb/          #   Generated protobuf stubs
├── handler/                 #   Typed NATS message handler with OTel + health wiring
├── health/                  #   HTTP health + readiness server
├── messages/                #   Type aliases from generated protobuf stubs
├── natsutil/                #   NATS publish/request with protobuf encoding
├── proto/messages/v1/       #   .proto schema source
├── go.mod
└── buf.yaml                 #   buf protobuf toolchain config

chat-handler/                # Text chat service (Go)
voice-assistant/             # Voice pipeline service (Go)
pipeline-bridge/             # Workflow engine bridge (Go)
stt-module/                  # Speech-to-text bridge (Go)
tts-module/                  # Text-to-speech bridge (Go)
├── main.go                  # Service entry point
├── main_test.go             # Unit tests
├── e2e_test.go              # End-to-end tests
├── go.mod                   # Go module (depends on handler-base)
├── Dockerfile               # Distroless container (~20 MB)
└── renovate.json            # Dependency update config

argo/                        # Argo WorkflowTemplates
└── {workflow-name}.yaml

kubeflow/                    # Kubeflow Pipelines
└── {pipeline}_pipeline.py

kuberay-images/              # GPU worker images
├── dockerfiles/
└── ray-serve/

Python Conventions

Package Management (ADR-0012)

Use uv for local development and pip in Docker for reproducibility:

# Install uv (one-time)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create virtual environment and install
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"

# Or use uv sync with lock file
uv sync

# Update lock file after changing pyproject.toml
uv lock

# Run tests
uv run pytest

Code Formatting & Linting (Ruff)

All Python code must pass ruff check and ruff format before merge. Ruff is configured in each repo's pyproject.toml:

[tool.ruff]
line-length = 100
target-version = "py311"

[tool.ruff.lint]
select = ["E", "F", "W", "I", "UP", "B", "C4", "SIM"]
ignore = ["E501"]  # Line length handled by formatter

[tool.ruff.format]
quote-style = "double"

Required dev dependencies:

[project.optional-dependencies]
dev = [
    "pytest>=8.0.0",
    "pytest-asyncio>=0.23.0",
    "pytest-cov>=4.0.0",  # For coverage in handler-base
    "ruff>=0.1.0",
]

Local workflow:

# Check and auto-fix
uv run ruff check --fix .

# Format code
uv run ruff format .

# Verify before commit
uv run ruff check . && uv run ruff format --check .

CI enforcement: All repos run ruff in the lint job. Commits that fail linting will not pass CI.

Kubeflow pipeline variables: For Kubeflow DSL pipelines, terminal task assignments that appear unused should have # noqa: F841 comments, as these define the DAG structure:

# Step 6: Final step (defines DAG dependency)
tts_task = synthesize_speech(text=llm_task.output)  # noqa: F841

Project Structure

// Go handler services import the shared handler-base module
import (
    "git.daviestechlabs.io/daviestechlabs/handler-base/clients"
    "git.daviestechlabs.io/daviestechlabs/handler-base/config"
    "git.daviestechlabs.io/daviestechlabs/handler-base/handler"
    "git.daviestechlabs.io/daviestechlabs/handler-base/health"
    "git.daviestechlabs.io/daviestechlabs/handler-base/messages"
    "git.daviestechlabs.io/daviestechlabs/handler-base/natsutil"
)

# Python remains for Ray Serve, Kubeflow pipelines, and Gradio UIs
from dataclasses import dataclass

from nats.aio.msg import Msg

# Use async/await for I/O
async def handle_message(msg: Msg) -> None:
    ...

# Use dataclasses for structured data
@dataclass
class ChatRequest:
    user_id: str
    message: str
    enable_rag: bool = True

Naming

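| Element   | Convention         | Example          |
|-----------|--------------------|------------------|
| Files     | snake_case         | chat_handler.py  |
| Classes   | PascalCase         | ChatHandler      |
| Functions | snake_case         | process_message  |
| Constants | UPPER_SNAKE        | NATS_URL         |
| Private   | Leading underscore | _internal_method |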

Type Hints

# Always use type hints
# Prefer built-in generics: with target-version py311, ruff's UP rules flag typing.List/Dict
from typing import Any

async def query_rag(
    query: str,
    collection: str = "knowledge_base",
    top_k: int = 5,
) -> list[dict[str, Any]]:
    ...

Error Handling

# Use specific exceptions
class RAGQueryError(Exception):
    """Raised when RAG query fails."""
    pass

# Log errors with context
import logging
logger = logging.getLogger(__name__)

try:
    result = await milvus.search(...)
except Exception as e:
    logger.error(f"RAG query failed: {e}", extra={"query": query})
    raise RAGQueryError(f"Failed to query collection {collection}") from e
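
To show how the wrapped exception behaves end to end, here is a self-contained sketch of the pattern above. `FakeMilvus` and `run_demo` are hypothetical stand-ins added for illustration; a real client and its `search` signature will differ.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class RAGQueryError(Exception):
    """Raised when a RAG query fails."""


class FakeMilvus:
    """Hypothetical stand-in for a vector store client that always fails."""

    async def search(self, collection: str, query: str) -> list[dict]:
        raise ConnectionError("connection refused")


async def query_rag(client: FakeMilvus, query: str, collection: str) -> list[dict]:
    try:
        return await client.search(collection, query)
    except Exception as e:
        # Log with context, then re-raise a domain error chained to the cause.
        logger.error("RAG query failed: %s", e, extra={"query": query})
        raise RAGQueryError(f"Failed to query collection {collection}") from e


def run_demo() -> RAGQueryError:
    """Run the failing query and return the wrapped error for inspection."""
    try:
        asyncio.run(query_rag(FakeMilvus(), "what is flux?", "knowledge_base"))
    except RAGQueryError as err:
        return err
    raise AssertionError("expected RAGQueryError")
```

Because of `raise ... from e`, the original `ConnectionError` stays available on the `__cause__` attribute of the returned error, so callers and tracebacks keep the root cause.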

NATS Message Handling

All NATS handler services use Go with Protocol Buffers encoding (see ADR-0061):

// Go NATS handler (production pattern)
func (h *Handler) handleMessage(msg *nats.Msg) {
    ctx := context.Background()

    var req messages.ChatRequest
    if err := proto.Unmarshal(msg.Data, &req); err != nil {
        h.logger.Error("failed to unmarshal", "error", err)
        return
    }

    // Process
    result, err := h.process(ctx, &req)
    if err != nil {
        h.logger.Error("handler error", "error", err)
        msg.Nak()
        return
    }

    // Reply if request-reply pattern
    if msg.Reply != "" {
        data, err := proto.Marshal(result)
        if err != nil {
            h.logger.Error("failed to marshal response", "error", err)
            msg.Nak()
            return
        }
        if err := msg.Respond(data); err != nil {
            h.logger.Error("failed to respond", "error", err)
        }
    }
    msg.Ack()
}

Python NATS is still used in Ray Serve runtime_env and Kubeflow pipeline components where needed, but all dedicated NATS handler services are Go.


Kubernetes Manifest Conventions

Labels

metadata:
  labels:
    # Required
    app.kubernetes.io/name: chat-handler
    app.kubernetes.io/instance: chat-handler
    app.kubernetes.io/component: handler
    app.kubernetes.io/part-of: ai-platform
    
    # Optional
    app.kubernetes.io/version: "1.0.0"
    app.kubernetes.io/managed-by: flux
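
A check for the required labels could look like the sketch below. It assumes the manifest has already been parsed into a Python dict (for example by a YAML loader); the function name `missing_labels` is hypothetical.

```python
# The four label keys marked "Required" above.
REQUIRED_LABELS = (
    "app.kubernetes.io/name",
    "app.kubernetes.io/instance",
    "app.kubernetes.io/component",
    "app.kubernetes.io/part-of",
)


def missing_labels(manifest: dict) -> list[str]:
    """Return the required label keys absent from the manifest metadata."""
    labels = (manifest.get("metadata") or {}).get("labels") or {}
    return [key for key in REQUIRED_LABELS if key not in labels]
```

An empty result means the manifest carries every required label; optional labels such as `app.kubernetes.io/version` are ignored.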

Annotations

metadata:
  annotations:
    # Reloader for config changes
    reloader.stakater.com/auto: "true"
    
    # Documentation
    description: "Handles chat messages via NATS"

Resource Requests

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
    
# GPU workloads (set only the resource that matches the node's GPU vendor)
resources:
  limits:
    amd.com/gpu: 1        # AMD
    nvidia.com/gpu: 1     # NVIDIA

Health Checks

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 30

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10

Flux/GitOps Conventions

Kustomization Structure

# ks.yaml - Flux Kustomization
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: &app chat-handler
  namespace: flux-system
spec:
  targetNamespace: ai-ml
  commonMetadata:
    labels:
      app.kubernetes.io/name: *app
  path: ./kubernetes/apps/ai-ml/chat-handler/app
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  wait: true
  interval: 30m
  retryInterval: 1m
  timeout: 5m

HelmRelease Structure

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: milvus
spec:
  interval: 30m
  chart:
    spec:
      chart: milvus
      version: 4.x.x
      sourceRef:
        kind: HelmRepository
        name: milvus
        namespace: flux-system
  values:
    # Values here

Secret References

# Never hardcode secrets
env:
  - name: DATABASE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: password

NATS Subject Conventions

Hierarchy

ai.{domain}.{scope}.{action}

Examples (the scope may carry an identifier token such as {userId}, or be omitted):
ai.chat.user.{userId}.message      # User chat message
ai.chat.response.{requestId}       # Chat response
ai.voice.user.{userId}.request     # Voice request
ai.pipeline.trigger                # Pipeline trigger
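
Subjects following this hierarchy can be assembled from tokens instead of hand-written strings. This is a sketch only: `build_subject` is a hypothetical helper, and it assumes every subject starts with the `ai` prefix shown above.

```python
def build_subject(domain: str, *tokens: str) -> str:
    """Join tokens into an "ai.{domain}...." subject, rejecting unsafe tokens."""
    parts = ("ai", domain, *tokens)
    for part in parts:
        # "." separates tokens and "*" / ">" are wildcards, so none of them
        # may appear inside a literal token; whitespace is not allowed either.
        if not part or any(ch in ".*>" or ch.isspace() for ch in part):
            raise ValueError(f"invalid subject token: {part!r}")
    return ".".join(parts)
```

For instance, `build_subject("chat", "user", "u123", "message")` returns `ai.chat.user.u123.message`. Rejecting wildcard characters keeps user-supplied identifiers from widening a publish subject.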

Wildcards

ai.chat.>                   # All chat events
ai.chat.user.*.message      # All user messages
ai.*.response.{requestId}   # Any response type
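
The wildcard semantics can be illustrated with a small matcher: `*` matches exactly one token, and `>` matches one or more trailing tokens and must come last. This is an explanatory sketch, not the NATS server's implementation; `subject_matches` is a hypothetical name.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a concrete subject matches a NATS-style pattern."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # ">" is only valid as the final pattern token and needs at
            # least one remaining subject token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    # Without ">", pattern and subject must have the same token count.
    return len(p_tokens) == len(s_tokens)
```

With this matcher, `ai.chat.>` matches `ai.chat.user.u1.message` but not the bare `ai.chat`, and `ai.chat.user.*.message` does not match a subject with an extra token between the user ID and `message`.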

Git Conventions

Commit Messages

type(scope): subject

body (optional)

footer (optional)

Types:

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation
  • style: Formatting
  • refactor: Code restructuring
  • test: Tests
  • chore: Maintenance

Examples:

feat(chat-handler): add streaming response support
fix(voice): handle empty audio gracefully
docs(adr): add decision for Protocol Buffers encoding

Branch Naming

feature/short-description
fix/issue-number-description
docs/what-changed

Configuration Conventions

Environment Variables

# Use pydantic-settings or similar
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Env var names are matched case-insensitively by default (nats_url <- NATS_URL)
    model_config = SettingsConfigDict(env_prefix="")  # No prefix

    nats_url: str = "nats://localhost:4222"
    vllm_url: str = "http://localhost:8000"
    milvus_host: str = "localhost"
    milvus_port: int = 19530
    log_level: str = "INFO"

ConfigMaps

apiVersion: v1
kind: ConfigMap
metadata:
  name: ai-services-config
data:
  NATS_URL: "nats://nats.ai-ml.svc.cluster.local:4222"
  VLLM_URL: "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
  # ... other non-sensitive config

Documentation Conventions

ADR Format

See decisions/0000-template.md

Code Comments

# Use docstrings for public functions
async def query_rag(query: str) -> list[dict]:
    """
    Query the RAG system for relevant documents.
    
    Args:
        query: The search query string
        
    Returns:
        List of document chunks with scores
        
    Raises:
        RAGQueryError: If the query fails
    """
    ...

README Files

Each application should have a README with:

  1. Purpose
  2. Configuration
  3. Deployment
  4. Local development
  5. API documentation (if applicable)

Anti-Patterns to Avoid

| Don't                            | Do Instead                                 |
|----------------------------------|--------------------------------------------|
| kubectl apply directly           | Commit to Git, let Flux deploy             |
| Hardcode secrets                 | Use External Secrets Operator              |
| Use latest image tags            | Pin to specific versions                   |
| Skip health checks               | Always define liveness/readiness           |
| Ignore resource limits           | Set appropriate requests/limits            |
| Use JSON for NATS messages       | Use Protocol Buffers (see ADR-0061)        |
| Write handler services in Python | Use Go with handler-base module (ADR-0061) |
| Synchronous I/O in handlers      | Use goroutines / async patterns            |