- Add AGENT-ONBOARDING.md for AI agents
- Add ARCHITECTURE.md with full system overview
- Add TECH-STACK.md with complete technology inventory
- Add DOMAIN-MODEL.md with entities and bounded contexts
- Add CODING-CONVENTIONS.md with patterns and practices
- Add GLOSSARY.md with terminology reference
- Add C4 diagrams (Context and Container levels)
- Add 10 ADRs documenting key decisions:
  - Talos Linux, NATS, MessagePack, Multi-GPU strategy
  - GitOps with Flux, KServe, Milvus, Dual workflow engines
  - Envoy Gateway
- Add specs directory with JetStream configuration
- Add diagrams for GPU allocation and data flows

Based on analysis of homelab-k8s2 and llm-workflows repositories and kubectl cluster-info dump data.

# 📐 Coding Conventions

> **Patterns, practices, and folder structure conventions for DaviesTechLabs repositories**

## Repository Conventions

### homelab-k8s2 (Infrastructure)

```
kubernetes/
├── apps/                          # Application deployments
│   └── {namespace}/               # One folder per namespace
│       └── {app}/                 # One folder per application
│           ├── app/               # Kubernetes manifests
│           │   ├── kustomization.yaml
│           │   ├── helmrelease.yaml   # OR individual manifests
│           │   └── ...
│           └── ks.yaml            # Flux Kustomization
├── components/                    # Reusable Kustomize components
└── flux/                          # Flux system configuration
```

**Naming Conventions:**
- Namespaces: lowercase with hyphens (`ai-ml`, `cert-manager`)
- Apps: lowercase with hyphens (`chat-handler`, `voice-assistant`)
- Secrets: `{app}-{type}` (e.g., `milvus-credentials`)

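
The naming rules above are easy to check mechanically. The following is a minimal sketch, not part of either repository: the helper names `is_kebab_case` and `secret_name` are hypothetical, and the regular expression is one reasonable reading of "lowercase with hyphens".

```python
import re

# Lowercase alphanumeric words joined by single hyphens, e.g. "chat-handler".
KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def is_kebab_case(name: str) -> bool:
    """Return True if name is lowercase words separated by single hyphens."""
    return bool(KEBAB_CASE.fullmatch(name))


def secret_name(app: str, secret_type: str) -> str:
    """Build a secret name following the {app}-{type} convention."""
    for part in (app, secret_type):
        if not is_kebab_case(part):
            raise ValueError(f"not lowercase-with-hyphens: {part!r}")
    return f"{app}-{secret_type}"


print(is_kebab_case("ai-ml"))                # True
print(is_kebab_case("Chat_Handler"))         # False
print(secret_name("milvus", "credentials"))  # milvus-credentials
```

A check like this could run in CI or a pre-commit hook so that misnamed namespaces and apps are caught before Flux reconciles them.
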
### llm-workflows (Orchestration)

```
workflows/                       # Kubernetes Deployments for NATS handlers
├── {handler}.yaml               # One file per handler

argo/                            # Argo WorkflowTemplates
├── {workflow-name}.yaml         # One file per workflow

pipelines/                       # Kubeflow Pipeline Python files
├── {pipeline}_pipeline.py       # Pipeline definition
└── kfp-sync-job.yaml            # Upload job

{handler}/                       # Python source code
├── __init__.py
├── {handler}.py                 # Main entry point
├── requirements.txt
└── Dockerfile
```

---

## Python Conventions

### Project Structure

```python
from dataclasses import dataclass

import msgpack
from nats.aio.msg import Msg


# Use async/await for I/O
async def handle_message(msg: Msg) -> None:
    ...


# Use dataclasses for structured data
@dataclass
class ChatRequest:
    user_id: str
    message: str
    enable_rag: bool = True


# Use msgpack for NATS messages
data = msgpack.packb({"key": "value"})
```

### Naming

| Element | Convention | Example |
|---------|------------|---------|
| Files | snake_case | `chat_handler.py` |
| Classes | PascalCase | `ChatHandler` |
| Functions | snake_case | `process_message` |
| Constants | UPPER_SNAKE | `NATS_URL` |
| Private | Leading underscore | `_internal_method` |

### Type Hints

```python
# Always use type hints
from typing import Optional, List, Dict, Any


async def query_rag(
    query: str,
    collection: str = "knowledge_base",
    top_k: int = 5,
) -> List[Dict[str, Any]]:
    ...
```

### Error Handling

```python
# Use specific exceptions
class RAGQueryError(Exception):
    """Raised when RAG query fails."""
    pass


# Log errors with context
import logging

logger = logging.getLogger(__name__)

try:
    result = await milvus.search(...)
except Exception as e:
    logger.error(f"RAG query failed: {e}", extra={"query": query})
    raise RAGQueryError(f"Failed to query collection {collection}") from e
```

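
To see the wrap-and-re-raise pattern above end to end, here is a self-contained sketch that runs without Milvus. `fake_search` is a hypothetical stand-in for the vector-store call and always fails, so the example only demonstrates how `raise ... from e` preserves the original exception as `__cause__`.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class RAGQueryError(Exception):
    """Raised when a RAG query fails."""


async def fake_search(query: str) -> list:
    """Hypothetical stand-in for the vector-store call; always fails."""
    raise ConnectionError("vector store unreachable")


async def query_rag(query: str, collection: str = "knowledge_base") -> list:
    try:
        return await fake_search(query)
    except Exception as e:
        # Log with context, then re-raise as a domain-specific error.
        logger.error("RAG query failed: %s", e, extra={"query": query})
        raise RAGQueryError(f"Failed to query collection {collection}") from e


def run_demo() -> RAGQueryError:
    """Run one failing query and return the raised error for inspection."""
    try:
        asyncio.run(query_rag("what is NATS?"))
    except RAGQueryError as err:
        return err
    raise AssertionError("expected RAGQueryError")


err = run_demo()
print(type(err.__cause__).__name__)  # ConnectionError
```

Callers only need to catch `RAGQueryError`, while the chained cause keeps the low-level failure visible in tracebacks.
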
### NATS Message Handling

```python
import logging

import msgpack
import nats
from nats.aio.msg import Msg

logger = logging.getLogger(__name__)


async def message_handler(msg: Msg) -> None:
    try:
        # Decode MessagePack
        data = msgpack.unpackb(msg.data, raw=False)

        # Process (process() is the handler-specific coroutine)
        result = await process(data)

        # Reply if request-reply pattern
        if msg.reply:
            await msg.respond(msgpack.packb(result))

        # Acknowledge for JetStream
        await msg.ack()

    except Exception as e:
        logger.error(f"Handler error: {e}")
        # NAK for retry (JetStream)
        await msg.nak()
```

---

## Kubernetes Manifest Conventions

### Labels

```yaml
metadata:
  labels:
    # Required
    app.kubernetes.io/name: chat-handler
    app.kubernetes.io/instance: chat-handler
    app.kubernetes.io/component: handler
    app.kubernetes.io/part-of: ai-platform

    # Optional
    app.kubernetes.io/version: "1.0.0"
    app.kubernetes.io/managed-by: flux
```

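
The required labels listed above can be enforced with a small lint step. This is a sketch under the assumption that manifests have already been parsed into dictionaries (for example by a YAML loader); `missing_required_labels` is a hypothetical helper, not an existing tool in the repositories.

```python
from typing import Dict, List

REQUIRED_LABELS = (
    "app.kubernetes.io/name",
    "app.kubernetes.io/instance",
    "app.kubernetes.io/component",
    "app.kubernetes.io/part-of",
)


def missing_required_labels(manifest: Dict) -> List[str]:
    """Return required label keys that are absent from metadata.labels."""
    labels = (manifest.get("metadata") or {}).get("labels") or {}
    return [key for key in REQUIRED_LABELS if key not in labels]


deployment = {
    "metadata": {
        "labels": {
            "app.kubernetes.io/name": "chat-handler",
            "app.kubernetes.io/instance": "chat-handler",
        }
    }
}
print(missing_required_labels(deployment))
# ['app.kubernetes.io/component', 'app.kubernetes.io/part-of']
```
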
### Annotations

```yaml
metadata:
  annotations:
    # Reloader for config changes
    reloader.stakater.com/auto: "true"

    # Documentation
    description: "Handles chat messages via NATS"
```

### Resource Requests

```yaml
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

# GPU workloads
resources:
  limits:
    amd.com/gpu: 1     # AMD
    nvidia.com/gpu: 1  # NVIDIA
```

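
When reviewing manifests it helps to compare requests against limits numerically. The sketch below converts the CPU and memory quantities used above into plain integers; it handles only a subset of Kubernetes quantity suffixes, and the function names are hypothetical.

```python
from decimal import Decimal

# Binary and decimal suffixes for memory quantities (subset, for illustration).
_MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "100m" or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "256Mi" to bytes."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))


def requests_within_limits(resources: dict) -> bool:
    """Check that CPU and memory requests do not exceed their limits."""
    requests, limits = resources["requests"], resources["limits"]
    return (
        cpu_to_millicores(requests["cpu"]) <= cpu_to_millicores(limits["cpu"])
        and memory_to_bytes(requests["memory"]) <= memory_to_bytes(limits["memory"])
    )


example = {
    "requests": {"cpu": "100m", "memory": "256Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"},
}
print(cpu_to_millicores("500m"))        # 500
print(memory_to_bytes("256Mi"))         # 268435456
print(requests_within_limits(example))  # True
```
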
### Health Checks

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 30

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```

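
On the application side, the probes above need something answering on `/health` and `/ready`. The following is a standard-library sketch, not the handlers' actual implementation: the `ready` flag and `ProbeHandler` class are hypothetical, and a real handler would set the flag once its NATS and downstream connections are up.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set once the service's dependencies are connected (hypothetical readiness flag).
ready = threading.Event()


class ProbeHandler(BaseHTTPRequestHandler):
    """Serve /health (liveness) and /ready (readiness) for kubelet probes."""

    def do_GET(self) -> None:
        if self.path == "/health":
            self._reply(200, b"ok")
        elif self.path == "/ready":
            if ready.is_set():
                self._reply(200, b"ready")
            else:
                self._reply(503, b"not ready")
        else:
            self._reply(404, b"not found")

    def _reply(self, status: int, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: object) -> None:
        """Silence per-request logging so probe traffic stays out of the logs."""


def probe_status(port: int, path: str) -> int:
    """Issue a GET against the local server and return the HTTP status code."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()
        return response.status
    finally:
        conn.close()


# Port 0 asks the OS for a free port for this demo; the probes above target 8080.
server = HTTPServer(("127.0.0.1", 0), ProbeHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

statuses = [probe_status(port, "/health"), probe_status(port, "/ready")]
ready.set()
statuses.append(probe_status(port, "/ready"))

server.shutdown()
server.server_close()
print(statuses)  # [200, 503, 200]
```

Keeping liveness and readiness separate lets a pod stay alive while it is temporarily removed from service endpoints.
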
---

## Flux/GitOps Conventions

### Kustomization Structure

```yaml
# ks.yaml - Flux Kustomization
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: &app chat-handler
  namespace: flux-system
spec:
  targetNamespace: ai-ml
  commonMetadata:
    labels:
      app.kubernetes.io/name: *app
  path: ./kubernetes/apps/ai-ml/chat-handler/app
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  wait: true
  interval: 30m
  retryInterval: 1m
  timeout: 5m
```

### HelmRelease Structure

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: milvus
spec:
  interval: 30m
  chart:
    spec:
      chart: milvus
      version: 4.x.x
      sourceRef:
        kind: HelmRepository
        name: milvus
        namespace: flux-system
  values:
    # Values here
```

### Secret References

```yaml
# Never hardcode secrets
env:
  - name: DATABASE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: password
```

---

## NATS Subject Conventions

### Hierarchy

```
ai.{domain}.{scope}.{action}

Examples:
  ai.chat.user.{userId}.message      # User chat message
  ai.chat.response.{requestId}       # Chat response
  ai.voice.user.{userId}.request     # Voice request
  ai.pipeline.trigger                # Pipeline trigger
```

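
Building subjects through a helper keeps the hierarchy above consistent and stops user-supplied IDs from injecting extra tokens or wildcards. This is a minimal sketch; `subject`, `chat_message_subject`, and `chat_response_subject` are hypothetical names, not functions from the handlers.

```python
import re

# A token must be non-empty and free of ".", whitespace, and the wildcards "*" and ">".
_TOKEN = re.compile(r"^[^.\s*>]+$")


def subject(*tokens: str) -> str:
    """Join tokens into a dot-separated subject, rejecting unsafe tokens."""
    for token in tokens:
        if not _TOKEN.fullmatch(token):
            raise ValueError(f"invalid subject token: {token!r}")
    return ".".join(tokens)


def chat_message_subject(user_id: str) -> str:
    """Subject for a user's chat message: ai.chat.user.{userId}.message."""
    return subject("ai", "chat", "user", user_id, "message")


def chat_response_subject(request_id: str) -> str:
    """Subject for a chat response: ai.chat.response.{requestId}."""
    return subject("ai", "chat", "response", request_id)


print(chat_message_subject("u123"))     # ai.chat.user.u123.message
print(chat_response_subject("req-42"))  # ai.chat.response.req-42
```
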
### Wildcards

```
ai.chat.>                # All chat events
ai.chat.user.*.message   # All user messages
ai.*.response.{id}       # Any response type
```

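
In NATS, `*` matches exactly one token and `>` matches one or more trailing tokens. The sketch below reimplements that matching locally so the patterns above can be unit-tested without a server; `subject_matches` is a hypothetical helper and is not part of the NATS client library.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS-style wildcard pattern matches a concrete subject."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # ">" must be the last pattern token and needs at least one token left.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


print(subject_matches("ai.chat.>", "ai.chat.user.u1.message"))               # True
print(subject_matches("ai.chat.user.*.message", "ai.chat.user.u1.message"))  # True
print(subject_matches("ai.*.response.42", "ai.voice.response.42"))           # True
print(subject_matches("ai.chat.>", "ai.chat"))                               # False
```
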
---

## Git Conventions

### Commit Messages

```
type(scope): subject

body (optional)

footer (optional)
```

**Types:**
- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation
- `style`: Formatting
- `refactor`: Code restructuring
- `test`: Tests
- `chore`: Maintenance

**Examples:**
```
feat(chat-handler): add streaming response support
fix(voice): handle empty audio gracefully
docs(adr): add decision for MessagePack format
```

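
The commit format above can be validated in a commit-msg hook. Here is a sketch of the header check; it assumes the scope is mandatory and limited to lowercase letters, digits, and hyphens, which matches the examples but is not stated explicitly. `parse_header` and `CommitHeader` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# type(scope): subject; this sketch treats the scope as required.
HEADER = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")\((?P<scope>[a-z0-9-]+)\): (?P<subject>\S.*)$"
)


class CommitHeader(NamedTuple):
    type: str
    scope: str
    subject: str


def parse_header(message: str) -> Optional[CommitHeader]:
    """Parse the first line of a commit message; return None if it does not conform."""
    first_line = message.splitlines()[0] if message else ""
    match = HEADER.match(first_line)
    if match is None:
        return None
    return CommitHeader(match["type"], match["scope"], match["subject"])


print(parse_header("feat(chat-handler): add streaming response support"))
# CommitHeader(type='feat', scope='chat-handler', subject='add streaming response support')
print(parse_header("updated stuff"))
# None
```
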
### Branch Naming

```
feature/short-description
fix/issue-number-description
docs/what-changed
```

---

## Configuration Conventions

### Environment Variables

```python
# Use pydantic-settings or similar
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="")  # No prefix

    nats_url: str = "nats://localhost:4222"
    vllm_url: str = "http://localhost:8000"
    milvus_host: str = "localhost"
    milvus_port: int = 19530
    log_level: str = "INFO"
```

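
With no prefix, each field is populated from the environment variable of the same name, so `NATS_URL` fills `nats_url`. The sketch below imitates that mapping with the standard library only, as a dependency-free illustration; it is a stand-in for pydantic-settings, not its implementation, and `load_settings` is a hypothetical helper.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    nats_url: str = "nats://localhost:4222"
    vllm_url: str = "http://localhost:8000"
    milvus_host: str = "localhost"
    milvus_port: int = 19530
    log_level: str = "INFO"


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from variables named after the upper-cased field names."""
    overrides = {}
    for field in fields(Settings):
        raw = environ.get(field.name.upper())
        if raw is not None:
            # Coerce using the type of the default value (str or int in this sketch).
            overrides[field.name] = type(field.default)(raw)
    return Settings(**overrides)


settings = load_settings(
    {"NATS_URL": "nats://nats.ai-ml.svc.cluster.local:4222", "MILVUS_PORT": "19531"}
)
print(settings.nats_url)     # nats://nats.ai-ml.svc.cluster.local:4222
print(settings.milvus_port)  # 19531
print(settings.log_level)    # INFO
```

Passing the mapping explicitly, instead of reading `os.environ` inside the function, keeps the loader easy to test.
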
### ConfigMaps

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ai-services-config
data:
  NATS_URL: "nats://nats.ai-ml.svc.cluster.local:4222"
  VLLM_URL: "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
  # ... other non-sensitive config
```

---

## Documentation Conventions

### ADR Format

See [decisions/0000-template.md](decisions/0000-template.md)

### Code Comments

```python
from typing import Dict, List


# Use docstrings for public functions
async def query_rag(query: str) -> List[Dict]:
    """
    Query the RAG system for relevant documents.

    Args:
        query: The search query string

    Returns:
        List of document chunks with scores

    Raises:
        RAGQueryError: If the query fails
    """
    ...
```

### README Files

Each application should have a README with:
1. Purpose
2. Configuration
3. Deployment
4. Local development
5. API documentation (if applicable)

---

## Anti-Patterns to Avoid

| Don't | Do Instead |
|-------|------------|
| `kubectl apply` directly | Commit to Git, let Flux deploy |
| Hardcode secrets | Use External Secrets Operator |
| Use `latest` image tags | Pin to specific versions |
| Skip health checks | Always define liveness/readiness |
| Ignore resource limits | Set appropriate requests/limits |
| Use JSON for NATS messages | Use MessagePack (binary) |
| Synchronous I/O in handlers | Use async/await |

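
Some of these rules can be checked automatically. As one example, the sketch below flags image references that are untagged or tagged `latest`; `is_pinned` is a hypothetical helper and the image names are placeholders.

```python
def is_pinned(image: str) -> bool:
    """Return True if an image reference has a digest or a tag other than "latest"."""
    if "@" in image:
        return True  # pinned by digest, e.g. repo/app@sha256:...
    # Only look at the last path segment so a registry port is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag means the runtime defaults to "latest"
    tag = last_segment.rsplit(":", 1)[1]
    return tag != "latest"


print(is_pinned("registry.example/chat-handler:1.0.0"))    # True
print(is_pinned("registry.example/chat-handler:latest"))   # False
print(is_pinned("registry.example:5000/chat-handler"))     # False
print(is_pinned("registry.example/chat-handler@sha256:0123abcd"))  # True
```
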
---

## Related Documents

- [TECH-STACK.md](TECH-STACK.md) - Technologies used
- [ARCHITECTURE.md](ARCHITECTURE.md) - System design
- [decisions/](decisions/) - Why we made certain choices