docs: Update for decomposed repo structure
- AGENT-ONBOARDING: New repo map with daviestechlabs Gitea repos
- TECH-STACK: Reference handler-base instead of llm-workflows
- CODING-CONVENTIONS: Update project structure for new repos
- ADR 0006: Update GitRepository examples for Gitea repos

llm-workflows has been split into:
- handler-base, chat-handler, voice-assistant
- kuberay-images, argo, kubeflow, mlflow, gradio-ui
@@ -15,9 +15,21 @@ You are working on a **homelab Kubernetes cluster** running:
 | Repo | What It Contains | When to Edit |
 |------|------------------|--------------|
 | `homelab-k8s2` | Kubernetes manifests, Talos config, Flux | Infrastructure changes |
-| `llm-workflows` | NATS handlers, Argo/KFP workflows | Workflow/handler changes |
 | `companions-frontend` | Go server, HTMX UI, VRM avatars | Frontend changes |
 | `homelab-design` (this) | Architecture docs, ADRs | Design decisions |
+
+### AI/ML Repos (git.daviestechlabs.io/daviestechlabs)
+
+| Repo | Purpose |
+|------|---------|
+| `handler-base` | Shared Python library for NATS handlers |
+| `chat-handler` | Text chat with RAG pipeline |
+| `voice-assistant` | Voice pipeline (STT → RAG → LLM → TTS) |
+| `kuberay-images` | GPU-specific Ray worker Docker images |
+| `argo` | Argo Workflows (training, batch inference) |
+| `kubeflow` | Kubeflow Pipeline definitions |
+| `mlflow` | MLflow integration utilities |
+| `gradio-ui` | Gradio demo apps (embeddings, STT, TTS) |
 
 ## 🏗️ System Architecture (30-Second Version)
@@ -74,22 +86,39 @@ talos/
 │   └── nvidia/nvidia-runtime.yaml
 ```
 
-### Workflows (`llm-workflows`)
+### AI/ML Services (Gitea daviestechlabs org)
 
 ```
-workflows/                   # NATS handler deployments
-├── chat-handler.yaml
-├── voice-assistant.yaml
-└── pipeline-bridge.yaml
+handler-base/                # Shared handler library
+├── handler_base/            # Core classes
+│   ├── handler.py           # Base Handler class
+│   ├── nats_client.py       # NATS wrapper
+│   └── clients/             # Service clients (STT, TTS, LLM, etc.)
+
+chat-handler/                # RAG chat service
+├── chat_handler_v2.py       # Handler-base version
+└── Dockerfile.v2
+
+voice-assistant/             # Voice pipeline service
+├── voice_assistant_v2.py    # Handler-base version
+└── pipelines/voice_pipeline.py
 
 argo/                        # Argo WorkflowTemplates
-├── document-ingestion.yaml
 ├── batch-inference.yaml
-└── qlora-training.yaml
+├── qlora-training.yaml
+└── document-ingestion.yaml
 
-pipelines/                   # Kubeflow Pipeline Python
+kubeflow/                    # Kubeflow Pipeline definitions
 ├── voice_pipeline.py
-└── document_ingestion_pipeline.py
+├── document_ingestion_pipeline.py
+└── evaluation_pipeline.py
+
+kuberay-images/              # GPU worker images
+├── dockerfiles/
+│   ├── Dockerfile.ray-worker-nvidia
+│   ├── Dockerfile.ray-worker-strixhalo
+│   └── Dockerfile.ray-worker-rdna2
+└── ray-serve/               # Serve modules
 ```
 
 ## 🔌 Service Endpoints (Internal)
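The tree above implies a common pattern: `handler-base` provides a base `Handler` class plus a NATS wrapper, and each service (chat-handler, voice-assistant) overrides a single message-handling method. A minimal sketch of that pattern follows; the class and method names here are illustrative guesses, not the actual handler-base API, and JSON stands in for msgpack so the sketch has no third-party dependencies:

```python
import asyncio
import json  # stand-in for msgpack, to keep the sketch dependency-free


class Handler:
    """Illustrative base class: decodes a request, delegates to handle(),
    encodes the reply. The real handler-base presumably wraps nats-py here."""

    subject = "ai.request.generic"  # NATS subject the handler would subscribe to

    async def handle(self, request: dict) -> dict:
        raise NotImplementedError

    async def dispatch(self, raw: bytes) -> bytes:
        request = json.loads(raw)          # handler-base would use msgpack
        reply = await self.handle(request)
        return json.dumps(reply).encode()


class ChatHandler(Handler):
    subject = "ai.request.chat"

    async def handle(self, request: dict) -> dict:
        # A real chat-handler would run the RAG pipeline here.
        return {"answer": f"echo: {request['prompt']}"}


async def main() -> None:
    handler = ChatHandler()
    raw_reply = await handler.dispatch(json.dumps({"prompt": "hi"}).encode())
    print(json.loads(raw_reply)["answer"])  # → echo: hi


if __name__ == "__main__":
    asyncio.run(main())
```

The value of the shared base is that serialization, health probes, and telemetry live in one library while each service repo only implements `handle`.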
@@ -138,14 +167,24 @@ f"ai.pipeline.status.{request_id}" # Status updates
 ### Deploy a New AI Service
 
 1. Create InferenceService in `homelab-k8s2/kubernetes/apps/ai-ml/kserve/`
-2. Add endpoint to `llm-workflows/config/ai-services-config.yaml`
+2. Push to main → Flux deploys automatically
+
+### Add a New NATS Handler
+
+1. Create handler repo or add to existing (use `handler-base` library)
+2. Add K8s Deployment in `homelab-k8s2/kubernetes/apps/ai-ml/`
 3. Push to main → Flux deploys automatically
 
-### Add a New Workflow
+### Add a New Argo Workflow
 
-1. Create handler in `llm-workflows/chat-handler/` or `llm-workflows/voice-assistant/`
-2. Add Kubernetes Deployment in `llm-workflows/workflows/`
-3. Push to main → Flux deploys automatically
+1. Add WorkflowTemplate to `argo/` repo
+2. Push to main → Gitea syncs to cluster
+
+### Add a New Kubeflow Pipeline
+
+1. Add pipeline .py to `kubeflow/` repo
+2. Compile with `python pipeline.py`
+3. Upload YAML to Kubeflow UI
 
 ### Create Architecture Decision
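The hunk header above shows the status-subject convention, `f"ai.pipeline.status.{request_id}"`. A handler or workflow that publishes status updates can build and parse those subjects with a small helper; the prefix is taken from the diff context, but the helper names are invented for this sketch:

```python
STATUS_PREFIX = "ai.pipeline.status"


def status_subject(request_id: str) -> str:
    """Build the per-request status subject, e.g. ai.pipeline.status.<id>."""
    return f"{STATUS_PREFIX}.{request_id}"


def request_id_from_subject(subject: str) -> str:
    """Inverse of status_subject; rejects subjects outside the status namespace."""
    prefix = STATUS_PREFIX + "."
    if not subject.startswith(prefix):
        raise ValueError(f"not a status subject: {subject}")
    return subject[len(prefix):]


print(status_subject("abc123"))  # → ai.pipeline.status.abc123
```

Keeping the prefix in one constant means a subscriber can watch `ai.pipeline.status.>` (the NATS multi-token wildcard) without hardcoding the convention in each service.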
@@ -25,24 +25,34 @@ kubernetes/
 - Apps: lowercase with hyphens (`chat-handler`, `voice-assistant`)
 - Secrets: `{app}-{type}` (e.g., `milvus-credentials`)
 
-### llm-workflows (Orchestration)
+### AI/ML Repos (git.daviestechlabs.io/daviestechlabs)
 
 ```
-workflows/                  # Kubernetes Deployments for NATS handlers
-├── {handler}.yaml          # One file per handler
+handler-base/               # Shared library for all handlers
+├── handler_base/
+│   ├── handler.py          # Base Handler class
+│   ├── nats_client.py      # NATS wrapper
+│   ├── config.py           # Pydantic Settings
+│   ├── health.py           # K8s probes
+│   ├── telemetry.py        # OpenTelemetry
+│   └── clients/            # Service clients
+└── pyproject.toml
+
+chat-handler/               # Text chat service
+voice-assistant/            # Voice pipeline service
+├── {name}.py               # Standalone version
+├── {name}_v2.py            # Handler-base version (preferred)
+└── Dockerfile.v2
 
 argo/                       # Argo WorkflowTemplates
-├── {workflow-name}.yaml    # One file per workflow
+├── {workflow-name}.yaml
 
-pipelines/                  # Kubeflow Pipeline Python files
-├── {pipeline}_pipeline.py  # Pipeline definition
-└── kfp-sync-job.yaml       # Upload job
+kubeflow/                   # Kubeflow Pipelines
+├── {pipeline}_pipeline.py
 
-{handler}/                  # Python source code
-├── __init__.py
-├── {handler}.py            # Main entry point
-├── requirements.txt
-└── Dockerfile
+kuberay-images/             # GPU worker images
+├── dockerfiles/
+└── ray-serve/
 ```
 
 ---
@@ -237,20 +237,26 @@
 
 ---
 
-## Python Dependencies (llm-workflows)
+## Python Dependencies (handler-base)
+
+Core library for all NATS handlers: [handler-base](https://git.daviestechlabs.io/daviestechlabs/handler-base)
 
 ```toml
 # Core
 nats-py>=2.7.0            # NATS client
 msgpack>=1.0.0            # Binary serialization
-aiohttp>=3.9.0            # HTTP client
+httpx>=0.27.0             # HTTP client
 
 # ML/AI
 pymilvus>=2.4.0           # Milvus client
-sentence-transformers     # Embeddings
 openai>=1.0.0             # vLLM OpenAI API
 
-# Kubeflow
+# Observability
+opentelemetry-api>=1.20.0
+opentelemetry-sdk>=1.20.0
+mlflow>=2.10.0            # Experiment tracking
+
+# Kubeflow (kubeflow repo)
 kfp>=2.12.1               # Pipeline SDK
 ```
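Since the CODING-CONVENTIONS tree shows `handler-base` shipping a `pyproject.toml`, these pins would presumably be declared there rather than in a `requirements.txt`. A hypothetical fragment mirroring the list above; the version number, Python floor, and grouping are guesses, not taken from the repo:

```toml
[project]
name = "handler-base"
version = "0.1.0"          # assumed; not stated in the diff
requires-python = ">=3.11" # assumed
dependencies = [
  "nats-py>=2.7.0",
  "msgpack>=1.0.0",
  "httpx>=0.27.0",
  "pymilvus>=2.4.0",
  "openai>=1.0.0",
  "opentelemetry-api>=1.20.0",
  "opentelemetry-sdk>=1.20.0",
  "mlflow>=2.10.0",
]
```

The `kfp` pin would live in the separate `kubeflow` repo, matching the "# Kubeflow (kubeflow repo)" comment above.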
@@ -65,20 +65,23 @@ homelab-k8s2/
 ### Multi-Repository Sync
 
 ```yaml
-# GitRepository for llm-workflows
+# GitRepository for Gitea repos (daviestechlabs org)
+# Examples: argo, kubeflow, chat-handler, voice-assistant
 apiVersion: source.toolkit.fluxcd.io/v1
 kind: GitRepository
 metadata:
-  name: llm-workflows
+  name: argo-workflows
   namespace: flux-system
 spec:
-  url: ssh://git@github.com/Billy-Davies-2/llm-workflows
+  url: https://git.daviestechlabs.io/daviestechlabs/argo.git
   ref:
     branch: main
-  secretRef:
-    name: github-deploy-key
+  # Public repos don't need secretRef
 ```
+
+Note: The monolithic `llm-workflows` repo has been decomposed into separate repos
+in the daviestechlabs Gitea organization. See AGENT-ONBOARDING.md for the full list.
 
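A GitRepository object only fetches sources; Flux applies the fetched manifests through a Kustomization that references it. A minimal sketch of the companion object for the `argo-workflows` source; the `path` and `interval` values are assumptions, not taken from the repo:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: argo-workflows
  namespace: flux-system
spec:
  interval: 10m              # assumed reconcile interval
  sourceRef:
    kind: GitRepository
    name: argo-workflows     # matches the GitRepository metadata.name
  path: ./                   # assumed: WorkflowTemplates at the repo root
  prune: true                # remove cluster objects deleted from git
```

With `prune: true`, deleting a WorkflowTemplate file from the `argo` repo also removes it from the cluster on the next reconcile.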
 ### SOPS Integration
 
 ```yaml