Good catch. This is exactly the kind of boundary that breaks shiny ideas if we don't design it deliberately.
Short answer:
❌ You do not need sh1re
❌ You do not need an nginx reverse proxy
✅ You run the embedding model locally inside the air-gapped environment
Now let's ground that in your reality.
🛡️ Air-gapped reality: what actually changes?
The only thing embeddings need is:
- a model file
- Python runtime
- CPU (or GPU if you had one)
No API calls. No cloud. No outbound traffic.
So your system becomes:
[ isolated LAN ]
TransferDepot → log ingest → local embedding model → FAISS → detection
Everything stays inside the bunker.
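To make the last two stages of that pipeline concrete, here is a minimal sketch of the detection step. It assumes "detection" means flagging log lines whose embedding is far from everything in a known-good baseline. The brute-force cosine search stands in for a FAISS index lookup, and the toy 3-dimensional vectors stand in for real embeddings. `nearest_similarity`, `flag_anomalies`, and the 0.8 threshold are hypothetical names and values chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_similarity(vec: Sequence[float],
                       baseline: Sequence[Sequence[float]]) -> float:
    """Similarity to the closest baseline vector (0.0 for an empty baseline).

    Brute force on purpose: this is the lookup a FAISS index would replace.
    """
    return max((cosine(vec, b) for b in baseline), default=0.0)


def flag_anomalies(lines: Sequence[str],
                   vectors: Sequence[Sequence[float]],
                   baseline: Sequence[Sequence[float]],
                   threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Return (line, score) for lines that resemble nothing in the baseline."""
    flagged = []
    for line, vec in zip(lines, vectors):
        score = nearest_similarity(vec, baseline)
        if score < threshold:  # hypothetical cut-off, tune on your own logs
            flagged.append((line, score))
    return flagged


if __name__ == "__main__":
    # Toy vectors standing in for embeddings of known-good log lines.
    baseline = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
    lines = ["login ok user=alice", "kernel panic: unexpected trap"]
    vectors = [[0.95, 0.05, 0.0], [0.0, 0.0, 1.0]]
    for line, score in flag_anomalies(lines, vectors, baseline):
        print(f"{score:.2f}  {line}")
```

In a real deployment the vectors would come from the local model and the baseline would live in the FAISS index, but the decision rule can stay this simple.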
🧠 The real requirement (this is the key)
You must pre-stage the model.
That's it.
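📦 Step 1: Get the model (outside world)
On a connected machine:
pip install sentence-transformers
pip download -d ./wheels sentence-transformers
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('all-MiniLM-L6-v2').save('./all-MiniLM-L6-v2')"
The second command collects the wheels you will need to install the library offline; they must match the Python version and platform of the air-gapped host.
The third command downloads the model into the local cache, typically under:
~/.cache/huggingface/
and writes a self-contained copy to ./all-MiniLM-L6-v2/. The cache uses its own internal layout, so grab the saved directory (plus ./wheels), not the raw cache.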
🚚 Step 2: Sneakernet it in
Use your existing mechanism:
- USB
- TransferDepot itself
- whatever your sanctioned path is
Drop it into something like:
/opt/models/all-MiniLM-L6-v2/
⚙️ Step 3: Force local loading
Modify your code:
model = SentenceTransformer("/opt/models/all-MiniLM-L6-v2")
Now:
🚫 no internet lookup
🚫 no model download
✅ fully local
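A slightly more defensive version of that load is sketched below. It checks that the staged directory exists before loading and sets the Hugging Face offline switches (`HF_HUB_OFFLINE`, `TRANSFORMERS_OFFLINE`) so the libraries do not attempt a lookup. `check_model_dir` and `load_local_model` are hypothetical helper names, and the `modules.json` check assumes the directory was written by `SentenceTransformer.save`.

```python
import os
from pathlib import Path

MODEL_DIR = Path("/opt/models/all-MiniLM-L6-v2")  # staging path used above


def check_model_dir(model_dir: Path) -> Path:
    """Fail fast, with a clear message, if the staged model is missing."""
    if not model_dir.is_dir():
        raise FileNotFoundError(f"model directory not found: {model_dir}")
    # Assumption: a model saved by sentence-transformers contains modules.json.
    if not (model_dir / "modules.json").is_file():
        raise FileNotFoundError(f"{model_dir} does not look like a saved model")
    return model_dir


def load_local_model(model_dir: Path = MODEL_DIR):
    """Load the model from disk with Hugging Face network access disabled."""
    check_model_dir(model_dir)
    # These are read when the libraries are imported, so set them first.
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"
    # Imported lazily so the path check works even without the library.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(str(model_dir))
```

Calling `load_local_model()` once at service start-up means a missing or half-copied model shows up as a clear error rather than a hung download attempt.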
🧱 Where this lives in your architecture
You already have a pattern:
- TransferDepot = controlled file movement
- sh1re = proxy / bridge
- isolated LAN = execution zone
This detection system belongs inside the isolated LAN, alongside TransferDepot or adjacent to it.
🧩 Deployment options (ranked for your setup)
Option A: Same host as TransferDepot (simplest)
virtca8
├── TransferDepot (Flask/uWSGI)
└── log-detect service (Python)
- no network dependency
- fastest path
- easiest to maintain
Option B: Separate internal node
isolated LAN
├── TransferDepot
└── Detection node (Camelot-style box)
- cleaner separation
- easier to scale
Option C: sh1re involvement (only if needed)
You'd only involve sh1re if:
- logs originate outside the isolated LAN
- or you want cross-boundary correlation
Then:
outside → sh1re → sanitized logs → TransferDepot → detection
But:
⚠️ Do NOT put embedding inference on sh1re unless you must
Keep compute close to the data, inside the air gap
🚨 Critical constraint (don't miss this)
Model size matters.
- all-MiniLM-L6-v2 → small (~100MB)
- Larger models → slower, heavier, painful offline
Start small. You can always upgrade later.
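🔒 Security upside (this is actually nice)
Air-gapped embeddings give you:
- no outbound path for log data during inference
- reproducible behavior (same model file and same input give the same embedding on the same software stack)
- an auditable model artifact that you can checksum
It's closer to traditional infra thinking than cloud AI.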
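One way to make the "auditable artifact" point concrete is a checksum manifest. Build it on the connected machine, carry it in alongside the model, and verify it inside the air gap. The sketch below uses only the standard library; `build_manifest` and `verify_manifest` are hypothetical helper names, not part of any tool you already run.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(model_dir: Path) -> Dict[str, str]:
    """Map each file's path (relative to model_dir) to its SHA-256 digest."""
    return {
        p.relative_to(model_dir).as_posix(): sha256_file(p)
        for p in sorted(model_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(model_dir: Path, expected: Dict[str, str]) -> List[str]:
    """Return relative paths that are missing, changed, or unexpected."""
    actual = build_manifest(model_dir)
    return sorted(
        name
        for name in set(expected) | set(actual)
        if expected.get(name) != actual.get(name)
    )
```

An empty list from `verify_manifest` means the directory on the isolated host matches what left the connected machine, file for file.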
⚡ Practical integration with TransferDepot
You already have:
- file drop zones (/transferdepot-001/files/...)
- retention rules
- admin UI
You can bolt detection on like this:
Flow:
upload logs → /files/rs2/logs/
↓
detector scans folder
↓
embeds + analyzes
↓
writes results → /artifacts/alerts/
No JS. No new infra. Just another "artifact generator."
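As a rough illustration of that flow, here is a single-pass scanner using only the standard library. It assumes logs arrive as `*.log` files and that one JSON result file per log is an acceptable alert format. `scan_once`, `keyword_analyze`, and the `.alerts.json` suffix are hypothetical; `keyword_analyze` is a placeholder where the embed + analyze step would go.

```python
import json
from pathlib import Path
from typing import Callable, List


def scan_once(log_dir: Path,
              alert_dir: Path,
              analyze: Callable[[List[str]], List[str]]) -> List[Path]:
    """Process each *.log file that has no result yet; return files written."""
    alert_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for log_path in sorted(log_dir.glob("*.log")):
        out_path = alert_dir / (log_path.name + ".alerts.json")
        if out_path.exists():
            continue  # already handled on an earlier pass
        text = log_path.read_text(encoding="utf-8", errors="replace")
        alerts = analyze(text.splitlines())
        payload = {"source": log_path.name, "alerts": alerts}
        # Write to a temp name, then rename, so readers never see a partial file.
        tmp_path = out_path.with_suffix(".tmp")
        tmp_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
        tmp_path.replace(out_path)
        written.append(out_path)
    return written


def keyword_analyze(lines: List[str]) -> List[str]:
    """Placeholder analyzer: stands in for the embed + analyze step."""
    return [line for line in lines if "error" in line.lower()]
```

Run from cron or a simple loop, something like `scan_once(root / "files/rs2/logs", root / "artifacts/alerts", keyword_analyze)` would cover the flow, where `root` is wherever your TransferDepot tree actually lives.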
🧭 Clean decision point
You do NOT need:
- reverse proxy
- internet access
- external APIs
You DO need:
- model staged locally
- a small Python service
- a place to run it
🧪 If you want a next step
I can give you:
👉 a drop-in daemon script that:
- watches a directory
- processes new logs
- writes alerts
Fits exactly into your TransferDepot philosophy:
simple, local, no new dependencies, no magic
Just say the word.