
New input detected… parsing project spec 🧠

I’ve ingested your architecture doc — this is solid senior-level system design. You’re not building an app… you’re building an intelligence layer.


🔍 Understanding Confirmed

You already have:

  • ✅ CSV nutrient source (EuroFIR-style)
  • ✅ Deterministic lookup (nutrition_lookup.py)
  • ✅ RAG pipeline (Chroma)
  • ✅ Multi-agent runtime
  • ✅ Debug UI (Flask)

And your goal is:

Unify everything behind a single, reliable nutrition intelligence service with deterministic-first logic + controlled learning loop


🧠 Proposed Execution Plan (Phase 1)

🎯 Objective

Implement nutrition_service.py as the single runtime entry point


🧩 Pseudocode Architecture

1. Core Flow

resolve_nutrition(raw_text)

normalize input → normalized_text

alias_match = lookup_alias(normalized_text)

IF alias_match exists:
    canonical = alias_match
    match_type = "alias"
    confidence = 0.98
ELSE:
    result = NutritionLookup.lookup(normalized_text)

    IF exact match:
        match_type = "exact"
        confidence = 1.0
    ELIF prefix/substring:
        match_type = "fuzzy"
        confidence = score
    ELSE:
        candidates = semantic_candidates(raw_text)

        RETURN:
            match_type = "semantic"
            confidence < 0.75
            review_needed = True
            candidate_foods = candidates
            source = "chroma"
            EXIT

RETURN structured NutritionMatch:
    canonical_food_name
    per_100g nutrients
    match_type
    confidence
    source = "eurofir_csv"
    review_needed = confidence < 0.90
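To make the flow concrete, here is a hypothetical resolved payload for a query like "100g chicken breast" on the happy path (all names and numbers are illustrative, not taken from the real CSV):

# Hypothetical NutritionMatch payload (illustrative values only)
match = {
    "query": "100g chicken breast",
    "canonical_food_name": "Chicken, breast, raw",          # assumed CSV row name
    "match_type": "exact",
    "confidence": 1.0,
    "source": "eurofir_csv",
    "per_100g": {"energy_kcal": 120.0, "protein_g": 22.5},  # made-up numbers
    "review_needed": False,
    "candidate_foods": [],
    "notes": [],
}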

2. Supporting Modules

normalize_food_text(text)

  • lowercase
  • strip punctuation
  • singularize (optional; future work)
  • remove quantities (“100g”, “1 cup”)

lookup_alias(normalized_text)

SELECT canonical_food_name
FROM food_aliases
WHERE alias_normalized = ?
AND status = 'approved'

semantic_candidates(raw_text)

  • query Chroma
  • return top N food names
  • DO NOT return nutrients

3. Data Model

@dataclass
class NutritionMatch:
    query: str
    canonical_food_name: str | None
    match_type: str
    confidence: float
    source: str
    per_100g: dict[str, float | None]
    review_needed: bool
    candidate_foods: list[str]
    notes: list[str]

4. Key Rules (Non-Negotiable)

  • 🚫 NEVER override CSV nutrient truth with RAG
  • ✅ Deterministic lookup ALWAYS first
  • ✅ Alias table BEFORE fuzzy matching
  • ✅ Low confidence → review, not guess
  • ✅ One function used everywhere

⚙️ Files to Create (Phase 1)

/nutrition_service.py        ← NEW (core service)
/data/nutrition_runtime.db   ← NEW (SQLite aliases)
/utils/text_normalizer.py    ← NEW (optional split)
/utils/alias_db.py           ← NEW (DB access)

🚀 What I’ll Generate Next (after your confirmation)

I will write production-ready, complete code for:

  1. nutrition_service.py
  2. SQLite schema + connection layer
  3. Alias lookup module
  4. Normalization utility
  5. Clean integration with your existing NutritionLookup

No placeholders. Fully wired. Clean.


❓ Confirm Before Execution

Answer these so I can lock the implementation:

  1. Keep SQLite (recommended) or prefer Postgres?
  2. Should normalization include unit stripping (e.g., “100g chicken breast”) now?
  3. Do you want logging (JSONL or SQLite) included in Phase 1?
  4. Is your NutritionLookup.lookup() returning:
    • exact match only
    • or already fuzzy scored?

Reply with:

“Execute Phase 1”

…and I’ll build the full system cleanly.

 


New chat detected… initialising build execution mode.

Built from your uploaded architecture spec.

project/
├── nutrition_service.py
└── utils/
    ├── alias_db.py
    └── text_normalizer.py

utils/text_normalizer.py

from __future__ import annotations

import re
import unicodedata
from typing import Iterable

# Common cooking / serving units to strip when they appear near numbers.
_UNITS = (
    "g", "gram", "grams", "kg", "kilogram", "kilograms",
    "mg", "milligram", "milligrams",
    "ml", "milliliter", "milliliters", "millilitre", "millilitres",
    "l", "liter", "liters", "litre", "litres",
    "oz", "ounce", "ounces",
    "lb", "lbs", "pound", "pounds",
    "cup", "cups",
    "tbsp", "tablespoon", "tablespoons",
    "tsp", "teaspoon", "teaspoons",
    "slice", "slices",
    "piece", "pieces",
    "clove", "cloves",
    "can", "cans",
    "tin", "tins",
    "pack", "packs",
    "bowl", "bowls",
    "serving", "servings",
)

_DESCRIPTOR_WORDS = {
    "fresh", "raw", "cooked", "boiled", "grilled", "fried", "roasted",
    "steamed", "baked", "chopped", "diced", "sliced", "minced",
    "large", "small", "medium", "extra", "lean", "boneless", "skinless",
    "organic", "plain", "unsalted", "salted",
}

_QUANTITY_PATTERN = re.compile(
    rf"""
    (?:
        ^|(?<=\s)
    )
    (?:
        \d+(?:\.\d+)?            # 100 / 100.5
        |\d+\s*/\s*\d+           # 1/2
        |a|an|one|two|three|four|five|six|seven|eight|nine|ten
    )
    \s*
    (?:
        x\s*
    )?
    (?:
        {"|".join(re.escape(unit) for unit in _UNITS)}
    )s?
    (?=$|\s)
    """,
    re.IGNORECASE | re.VERBOSE,
)

_STANDALONE_NUMBER_PATTERN = re.compile(r"(?<!\w)\d+(?:\.\d+)?(?:\s*/\s*\d+)?(?!\w)")
_PUNCT_PATTERN = re.compile(r"[^\w\s/-]")
_MULTI_SPACE_PATTERN = re.compile(r"\s+")


def _strip_accents(text: str) -> str:
    normalized = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in normalized if not unicodedata.combining(ch))


def _remove_parenthetical(text: str) -> str:
    # Remove parenthetical prep notes such as "(raw)" or "(drained)"
    return re.sub(r"\([^)]*\)", " ", text)


def _token_filter(tokens: Iterable[str]) -> list[str]:
    kept: list[str] = []
    for token in tokens:
        if token in _DESCRIPTOR_WORDS:
            continue
        kept.append(token)
    return kept


def normalize_food_text(text: str) -> str:
    """
    Normalize free-text food input into a canonical lookup string.

    Design goals:
    - deterministic and stable
    - strips obvious quantities/units
    - keeps food-bearing terms
    - avoids aggressive stemming/singularization that can damage food names
    """
    if not text or not text.strip():
        return ""

    value = _strip_accents(text).lower().strip()
    value = _remove_parenthetical(value)
    value = value.replace("_", " ")
    value = _PUNCT_PATTERN.sub(" ", value)

    # Remove quantity + unit phrases first: "100 g", "2 cups", "1/2 tbsp"
    value = _QUANTITY_PATTERN.sub(" ", value)

    # Remove remaining standalone numbers: "chicken 200"
    value = _STANDALONE_NUMBER_PATTERN.sub(" ", value)

    value = _MULTI_SPACE_PATTERN.sub(" ", value).strip()

    tokens = _token_filter(value.split())
    value = " ".join(tokens)

    # Normalize separators one more time after token filtering
    value = value.replace(" / ", "/")
    value = _MULTI_SPACE_PATTERN.sub(" ", value).strip()

    return value
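
A quick usage sketch; the expected outputs below follow from the regexes above, but they are assumptions worth checking against your own inputs:

from utils.text_normalizer import normalize_food_text

# Quantities, units, parentheticals, accents, and descriptor words are stripped.
print(normalize_food_text("100g chicken breast (raw)"))    # -> "chicken breast"
print(normalize_food_text("2 cups chopped fresh spinach")) # -> "spinach"
print(normalize_food_text("Crème fraîche"))                # -> "creme fraiche"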

utils/alias_db.py

from __future__ import annotations

import sqlite3
from contextlib import contextmanager
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterator, Optional


@dataclass(frozen=True)
class AliasRecord:
    alias_text: str
    alias_normalized: str
    canonical_food_name: str
    status: str
    source: str
    notes: str | None = None


class AliasDB:
    """
    Small SQLite-backed alias store.

    Runtime uses only approved aliases.
    Proposed/rejected values are preserved for review workflows.
    """

    def __init__(self, db_path: str | Path):
        self.db_path = Path(db_path)

    @contextmanager
    def connect(self) -> Iterator[sqlite3.Connection]:
        self.db_path.parent.mkdir(parents=True, exist_ok=True)
        conn = sqlite3.connect(str(self.db_path))
        conn.row_factory = sqlite3.Row
        try:
            yield conn
            conn.commit()
        finally:
            conn.close()

    def initialize(self) -> None:
        with self.connect() as conn:
            conn.execute(
                """
                CREATE TABLE IF NOT EXISTS food_aliases (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    alias_text TEXT NOT NULL,
                    alias_normalized TEXT NOT NULL,
                    canonical_food_name TEXT NOT NULL,
                    status TEXT NOT NULL CHECK (status IN ('proposed', 'approved', 'rejected')),
                    source TEXT NOT NULL DEFAULT 'manual',
                    notes TEXT,
                    created_at TEXT NOT NULL,
                    updated_at TEXT NOT NULL
                )
                """
            )
            conn.execute(
                """
                CREATE UNIQUE INDEX IF NOT EXISTS idx_food_alias_unique
                ON food_aliases(alias_normalized, canonical_food_name)
                """
            )
            conn.execute(
                """
                CREATE INDEX IF NOT EXISTS idx_food_alias_status
                ON food_aliases(status)
                """
            )
            conn.execute(
                """
                CREATE INDEX IF NOT EXISTS idx_food_alias_norm_status
                ON food_aliases(alias_normalized, status)
                """
            )

    def get_approved_canonical(self, alias_normalized: str) -> Optional[str]:
        with self.connect() as conn:
            row = conn.execute(
                """
                SELECT canonical_food_name
                FROM food_aliases
                WHERE alias_normalized = ?
                  AND status = 'approved'
                ORDER BY id ASC
                LIMIT 1
                """,
                (alias_normalized,),
            ).fetchone()
            return str(row["canonical_food_name"]) if row else None

    def upsert_alias(
        self,
        *,
        alias_text: str,
        alias_normalized: str,
        canonical_food_name: str,
        status: str = "proposed",
        source: str = "manual",
        notes: str | None = None,
    ) -> None:
        now = datetime.now(timezone.utc).isoformat()

        with self.connect() as conn:
            existing = conn.execute(
                """
                SELECT id
                FROM food_aliases
                WHERE alias_normalized = ?
                  AND canonical_food_name = ?
                LIMIT 1
                """,
                (alias_normalized, canonical_food_name),
            ).fetchone()

            if existing:
                conn.execute(
                    """
                    UPDATE food_aliases
                    SET alias_text = ?,
                        status = ?,
                        source = ?,
                        notes = ?,
                        updated_at = ?
                    WHERE id = ?
                    """,
                    (
                        alias_text,
                        status,
                        source,
                        notes,
                        now,
                        int(existing["id"]),
                    ),
                )
            else:
                conn.execute(
                    """
                    INSERT INTO food_aliases (
                        alias_text,
                        alias_normalized,
                        canonical_food_name,
                        status,
                        source,
                        notes,
                        created_at,
                        updated_at
                    ) VALUES (?, ?, ?, ?, ?, ?, ?, ?)
                    """,
                    (
                        alias_text,
                        alias_normalized,
                        canonical_food_name,
                        status,
                        source,
                        notes,
                        now,
                        now,
                    ),
                )

    def list_aliases(self, status: str | None = None) -> list[AliasRecord]:
        query = """
            SELECT alias_text, alias_normalized, canonical_food_name, status, source, notes
            FROM food_aliases
        """
        params: tuple[object, ...] = ()
        if status:
            query += " WHERE status = ?"
            params = (status,)
        query += " ORDER BY alias_normalized ASC, canonical_food_name ASC"

        with self.connect() as conn:
            rows = conn.execute(query, params).fetchall()
            return [
                AliasRecord(
                    alias_text=str(row["alias_text"]),
                    alias_normalized=str(row["alias_normalized"]),
                    canonical_food_name=str(row["canonical_food_name"]),
                    status=str(row["status"]),
                    source=str(row["source"]),
                    notes=row["notes"],
                )
                for row in rows
            ]
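
A minimal usage sketch of the store (the review workflow itself is out of scope here; this just exercises the API above):

from utils.alias_db import AliasDB

db = AliasDB("data/nutrition_runtime.db")
db.initialize()

# Runtime lookups only ever see approved rows.
db.upsert_alias(
    alias_text="garbanzo beans",
    alias_normalized="garbanzo beans",
    canonical_food_name="Chickpeas, mature seeds, cooked",
    status="approved",
)
print(db.get_approved_canonical("garbanzo beans"))  # -> "Chickpeas, mature seeds, cooked"
print(db.get_approved_canonical("unknown alias"))   # -> None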

nutrition_service.py

from __future__ import annotations

import json
import logging
from dataclasses import asdict, dataclass, field, is_dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Any

from utils.alias_db import AliasDB
from utils.text_normalizer import normalize_food_text

try:
    # Existing project dependency
    from nutrition_lookup import NutritionLookup
except ImportError as exc:  # pragma: no cover
    raise ImportError(
        "nutrition_service.py requires the existing nutrition_lookup.py module."
    ) from exc


logger = logging.getLogger(__name__)


@dataclass
class NutritionQuery:
    raw_text: str
    locale: str | None = None
    context_food_group: str | None = None


@dataclass
class NutritionMatch:
    query: str
    canonical_food_name: str | None
    match_type: str  # exact | alias | prefix | substring | semantic | none
    confidence: float
    source: str  # eurofir_csv | chroma | none
    per_100g: dict[str, float | None] = field(default_factory=dict)
    flags: dict[str, bool] = field(default_factory=dict)
    review_needed: bool = False
    candidate_foods: list[str] = field(default_factory=list)
    notes: list[str] = field(default_factory=list)


class NutritionService:
    """
    Phase 1 authoritative runtime service.

    Policy:
    1. Normalize user input
    2. Approved alias lookup
    3. Deterministic CSV-backed NutritionLookup
    4. Optional semantic candidate generation
    5. Stable response payload every time
    """

    def __init__(
        self,
        *,
        csv_path: str | Path = "data/eurofir_mediterranean.csv",
        alias_db_path: str | Path = "data/nutrition_runtime.db",
        log_path: str | Path | None = "logs/nutrition_runtime.jsonl",
        chroma_client: Any | None = None,
        chroma_collection_name: str | None = None,
    ):
        self.csv_path = Path(csv_path)
        self.alias_db = AliasDB(alias_db_path)
        self.alias_db.initialize()

        # Existing lookup object from your repo
        self.lookup_engine = NutritionLookup(str(self.csv_path))

        self.log_path = Path(log_path) if log_path else None
        if self.log_path:
            self.log_path.parent.mkdir(parents=True, exist_ok=True)

        self.chroma_client = chroma_client
        self.chroma_collection_name = chroma_collection_name

    def resolve_nutrition(
        self,
        query: NutritionQuery | str,
        *,
        enable_semantic_fallback: bool = True,
        semantic_limit: int = 5,
    ) -> NutritionMatch:
        if isinstance(query, str):
            query = NutritionQuery(raw_text=query)

        raw_text = query.raw_text.strip()
        normalized = normalize_food_text(raw_text)

        if not normalized:
            result = NutritionMatch(
                query=raw_text,
                canonical_food_name=None,
                match_type="none",
                confidence=0.0,
                source="none",
                review_needed=True,
                notes=["Empty or non-food query after normalization."],
            )
            self._log_resolution(raw_text=raw_text, normalized=normalized, result=result)
            return result

        # 1) Approved alias path
        canonical_from_alias = self.alias_db.get_approved_canonical(normalized)
        if canonical_from_alias:
            alias_result = self._lookup_with_engine(canonical_from_alias)

            if alias_result["canonical_food_name"]:
                result = NutritionMatch(
                    query=raw_text,
                    canonical_food_name=alias_result["canonical_food_name"],
                    match_type="alias",
                    confidence=0.98,
                    source="eurofir_csv",
                    per_100g=alias_result["per_100g"],
                    flags={
                        "used_alias": True,
                        "used_semantic_fallback": False,
                        "deterministic_match": True,
                    },
                    review_needed=False,
                    notes=[f"Resolved via approved alias '{normalized}'."],
                )
                self._log_resolution(raw_text=raw_text, normalized=normalized, result=result)
                return result

        # 2) Deterministic primary lookup
        direct_result = self._lookup_with_engine(normalized)

        if direct_result["canonical_food_name"]:
            confidence = self._confidence_for_match_type(direct_result["match_type"])
            result = NutritionMatch(
                query=raw_text,
                canonical_food_name=direct_result["canonical_food_name"],
                match_type=direct_result["match_type"],
                confidence=confidence,
                source="eurofir_csv",
                per_100g=direct_result["per_100g"],
                flags={
                    "used_alias": False,
                    "used_semantic_fallback": False,
                    "deterministic_match": True,
                },
                review_needed=confidence < 0.90,
                notes=direct_result["notes"],
            )
            self._log_resolution(raw_text=raw_text, normalized=normalized, result=result)
            return result

        # 3) Optional semantic fallback for candidate generation only
        candidate_foods: list[str] = []
        notes = ["No deterministic CSV match found."]

        if enable_semantic_fallback:
            candidate_foods = self.semantic_candidates(raw_text, limit=semantic_limit)
            if candidate_foods:
                notes.append("Semantic candidate retrieval returned possible foods.")
                result = NutritionMatch(
                    query=raw_text,
                    canonical_food_name=None,
                    match_type="semantic",
                    confidence=0.40,
                    source="chroma",
                    per_100g={},
                    flags={
                        "used_alias": False,
                        "used_semantic_fallback": True,
                        "deterministic_match": False,
                    },
                    review_needed=True,
                    candidate_foods=candidate_foods,
                    notes=notes,
                )
                self._log_resolution(raw_text=raw_text, normalized=normalized, result=result)
                return result

        result = NutritionMatch(
            query=raw_text,
            canonical_food_name=None,
            match_type="none",
            confidence=0.0,
            source="none",
            per_100g={},
            flags={
                "used_alias": False,
                "used_semantic_fallback": bool(candidate_foods),
                "deterministic_match": False,
            },
            review_needed=True,
            candidate_foods=candidate_foods,
            notes=notes,
        )
        self._log_resolution(raw_text=raw_text, normalized=normalized, result=result)
        return result

    def semantic_candidates(self, raw_text: str, limit: int = 5) -> list[str]:
        """
        Candidate generation only. Never returns nutrient truth.

        Supported integration patterns:
        - self.chroma_client.get_collection(...).query(...)
        - no-op if chroma is not provided
        """
        if not self.chroma_client or not self.chroma_collection_name:
            return []

        try:
            collection = self.chroma_client.get_collection(self.chroma_collection_name)
            response = collection.query(
                query_texts=[raw_text],
                n_results=limit,
            )
        except Exception as exc:  # pragma: no cover
            logger.warning("Chroma semantic lookup failed: %s", exc)
            return []

        candidates: list[str] = []
        metadatas = response.get("metadatas") or []
        documents = response.get("documents") or []

        first_meta_list = metadatas[0] if metadatas else []
        for metadata in first_meta_list:
            if isinstance(metadata, dict):
                food_name = metadata.get("FoodName") or metadata.get("food_name") or metadata.get("canonical_food_name")
                if food_name and str(food_name) not in candidates:
                    candidates.append(str(food_name))

        if not candidates and documents:
            first_docs = documents[0] if isinstance(documents[0], list) else documents
            for doc in first_docs:
                text = str(doc).strip()
                if text and text not in candidates:
                    candidates.append(text)

        return candidates[:limit]

    def propose_alias(
        self,
        *,
        alias_text: str,
        canonical_food_name: str,
        source: str = "runtime_feedback",
        notes: str | None = None,
    ) -> None:
        alias_normalized = normalize_food_text(alias_text)
        if not alias_normalized:
            raise ValueError("Alias text normalizes to an empty string.")

        self.alias_db.upsert_alias(
            alias_text=alias_text,
            alias_normalized=alias_normalized,
            canonical_food_name=canonical_food_name,
            status="proposed",
            source=source,
            notes=notes,
        )

    def approve_alias(
        self,
        *,
        alias_text: str,
        canonical_food_name: str,
        source: str = "manual",
        notes: str | None = None,
    ) -> None:
        alias_normalized = normalize_food_text(alias_text)
        if not alias_normalized:
            raise ValueError("Alias text normalizes to an empty string.")

        self.alias_db.upsert_alias(
            alias_text=alias_text,
            alias_normalized=alias_normalized,
            canonical_food_name=canonical_food_name,
            status="approved",
            source=source,
            notes=notes,
        )

    def _lookup_with_engine(self, text: str) -> dict[str, Any]:
        """
        Adapter around the existing NutritionLookup.lookup().

        This method is intentionally defensive because the exact return shape
        of the current repository implementation was not provided.
        """
        raw = self.lookup_engine.lookup(text)

        payload = self._to_mapping(raw)
        canonical_food_name = self._pick_first_str(
            payload,
            "canonical_food_name",
            "food_name",
            "FoodName",
            "matched_food",
            "name",
        )

        per_100g = self._extract_nutrients(payload)
        explicit_match_type = self._pick_first_str(
            payload,
            "match_type",
            "match",
            "match_kind",
        )
        score = self._pick_first_float(
            payload,
            "score",
            "match_score",
            "confidence",
        )
        notes = self._extract_notes(payload)

        if not canonical_food_name and payload:
            # Some implementations may return the row directly without explicit name metadata.
            canonical_food_name = self._pick_first_str(payload, "food", "label")

        match_type = explicit_match_type or self._infer_match_type(text, canonical_food_name, score)

        return {
            "canonical_food_name": canonical_food_name,
            "match_type": match_type,
            "per_100g": per_100g,
            "score": score,
            "notes": notes,
        }

    def _infer_match_type(
        self,
        query_text: str,
        canonical_food_name: str | None,
        score: float | None,
    ) -> str:
        if not canonical_food_name:
            return "none"

        q = normalize_food_text(query_text)
        c = normalize_food_text(canonical_food_name)

        if q == c:
            return "exact"
        if c.startswith(q) or q.startswith(c):
            return "prefix"
        if q in c or c in q:
            return "substring"
        if score is not None and score >= 0.85:
            return "prefix"
        if score is not None and score >= 0.65:
            return "substring"
        return "substring"

    def _confidence_for_match_type(self, match_type: str) -> float:
        return {
            "exact": 1.0,
            "alias": 0.98,
            "prefix": 0.85,
            "substring": 0.65,
            "semantic": 0.40,
            "none": 0.0,
        }.get(match_type, 0.50)

    def _extract_nutrients(self, payload: dict[str, Any]) -> dict[str, float | None]:
        """
        Tries multiple common shapes:
        - payload["per_100g"] = {...}
        - payload["nutrients"] = {...}
        - flat row keys with numeric values
        """
        for key in ("per_100g", "nutrients"):
            value = payload.get(key)
            if isinstance(value, dict):
                return {str(k): self._safe_float_or_none(v) for k, v in value.items()}

        exclude = {
            "canonical_food_name", "food_name", "FoodName", "matched_food", "name",
            "match_type", "match", "match_kind", "score", "match_score", "confidence",
            "notes", "source",
        }

        nutrients: dict[str, float | None] = {}
        for key, value in payload.items():
            if key in exclude:
                continue
            if isinstance(value, (int, float, str)):
                converted = self._safe_float_or_none(value)
                if converted is not None:
                    nutrients[str(key)] = converted

        return nutrients

    def _extract_notes(self, payload: dict[str, Any]) -> list[str]:
        notes_value = payload.get("notes")
        if notes_value is None:
            return []
        if isinstance(notes_value, list):
            return [str(item) for item in notes_value]
        return [str(notes_value)]

    def _log_resolution(
        self,
        *,
        raw_text: str,
        normalized: str,
        result: NutritionMatch,
    ) -> None:
        if not self.log_path:
            return

        record = {
            "timestamp_utc": datetime.now(timezone.utc).isoformat(),
            "raw_query": raw_text,
            "normalized_query": normalized,
            "canonical_food_name": result.canonical_food_name,
            "match_type": result.match_type,
            "confidence": result.confidence,
            "source": result.source,
            "review_needed": result.review_needed,
            "used_alias": result.flags.get("used_alias", False),
            "used_semantic_fallback": result.flags.get("used_semantic_fallback", False),
            "candidate_foods": result.candidate_foods,
            "notes": result.notes,
        }

        with self.log_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")

    def _to_mapping(self, raw: Any) -> dict[str, Any]:
        if raw is None:
            return {}

        if isinstance(raw, dict):
            return raw

        if is_dataclass(raw):
            return asdict(raw)

        if hasattr(raw, "__dict__"):
            return dict(vars(raw))

        return {"value": raw}

    def _pick_first_str(self, payload: dict[str, Any], *keys: str) -> str | None:
        for key in keys:
            value = payload.get(key)
            if value is not None and str(value).strip():
                return str(value).strip()
        return None

    def _pick_first_float(self, payload: dict[str, Any], *keys: str) -> float | None:
        for key in keys:
            if key in payload:
                converted = self._safe_float_or_none(payload[key])
                if converted is not None:
                    return converted
        return None

    def _safe_float_or_none(self, value: Any) -> float | None:
        if value is None:
            return None
        if isinstance(value, bool):
            return None
        try:
            return float(value)
        except (TypeError, ValueError):
            return None


_default_service: NutritionService | None = None


def get_nutrition_service() -> NutritionService:
    global _default_service
    if _default_service is None:
        _default_service = NutritionService()
    return _default_service


def resolve_nutrition(
    query: NutritionQuery | str,
    *,
    enable_semantic_fallback: bool = True,
    semantic_limit: int = 5,
) -> NutritionMatch:
    """
    Public function to be shared by chatbot, Flask app, scripts, and future APIs.
    """
    service = get_nutrition_service()
    return service.resolve_nutrition(
        query,
        enable_semantic_fallback=enable_semantic_fallback,
        semantic_limit=semantic_limit,
    )

Minimal integration examples

Chatbot / agent code

from nutrition_service import resolve_nutrition

result = resolve_nutrition("100g chicken breast")

if result.canonical_food_name:
    print(result.canonical_food_name, result.per_100g, result.match_type, result.confidence)
else:
    print(result.candidate_foods, result.review_needed)

Approving an alias

from nutrition_service import get_nutrition_service

service = get_nutrition_service()
service.approve_alias(
    alias_text="garbanzo beans",
    canonical_food_name="Chickpeas, mature seeds, cooked",
    notes="Common English variant",
)

Notes

This is designed to be dropped in without replacing your existing NutritionLookup. The adapter is intentionally defensive because I have not inspected the exact return shape of NutritionLookup.lookup(). If you paste that file next, I can wire the adapter to your exact schema and remove the fallback heuristics.

Phase 1 delivered here includes:

  • shared runtime service
  • SQLite alias persistence
  • normalization with quantity stripping
  • stable NutritionMatch payload
  • JSONL observability logging
  • optional Chroma candidate fallback path
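
For that last item, a wiring sketch. It assumes a standard chromadb client; the path and the collection name "foods" are placeholders, not values from your repo:

import chromadb

from nutrition_service import NutritionService

# Chroma supplies candidate names only; nutrient truth still comes from the CSV.
service = NutritionService(
    chroma_client=chromadb.PersistentClient(path="data/chroma"),  # path assumed
    chroma_collection_name="foods",                               # placeholder name
)
result = service.resolve_nutrition("grilled halloumi wrap")
print(result.match_type, result.candidate_foods)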

The contracts are locked, so the next step is just adapter cleanup:

  • make nutrition_agent.py consume IngredientNutritionResult / RecipeNutritionResult directly
  • remove any old key assumptions like ad hoc matched_food or custom macro dict shapes
  • update any UI/rendering code to read the frozen fields only

Priority order:

  1. multi_agent_chatbot/nutrition_agent.py
  2. any chatbot tool wrappers
  3. any UI/debug panels
  4. old tests that still assume pre-contract shapes

The goal now is boring and good:

all callers read the contract, nobody reaches into internals

 

(venv) [CML] tux@camelot …/camelot/ai-agents-crash-course $ python3 -m pytest tests/test_import_recipes.py multi_agent_chatbot/tests/test_nutrition_lookup.py multi_agent_chatbot/tests/test_nutrition_contracts.py
==================================================== test session starts ====================================================
platform linux -- Python 3.13.11, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/tux/camelot/ai-agents-crash-course
plugins: anyio-4.10.0
collected 17 items                                                                                                          

tests/test_import_recipes.py .............                                                                            [ 76%]
multi_agent_chatbot/tests/test_nutrition_lookup.py ..                                                                 [ 88%]
multi_agent_chatbot/tests/test_nutrition_contracts.py ..                                                              [100%]

==================================================== 17 passed in 0.04s =====================================================
(venv) [CML] tux@camelot …/camelot/ai-agents-crash-course $ 

Excellent — that means the foundation is genuinely stable now.

You have:

  • passing import/ingestion coverage
  • passing lookup behavior tests
  • passing contract tests
  • frozen ingredient and recipe result shapes

So the project has crossed an important line:

nutrition is no longer “just working by accident” — it now has a tested API surface.

The next step is the boring, valuable one:
update downstream callers to consume only the locked contracts.

Most likely target:

  • multi_agent_chatbot/nutrition_agent.py

What to do there:

  • import the canonical result types
  • read only query, ingredient, per_100g, signals, source
  • for recipe results, read only totals, per_serving, servings, breakdown, missing, source
  • remove any assumptions about older field names or internal lookup structures

At this point, I would not expand scope. No alias DB, no RAG changes, no recipe parsing overhaul. Just finish the contract adoption pass.
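
A minimal consumer sketch for that pass, assuming the contract results are dict-shaped with exactly the keys listed above (if they are dataclasses, swap indexing for attribute access; the import path is also an assumption):

# Hypothetical adapter inside multi_agent_chatbot/nutrition_agent.py.
# Reads only the frozen contract fields; no legacy keys, no internals.
from nutrition_lookup import NutritionLookup  # import path assumed

def describe_ingredient(lookup: NutritionLookup, text: str) -> str:
    result = lookup.lookup(text)  # contract: query, ingredient, per_100g, signals, source
    if result["ingredient"] is None:
        return f"No match for {result['query']!r}."
    protein = result["per_100g"].get("protein_g")  # nutrient key assumed
    return f"{result['ingredient']}: {protein} g protein per 100 g ({result['source']})."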


Contract


More than good, actually: it is the bit that stops the nutrition layer from turning back into swamp-magic.

Right now the contract does three useful things:

It tells every caller, “this is the shape you get back, and it is not a surprise.”
That means the agent, Chainlit, tests, and any future UI all stop improvising.

It separates ingredient truth from recipe math.
NutritionLookup answers “what is this food?”
NutritionCalculator answers “what happens when I combine these amounts?”
That is a clean split.

It gives you permission to change internals later without wrecking the surfaces you actually use.
You can swap matching logic, improve CSV handling, tighten signals, whatever — and the consumers should keep working as long as the contract holds.

So in plainer terms:

  • before: “works, but everyone kind of pokes at the blob differently”
  • now: “these are the official doors; stop climbing through windows”

That is why it matters.

What is next is not more contract theory. It is making sure every consumer actually uses those official doors and none of them are secretly rummaging around old fields.

If you want the sharper engineering phrase: consumer migration to the contract.

A few near-neighbors, depending on tone:

  • contract adoption — broad, practical
  • contract enforcement — stronger; implies tests/linting/backstops
  • consumer cleanup — informal, accurate
  • legacy field removal — the destructive part
  • API conformance pass — slightly more formal

The one I’d use in your repo notes is:

Contract adoption pass

because it means:
the contract exists, and now every caller is being moved onto it.

Yes. I’d call it a door inventory or contract adoption register.

Based on what you’ve told me so far, the doors look like this:

| Door | Purpose | Current status |
| --- | --- | --- |
| NutritionLookup.lookup() | ingredient-level nutrition result | Locked |
| NutritionCalculator.calculate() | recipe/batch nutrition result | Locked |
| multi_agent_chatbot/tests/test_nutrition_contracts.py | enforces the result shapes | Locked |
| multi_agent_chatbot/nutrition_agent.py tool-return path | returns contract dicts | Adopted |
| chatbot/nutrition_agent.py | returns contract dicts | Adopted |
| chatbot_complete/nutrition_agent.py | returns contract dicts | Adopted |
| Agent summary/formatting code | reads the contract to produce text | Needs verification |
| UI/debug panels | render contract fields only | Needs verification |
| Legacy field access | old keys / old shapes | Needs removal |
| Chainlit integration | should remain intact, only consume shared contracts | Protected; verify usage only |

The fetch phrase I’d use is:

“Find all consumers of the nutrition contracts and classify them as locked, adopted, needs verification, or legacy.”

If you want a grep-style version for repo work, use this idea:

contract adoption pass:
find all call sites of NutritionLookup.lookup and NutritionCalculator.calculate;
for each caller, record:
- returns contract directly
- reads contract correctly
- formats contract correctly
- still touches legacy fields
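
If you want that checklist as a runnable starting point rather than a prompt, here is a rough stdlib sketch (the patterns and the repo root are assumptions; it only finds call sites, the classification stays manual):

# Rough call-site inventory for the contract adoption pass.
import re
from pathlib import Path

PATTERNS = [
    r"NutritionLookup\s*\(",
    r"\.lookup\s*\(",
    r"NutritionCalculator\s*\(",
    r"\.calculate\s*\(",
]
REPO = Path(".")  # adjust to your repo root

for path in sorted(REPO.rglob("*.py")):
    if "venv" in path.parts:
        continue  # skip the virtualenv
    text = path.read_text(encoding="utf-8", errors="ignore")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(re.search(p, line) for p in PATTERNS):
            print(f"{path}:{lineno}: {line.strip()}")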

And the status words I’d standardize are:

  • Locked — source contract is defined and tested
  • Adopted — consumer uses only the contract
  • Needs verification — probably okay, but not confirmed
  • Legacy — still touches old fields or compatibility glue
  • Protected — must not be removed or broken while migrating

So the phrase to ask your tools or Codex is:

“Run a contract adoption pass and produce a door inventory with statuses.”

That’s a good inventory.

I’d label the overall state as:

Contracts locked. Consumers adopted. No runtime legacy detected.

That gives you a crisp milestone name.

The only nuance I’d add is this:

The ingestion scripts are not legacy; they’re out of contract scope.
They can stay weird longer, as long as they do not leak raw shapes into runtime consumers.

So your categories now read like:

  • Locked — schema owners
  • Adopted — runtime consumers
  • Clear — nothing left in needs-verification
  • Out of scope — ingestion/build-time code
  • Protected — Chainlit and active streams must remain untouched

The next milestone after this is:

Contract protection

Meaning:

  • keep the tests
  • stop reintroducing ad hoc fields
  • require new consumers to use the frozen keys
  • keep Chainlit as a consumer, not a refactor casualty
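
A sketch of what "keep the tests" can look like in practice: a guard that fails the suite if the frozen key sets drift. The key names mirror the contract fields mentioned earlier, and the fixtures are assumed to produce one result of each shape:

# Hypothetical guard tests; the key sets must mirror the frozen contracts.
FROZEN_INGREDIENT_KEYS = {"query", "ingredient", "per_100g", "signals", "source"}
FROZEN_RECIPE_KEYS = {"totals", "per_serving", "servings", "breakdown", "missing", "source"}

def test_ingredient_contract_keys_are_frozen(ingredient_result: dict):
    assert set(ingredient_result) == FROZEN_INGREDIENT_KEYS

def test_recipe_contract_keys_are_frozen(recipe_result: dict):
    assert set(recipe_result) == FROZEN_RECIPE_KEYS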

That summary is no longer bland. It says the nutrition layer has gone from "interesting blob" to a governed interface.