It's still out there! You landed in CSV because "something-somewhere" still speaks CSV, not because your system wanted it.
And now you’ve quietly corrected course:
👉 CSV = extraction format
👉 JSONL = working format
That’s the right architecture.
Why you’re loving JSONL (and should)
“One line per row” is doing a lot of heavy lifting:
- each record is independent
- no column alignment drama
- no quoting gymnastics
- no guessing where a field ends
- easy to stream, filter, debug
You can literally:
grep '"classification": "recipe_partial"' rag_setup/recipes_normalized.jsonl
and it just works. No parser needed. That’s power.
The hidden win
You’ve separated concerns cleanly:
Backdrop (Views) → CSV ← selected by choice
CSV → JSONL ← your normalization step
JSONL → app ← your actual system
So now:
- if "something-somewhere" changes → only CSV layer is affected
- your app stays stable
- your pipeline stays deterministic
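That normalization step is small enough to sketch. A minimal version, assuming the paths from this post and a CSV export with headers (real classification logic would live in your pipeline, not here):

```python
import csv
import json

def csv_to_jsonl(src_path, dst_path):
    """Convert a CSV export into JSONL: one JSON object per line."""
    with open(src_path, encoding="utf-8", newline="") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            # Each CSV row becomes a standalone JSON object on its own line.
            dst.write(json.dumps(dict(row), ensure_ascii=False) + "\n")
```

Because each output line is independent, changes on the CSV side stay contained in this one function.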
This is the real upgrade
You didn’t just fix the data.
You created a pipeline boundary.
And that’s why this feels better:
👉 You’re no longer “working with export data”
👉 You’re working with your own normalized dataset
One-line truth
CSV got the data out.
JSONL makes it usable.
And yeah… one-line-per-object is ridiculously satisfying once you’ve fought CSV long enough 😄
Yes, JSONL. And yes, there is a plan behind the pick.
How to access it
A .jsonl file is:
- one JSON object per line
- easy to stream
- easy to debug
- easy to grep
- easy to append/rebuild
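The append point deserves a one-liner of its own: adding a record never requires rewriting the file. A sketch (function name is illustrative):

```python
import json

def append_record(path, record):
    """Append one record as a single JSON line; no rewrite needed."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Rebuilding is the same idea in reverse: write the whole file fresh, one line per record.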
Your file is here:
rag_setup/recipes_normalized.jsonl
Quick ways to look at it
Show first few lines
head -n 5 rag_setup/recipes_normalized.jsonl
Pretty-print first few records
python - <<'PY'
import json

with open("rag_setup/recipes_normalized.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f):
        obj = json.loads(line)
        print("\n---")
        print(json.dumps(obj, indent=2, ensure_ascii=False))
        if i == 2:
            break
PY
Read it in Python
import json

rows = []
with open("rag_setup/recipes_normalized.jsonl", encoding="utf-8") as f:
    for line in f:
        rows.append(json.loads(line))

print(rows[0]["title"])
Filter recipe-like rows
python - <<'PY'
import json

with open("rag_setup/recipes_normalized.jsonl", encoding="utf-8") as f:
    for line in f:
        row = json.loads(line)
        if row.get("classification") in {"recipe_full", "recipe_partial"}:
            print(row["title"])
PY
Why JSONL was a good pick
Because it sits nicely between raw CSV and a “real database.”
CSV is good for:
- export
- transport
- simple tabular fields
JSONL is better for:
- normalized records
- nested fields later
- incremental pipeline stages
- line-by-line processing
- keeping each row as a standalone object
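An incremental stage in that style is just read lines → transform → write lines. A sketch, with an illustrative function name (`transform` returns a new record, or None to drop the row):

```python
import json

def run_stage(src_path, dst_path, transform):
    """One pipeline stage: stream records in, write transformed records out."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            out = transform(json.loads(line))
            if out is not None:
                dst.write(json.dumps(out, ensure_ascii=False) + "\n")
```

Stages chain naturally: each one's output JSONL is the next one's input.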
That matters because your data is already outgrowing plain flat CSV thinking.
For example, later you may want each row to hold things like:
{
  "title": "Marinated Beans",
  "classification": "recipe_partial",
  "servings": 2,
  "ingredients_raw": [
    "1 small shallot, finely chopped",
    "2-4 tbsp white wine vinegar"
  ]
}
That is natural in JSONL and awkward in CSV.
The real plan behind it
Think of it like this:
Backdrop CSV
→ normalize/clean/classify
→ JSONL
→ build richer base_recipe objects
→ feed app/tests/effective_recipe
So JSONL is the staging format.
Not the final destination, not the source of truth, but a very handy workbench.
One-line answer
You access it by reading it one line at a time as JSON; it was chosen because it beats CSV for pipeline-friendly, semi-structured recipe records.