Eckford on the side
The snippets herein are from far and wide. They make no sense without context.

CSV to JSONL

Thu, 03/26/2026 - 10:22pm by admin

It's still out there! You landed in CSV because "something-somewhere" still speaks CSV, not because your system wanted it.

And now you’ve quietly corrected course:

👉 CSV = extraction format
👉 JSONL = working format

That’s the right architecture.

Why you’re loving JSONL (and should)

“One line per row” is doing a lot of heavy lifting:

  • each record is independent
  • no column alignment drama
  • no quoting gymnastics
  • no guessing where a field ends
  • easy to stream, filter, debug

You can literally:

grep '"classification": "recipe_partial"' rag_setup/recipes_normalized.jsonl

and it just works. No parser needed. That’s power.

The hidden win

You’ve separated concerns cleanly:

Backdrop (Views) → CSV  ← a deliberate choice
CSV → JSONL             ← your normalization step
JSONL → app             ← your actual system

So now:

  • if "something-somewhere" changes → only the CSV layer is affected
  • your app stays stable
  • your pipeline stays deterministic
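That middle arrow (CSV → JSONL) is small enough to sketch. A minimal version, assuming the export has title and classification columns; your real script will do more cleaning than this:

```python
import csv
import io
import json

# Illustrative CSV export (these column names are assumptions,
# not your actual Backdrop Views schema).
csv_text = (
    "title,classification\n"
    '"Marinated Beans",recipe_partial\n'
    '"Herb Stock",recipe_full\n'
)

# One JSON object per CSV row, one row per line: that's the whole format.
jsonl_lines = [
    json.dumps(row, ensure_ascii=False)
    for row in csv.DictReader(io.StringIO(csv_text))
]
jsonl = "\n".join(jsonl_lines) + "\n"
print(jsonl)
```

In the real pipeline you'd read the export from disk and write to rag_setup/recipes_normalized.jsonl instead of printing, but the shape of the step is exactly this.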

This is the real upgrade

You didn’t just fix the data.

You created a pipeline boundary.

And that’s why this feels better:

👉 You’re no longer “working with export data”
👉 You’re working with your own normalized dataset

One-line truth

CSV got the data out.
JSONL makes it usable.

And yeah… one-line-per-object is ridiculously satisfying once you’ve fought CSV long enough 😄

 


Yes, JSONL. And yes, there is a plan behind the pick.

How to access it

A .jsonl file is:

  • one JSON object per line
  • easy to stream
  • easy to debug
  • easy to grep
  • easy to append/rebuild
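The append point deserves a quick illustration: adding a record is just writing one more line. The path and record below are placeholders, not your real data:

```python
import json

# Hypothetical new record; append mode creates the file if it's missing.
new_row = {"title": "Quick Pickle", "classification": "recipe_partial"}
with open("recipes_normalized.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(new_row, ensure_ascii=False) + "\n")
```

No rewrite of the whole file, no header row to keep in sync. That is what makes incremental pipeline stages cheap.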

Your file is here:

rag_setup/recipes_normalized.jsonl

Quick ways to look at it

Show first few lines

head -n 5 rag_setup/recipes_normalized.jsonl

Pretty-print first few records

python - <<'PY'
import json

# Pretty-print the first three records, one JSON object per input line.
with open("rag_setup/recipes_normalized.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f):
        obj = json.loads(line)
        print("\n---")
        print(json.dumps(obj, indent=2, ensure_ascii=False))
        if i == 2:
            break
PY

Read it in Python

import json

# Load every record into memory (fine for small files;
# iterate line by line instead if the file gets large).
rows = []
with open("rag_setup/recipes_normalized.jsonl", encoding="utf-8") as f:
    for line in f:
        rows.append(json.loads(line))

print(rows[0]["title"])

Filter recipe-like rows

python - <<'PY'
import json

with open("rag_setup/recipes_normalized.jsonl", encoding="utf-8") as f:
    for line in f:
        row = json.loads(line)
        if row.get("classification") in {"recipe_full", "recipe_partial"}:
            print(row["title"])
PY

Why JSONL was a good pick

Because it sits nicely between raw CSV and a “real database.”

CSV is good for:

  • export
  • transport
  • simple tabular fields

JSONL is better for:

  • normalized records
  • nested fields later
  • incremental pipeline stages
  • line-by-line processing
  • keeping each row as a standalone object

That matters because your data is already outgrowing plain flat CSV thinking.

For example, later you may want each row to hold things like:

{
  "title": "Marinated Beans",
  "classification": "recipe_partial",
  "servings": 2,
  "ingredients_raw": [
    "1 small shallot, finely chopped",
    "2-4 tbsp white wine vinegar"
  ]
}

That is natural in JSONL and awkward in CSV.
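To make that concrete: the record above serializes to a single line and comes back intact, which CSV can't do for a nested list without inventing a flattening scheme.

```python
import json

# The record from the example above.
record = {
    "title": "Marinated Beans",
    "classification": "recipe_partial",
    "servings": 2,
    "ingredients_raw": [
        "1 small shallot, finely chopped",
        "2-4 tbsp white wine vinegar",
    ],
}

# The nested ingredients_raw list fits on one line and round-trips losslessly.
line = json.dumps(record, ensure_ascii=False)
print(line)
```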

The real plan behind it

Think of it like this:

Backdrop CSV
→ normalize/clean/classify
→ JSONL
→ build richer base_recipe objects
→ feed app/tests/effective_recipe

So JSONL is the staging format.

Not the final destination, not the source of truth, but a very handy workbench.

One-line answer

Read it one line at a time as JSON. It was chosen because it handles pipeline-friendly, semi-structured recipe records far better than CSV does.
