Error Detail - NSR002 - Clean Language

Latest Report Context

Severity

regression

Category

unknown

Error Message

PARTIAL FIX of report 34a3e2c7-b712-4b4a-9515-dca9650bc9da: clean-node-server 0.1.59 stops the WASM heap exhaustion crash, but the underlying UTF-8 multibyte handling bug is still present — responses contain U+FFFD replacement characters and rendering silently truncates after partial content.

Minimal Reproduction

// Same minimal repro as #34a3e2c7 — but the regression is that 0.1.59 now
// returns 200 with corrupted body instead of crashing.

plugins:
    frame.server
    frame.data

endpoints server:
    GET "/repro" :
        // Any string from a db.query() result that contains a 3-byte UTF-8 char
        string sql = "SELECT 'Tutorials — Learn Clean Language' as title"
        string result = db.query(sql, "[]")
        string title = json.get(result, "data.rows.0.title")
        return http.respond(200, "text/plain", title)

// Expected response body: Tutorials — Learn Clean Language
// Actual response body:   Tutorials ��� Learn Clean Language
// (the em-dash's three bytes E2 80 94 each become a separate U+FFFD, encoded as EF BF BD)

// To reproduce the silent truncation, add a loop that walks json.get for
// many items where any value contains a 3-byte UTF-8 char — rendering stops
// after ~10 items even though the loop should iterate 30 times.

Expected Behavior

Fix the root cause in the JSON bridge: every length calculation that crosses the Node/WASM boundary must use the UTF-8 byte length (Buffer.byteLength(str, 'utf8')), never str.length (UTF-16 code units).

When the fix is correct, the same WASM + same DB should produce:
- 30 tutorial cards (3 track sections, 10 cards each)
- Em-dashes (`—`), arrows (`→`), smart quotes (`“”`), and ñ in "Español" rendered correctly as their actual code points
- 0 U+FFFD characters
- Response body ~50KB

A useful self-test the team can add: load a JSON document containing the common BMP characters that encode to 3 bytes in UTF-8 (em-dash U+2014, arrow U+2192, ellipsis U+2026, smart quotes U+201C/U+201D), pass it through json.get → string concat → http.respond, and verify zero U+FFFD in the output.
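The difference between the two length semantics named above can be checked directly in Node. The following is a minimal sketch, not code from clean-node-server: `countReplacementChars` is a hypothetical helper for the "zero U+FFFD" self-test, and the sample title is taken from the repro.

```typescript
import { Buffer } from "node:buffer";
import * as assert from "node:assert";

// Sample title from the repro; the em-dash (U+2014) takes 3 bytes in UTF-8.
const title = "Tutorials \u2014 Learn Clean Language";

// UTF-16 code units vs UTF-8 bytes: they diverge as soon as a character leaves ASCII.
const codeUnits = title.length;                      // 32
const utf8Bytes = Buffer.byteLength(title, "utf8");  // 34

// Hypothetical helper: count U+FFFD replacement characters in a response body.
function countReplacementChars(body: string): number {
  return body.split("\uFFFD").length - 1;
}

// A healthy body contains no replacement characters.
assert.strictEqual(countReplacementChars(title), 0);
console.log({ codeUnits, utf8Bytes });
```

Any code path that sizes or advances by `codeUnits` while the payload occupies `utf8Bytes` will be off by 2 bytes per em-dash, which is the mismatch this report describes.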

Actual Behavior

After upgrading clean-node-server from 0.1.55 to 0.1.59 to pick up the marked-resolved fix for report #34a3e2c7 (heap exhaustion / WASM crash on /tutorials):

**Setup (identical to the original repro):**
- Same WASM (MD5 14b8276e28c320842e4cb32e579ba84c, compiled with cln 0.30.316 + frame.server 2.7.9)
- Same production database (clean_website, 30 tutorial rows with em-dashes and other multibyte UTF-8)
- Tested via SSH tunnel from local clean-node-server 0.1.59 to prod MySQL on the droplet

**Result:**
- ✅ HTTP 200 (no more 500)
- ✅ Response in 1.7-2.6s (no longer OOM-kills the worker)
- ✅ Worker RSS stable at ~260MB across multiple requests
- ❌ Response body is **12,798 bytes** (full /tutorials page should be ~50KB)
- ❌ Only **10 of 30** tutorial cards rendered (only the "web-app" track at sort_order 11–20)
- ❌ The "first-steps" track (1–10) and "ai-coding" track (21–30) are silently missing
- ❌ Only **1 of 3** ` ` blocks present
- ❌ …

AI Analysis

Suggested Component

src/bridge/json.ts

Analysis

Discovered during: Verifying the resolution of #34a3e2c7 against the production WASM and production database, after the dashboard marked it resolved.

The 0.1.59 fix is a workaround at the heap-allocation layer (per the description of the related #76d3b529 fix, "Pre-grow WASM memory to 16 MB at startup so memory.grow() is never called mid-request"). This stops the runaway allocation from triggering DataView detachment, but it doesn't fix the underlying string-length miscalculation that was driving the runaway allocation in the first place.

What's still broken: somewhere in the JSON bridge between Node's UTF-16 String and the WASM linear memory's UTF-8 buffer, the byte length and the character length are being confused. Most likely site: in writeLengthPrefixedString or its read counterpart, the cursor advances by str.length (UTF-16 code units) but the buffer was written using Buffer.byteLength(str, 'utf8'), or vice versa. When the cursor lands inside a multibyte sequence:
- Reading: the bytes are not a valid UTF-8 sequence → decoder emits U+FFFD per orphan byte
- Writing: the length-prefix says N but the actual bytes occupy N+k → reader walks off the end of one value into the next, eventually terminating the iteration early when it lands on a NUL or runs out of data

The "10 out of 30 cards rendered" symptom is consistent with this: after enough multibyte sequences corrupt the bridge state, json.get for subsequent items returns empty strings, and the `while title != ""` loop in the renderer exits early, which is exactly what you'd see if the cursor desyncs.

The fact that the only 10 cards that render are from the middle track (sort_order 11-20, the "web-app" track) is telling: those are the items that previously had the most corrupted multibyte content (we saw earlier that this track had `→` arrows + em-dashes). The desync state may have aligned in a way that those particular items happen to render past the desync point. The first-steps track (1-10) and ai-coding track (21-30) are lost.

Recommended fix: audit every length calculation in src/bridge/*.{ts,js} and add a unit test corpus containing a set of common multibyte chars (em-dash U+2014, en-dash U+2013, right arrow U+2192, ellipsis U+2026, smart quotes U+2018-U+201D, ñ U+00F1, é U+00E9, € U+20AC, 你 U+4F60, 🦀 U+1F980) passed through every supported bridge path.
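The desync hypothesis can be illustrated in isolation. The sketch below is an assumption-laden toy, not the actual bridge code: `writeBuggy` and `readPrefixed` are hypothetical stand-ins for a length-prefixed writer/reader pair, and the 4-byte little-endian prefix is an assumed layout. It shows how one wrong prefix corrupts the current value and misaligns every read that follows.

```typescript
import { Buffer } from "node:buffer";
import * as assert from "node:assert";

// Hypothetical buggy writer: the prefix stores str.length (UTF-16 code units)
// while the payload is written as UTF-8 bytes.
function writeBuggy(buf: Buffer, offset: number, str: string): number {
  buf.writeUInt32LE(str.length, offset); // wrong unit for a byte buffer
  const written = buf.write(str, offset + 4, "utf8");
  return offset + 4 + written;
}

// Hypothetical reader: trusts the prefix as a byte count.
function readPrefixed(buf: Buffer, offset: number): { value: string; next: number } {
  const byteLen = buf.readUInt32LE(offset);
  const start = offset + 4;
  return { value: buf.toString("utf8", start, start + byteLen), next: start + byteLen };
}

const buf = Buffer.alloc(64);
let end = writeBuggy(buf, 0, "a\u2014b"); // 3 code units, but 5 UTF-8 bytes
end = writeBuggy(buf, end, "next");

// The first read stops inside the em-dash, so the value is cut short and ends in U+FFFD.
const first = readPrefixed(buf, 0);
// The second read starts 2 bytes early, so its "prefix" is leftover payload bytes.
const second = readPrefixed(buf, first.next);

// Each em-dash byte decoded on its own yields one U+FFFD, three in total.
const orphanDecoded = [0xe2, 0x80, 0x94]
  .map((b) => Buffer.from([b]).toString("utf8"))
  .join("");
assert.strictEqual(orphanDecoded, "\uFFFD\uFFFD\uFFFD");

console.log(JSON.stringify(first.value), JSON.stringify(second.value));
```

In this toy the second value never comes back as "next", which mirrors the report's symptom of later items reading as empty or garbage once the cursor is misaligned.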

Suggested Fix

The pre-grow-to-16MB fix shipped in #76d3b529 (probably the same fix referenced in 0.1.59) addresses the heap-detachment symptom but not the UTF-8 length mismatch root cause. To finish the fix:

1. In src/bridge/*.{ts,js}, find every site that converts between a JS String and a WASM-memory byte buffer. Verify each side uses the same length semantic.
2. Wherever a length is WRITTEN to WASM memory as a length-prefix or used to advance a cursor: use `Buffer.byteLength(str, 'utf8')`, NEVER `str.length` (which counts UTF-16 code units).
3. Wherever a length is READ from WASM memory: trust it as a byte count and slice the buffer accordingly; decode using `.toString('utf8', start, start + byteLen)`.
4. Add a regression test fixture: a JSON object whose values are every common BMP 3-byte UTF-8 char. Pass it through json.get on every supported path depth (0, 1, 2, 3). Assert the output bytes equal the input bytes for each.

The visible signature of the bug being properly fixed: 0 U+FFFD in any response body, AND the response body byte length matches what the WASM logically emitted (no silent truncation).
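Steps 2 to 4 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the bridge's real implementation: the function names, the 4-byte little-endian prefix, and the `roundTrip` harness are all hypothetical, and a plain Node `Buffer` stands in for WASM linear memory. The corpus is taken from the characters listed in the analysis.

```typescript
import { Buffer } from "node:buffer";
import * as assert from "node:assert";

// Write side: both the prefix and the cursor advance use the UTF-8 byte length.
function writeLengthPrefixed(buf: Buffer, offset: number, str: string): number {
  const byteLen = Buffer.byteLength(str, "utf8");
  buf.writeUInt32LE(byteLen, offset);
  buf.write(str, offset + 4, byteLen, "utf8");
  return offset + 4 + byteLen;
}

// Read side: trust the prefix as a byte count and decode exactly that slice.
function readLengthPrefixed(buf: Buffer, offset: number): { value: string; next: number } {
  const byteLen = buf.readUInt32LE(offset);
  const start = offset + 4;
  return { value: buf.toString("utf8", start, start + byteLen), next: start + byteLen };
}

// Regression corpus: 2-, 3-, and 4-byte UTF-8 characters plus the repro title.
const corpus: string[] = [
  "\u2014", "\u2013", "\u2192", "\u2026", "\u2018", "\u2019", "\u201C", "\u201D",
  "\u00F1", "\u00E9", "\u20AC", "\u4F60", "\uD83E\uDD80", // last entry is U+1F980
  "Tutorials \u2014 Learn Clean Language",
];

// Hypothetical harness: write every value back to back, then read them all again.
function roundTrip(values: string[]): string[] {
  const total = values.reduce((n, v) => n + 4 + Buffer.byteLength(v, "utf8"), 0);
  const buf = Buffer.alloc(total);
  let cursor = 0;
  for (const v of values) cursor = writeLengthPrefixed(buf, cursor, v);
  const out: string[] = [];
  cursor = 0;
  for (let i = 0; i < values.length; i++) {
    const { value, next } = readLengthPrefixed(buf, cursor);
    out.push(value);
    cursor = next;
  }
  return out;
}

const result = roundTrip(corpus);
assert.deepStrictEqual(result, corpus);
assert.ok(result.every((s) => s.indexOf("\uFFFD") === -1));
```

Writing several values back to back matters here: a single-value test can pass even with a wrong prefix, whereas consecutive values expose any cursor drift on the second read.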

Recent Reports (last 20)

Report ID	Compiler	OS	Arch	Severity	Reported At
773a4358-ddc4-47...	0.30.321	macos	aarch64	regression	2026-06-19 13:50:44

NSR002 open