Strict or Loose: Tuning a Citation-First Answer Prompt

I wrote last week about the two-layer pattern behind my Q&A system for a 640-page authored book: an LLM routes to the right chapters, then a second call quotes them verbatim with page citations. No embeddings, no chunks. The router layer got most of my attention in that post — building the index, verifying it, the precision tax on rich metadata. This post is about the layer after.

During testing I ran a question through the system and got a refusal. The system listed the chapters it had consulted — three of them, reasonable picks — and said the question wasn’t covered. The chapters should have covered it. So before touching any code, I had to answer a different question: was the refusal legitimate?

When an LLM system returns an unexpected result, diagnose which layer before fixing. The bug could be anywhere — router, prompt, temperature, the source itself.

Verifying the refusal

I asked Claude to grep the raw chapter markdown for the concept the user had asked about. Same concept, different vocabulary: the book used X where the question used Y. Grep found the content immediately — in two of the three chapters the router had picked. The refusal was wrong. The router had done its job.

So the bug was in the layer after — the answer call. The system prompt passed to the model along with the loaded chapters decides the shape of the response, including how strict to be about quoting. Mine was very strict.

The original prompt

The relevant part of the answer prompt read:

מבוסס/ת אך ורק על הפרקים שסופקו למטה. אין להוסיף ידע חיצוני.

אם השאלה לא נענית בפרקים — אמרי זאת במפורש ופרטי אילו פרקים נבדקו.

…

אין לשכתב או לפרפרזות את מילותיה של אילנה. אם אין ציטוט ישיר מתאים — אמרי שאין.

Translation: “Base only on the supplied chapters. No external knowledge. If the question isn’t answered in the chapters, say so explicitly. Do not rewrite or paraphrase the author’s words. If there’s no direct matching quote, say there isn’t.”

Temperature was 0.2. The combination of strict rule plus low temperature made the model lexically literal. The user asks about X, the book talks about X under a different name, the model sees no exact quote matching the question’s phrasing, and it follows the rule: say there isn’t.

That rule design conflates two separate safeties. The first — “don’t hallucinate facts, don’t invent quotes” — is mandatory. The second — “don’t paraphrase the author’s voice inside a quote” — is a style preference. The original prompt enforced both with one rule. Collapsing them meant any vocabulary gap between user and source got filed as “no quote exists, refuse.”

The slider

Answer prompts in citation-first systems sit on a slider. Strict side: the model refuses on vocabulary gaps. Loose side: the model bridges so freely it invents claims the source doesn’t support. Both failures look confident from the outside. The user can’t tell from a single response which side of the slider they hit.

Strict-side failure costs trust: users who know the source asked a reasonable question and got told the source doesn’t cover it. Loose-side failure costs correctness: the model produces something plausible and the user treats it as the author’s words when it isn’t. For a system whose whole value is direct attribution, loose is worse than strict. That’s why the original prompt erred strict. But the strict setting was catching legitimate questions too — not just edge cases.

Tuning the slider is finding where the model can bridge vocabulary without fabricating facts.

Three iterations

My first draft of the fix was a rewrite. A new section naming the book’s framework explicitly, restructured rule list, anti-hallucination guardrails organized into their own block, examples reworked. Roughly sixty lines of prompt where the original had thirty. Too much surface area for a one-rule bug, and it baked the source’s worldview into the system prompt as a claim — a fragile dependency on content that might nuance over time.

Second draft kept the structural reorganization but stripped the worldview section. Still too much. The original prompt was producing good answers for questions where vocabulary matched — and that was the majority of questions. A rewrite risked regressing those cases to fix the minority.

What landed was a three-edit diff on top of the original prompt. Seven insertions, one deletion. Every original rule and every original example preserved.

The three edits

One. A new instruction, added right after “no external knowledge”:

חשבי על המשמעות של השאלה, לא רק על המילים המדויקות. אם הפרקים דנים במושג או בתופעה שעליהם שאלו — גם במילים אחרות — הביאי את הציטוטים הרלוונטיים.

“Think about the meaning of the question, not just the exact words. If the chapters discuss the concept or phenomenon asked about — even in other words — bring the relevant quotes.”

This is the core shift. One sentence tells the model that vocabulary overlap is not the matching criterion. Meaning is.

Two. The refusal rule softened from a terse assertion to a template:

אם לאחר קריאה מושגית של הפרקים לא נמצאה תשובה, השתמשי בתבנית: “לא מצאתי בפרקים שנבדקו (שמות הפרקים) תשובה לשאלה. אפשר לנסות לנסח אותה סביב סימפטום או תופעה ספציפית.”

“If no answer is found after a conceptual reading of the chapters, use the template: ‘I didn’t find an answer to the question in the examined chapters (chapter names). You can try rephrasing around a specific symptom or phenomenon.’”

The change matters in two ways. The trigger shifts from “no lexical match” to “no conceptual match.” And the refusal itself becomes cooperative — it suggests the user reframe in the source’s vocabulary rather than just saying the content doesn’t exist.

Three. A third example, added after the two existing question-type examples:

דוגמה לשאלה בשפת המשתמש/ת:

המילה “להתמודד” לא מופיעה בטקסט, אך הספר דן בכעס כדפוס רגשי שיש לו שורש נשמתי.

Examples teach a model faster than rules. This one shows the exact shape I wanted: a question using vocabulary that isn’t in the text, an opening sentence that bridges vocabulary (“the text doesn’t use this term, but discusses it as X”), then verbatim quotes with page citations.

What didn’t change

The substantive content rules stayed. No external knowledge. No paraphrasing inside quotes. No inventing page numbers. The bridge sentence is explicitly framed as a framing sentence, not a paraphrase — it introduces what the quotes will show, and it can use the user’s vocabulary, but it can’t make claims the quotes don’t support.

That distinction is the load-bearing piece of the tuning. The bridge sentence is permission to use non-source vocabulary in exactly one place, for exactly one job: translating the user’s question into the answer that follows. Every factual sentence after the bridge is a quote.

Verification

All existing unit tests stayed green — the prompt still produces Hebrew, still uses > for blockquotes, still renders עמוד N citations, still forbids paraphrasing. I re-ran the original failing question against the deployed revision. The response now opens with a bridging sentence and includes five direct quotes from three pages across two chapters. Exactly the shape the third example teaches.

When this design wouldn’t apply

The bridge-sentence pattern assumes the source material and the user’s question are conceptually compatible even when lexically different. That’s true for authored prose where one author discusses topics across a book — the user asks in their own vocabulary, the author wrote about the same concept under a different name, same meaning.

It isn’t true everywhere. Medical information where terminology is load-bearing — drug names, dosages, diagnosis codes — should stay lexically strict. Legal citation work where a statute’s exact wording matters. Regulatory compliance. Any domain where “same concept, different words” is actually a wrong answer. For those systems, the original strictness is the right design, and the bridge sentence is what you don’t want.

The broader shape

Citation-first Q&A over authored prose has two failure surfaces at the answer layer, and they’re on opposite sides of a slider. Strict: refuse when vocabulary doesn’t match. Loose: bridge concepts so freely you fabricate. Neither failure is visible until the wrong question hits. Both feel confident.

Finding the balance is mostly about being explicit — tell the model to reason about meaning, but keep every factual sentence as a verbatim quote. The bridge is the opening, not the content.

A rewrite is how you fix an architecture mistake. A three-line insertion is how you fix a rule design.