Unit Test LLM Outputs While You Design.
Automatically validate prompts against test cases, track versions, and deploy with confidence. Version-controlled testing baked into your prompt studio.
Test every prompt against real data; compare versions; integrate tested prompts into your code.
Problem Statement
When you chat with an LLM, or accept completions in Cursor, every output passes under your direct supervision. Even a high rate of failure is acceptable while you steward a task to completion.
However, when you use an LLM in an automated process, every inference runs unsupervised. Its stochastic madness begins to resemble a manufacturing process: your prompt and model together have a _yield_, that is, some rate of viable output.
Tuning vs. Shared Hosting
Pre-trained LLMs can of course be tuned, but there is a tradeoff. Off-the-shelf LLMs, like the default ChatGPT, benefit from shared hosting; a loaded instance of the model can serve many clients, and an inference provider can charge according to usage.
By contrast, your tuned model must remain resident in memory at all times. Either you pay up front, or you pay its depreciation by the hour, but that GPU must be paid for.
So, if you can get away with it, there is advantage to an off-the-shelf model.
Automated Testing
The exact same corpus you can use to fine-tune a model can be used for automated testing of shared models!
There is one caveat that complicates conventional unit testing: many LLM outputs must be judged qualitatively. Automating that judgment requires a second prompt, generally run against a stronger model than your target.
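As a minimal sketch of the judge pattern: the function names and prompt wording below are illustrative, and `call_judge` stands in for whatever client you use to reach the stronger model, so the harness stays provider-agnostic.

```python
# Illustrative LLM-as-judge test harness. `call_judge` is any function
# that sends a prompt to a stronger model and returns its text reply.

JUDGE_PROMPT = """You are grading the output of another model.
Criterion: {criterion}
Output to grade:
{output}
Reply with exactly PASS or FAIL."""

def grade(output, criterion, call_judge):
    """Ask the judge model whether a single output meets the criterion."""
    reply = call_judge(JUDGE_PROMPT.format(criterion=criterion, output=output))
    return reply.strip().upper().startswith("PASS")

def run_suite(samples, target, criterion, call_judge):
    """Run every test sample through the target prompt, judge each
    result, and return the yield: the fraction of viable outputs."""
    passes = sum(grade(target(s), criterion, call_judge) for s in samples)
    return passes / len(samples)
```

The judge is injected rather than hard-coded, so the same suite can grade against any provider, and the yield it returns is exactly the manufacturing-style metric described above.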
Guidance Lab Studio
Enter your test samples side by side; see results live with each change to your prompt.
Each update to your prompt is graded and versioned.
Guidance Lab does not run an API. You will not add it as a technical dependency. Simply export your tested prompts into the client library of your choice.
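The export format is not specified here, but as a hypothetical sketch: suppose a tested prompt exports as a plain-text template file with `$`-style placeholders. Loading it then needs nothing beyond the standard library, in keeping with the no-dependency design. The file name and placeholder are made up for illustration.

```python
from string import Template

def load_prompt(path):
    """Load an exported prompt file as a fill-in-the-blanks template.
    (Hypothetical format: plain text with $-style placeholders.)"""
    with open(path) as f:
        return Template(f.read())

# Usage (illustrative names):
# prompt = load_prompt("summarize_v3.txt")
# text = prompt.substitute(document=doc)  # fill placeholders at call time
```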
LFG
Guidance Lab Studio is a work in progress. If you feel it might work for you, you have my full attention! Write to me or join the mailing list for updates: