Manual Testing Guide: Feature 014 - LangChain Migration
This document provides comprehensive manual test cases for validating the LangChain migration. Run these tests using the Mobile App and Admin Specialist Test screen.
Prerequisites
1. Database Setup
Run the stress test seeding script to create test data:
cd packages/backend
# Option 1: Light test (10k documents, 100k chunks) - Recommended for initial testing
pnpm ts-node src/scripts/seed-stress-test-data.ts --documents=10000 --chunk-sampling=100
# Option 2: Medium test (100k documents, 100k chunks) - Good for performance testing
pnpm ts-node src/scripts/seed-stress-test-data.ts --documents=100000 --chunk-sampling=10
# Option 3: Full scale test (1M documents, 3M chunks) - Production simulation
# Requires ~20GB storage for embeddings, takes 30-60 minutes to seed
pnpm ts-node src/scripts/seed-stress-test-data.ts --documents=1000000 --chunk-sampling=100 --chunks-per-doc=3 --vector-stores=10 --specialists=20
# Clean up when done
pnpm ts-node src/scripts/seed-stress-test-data.ts --clean
2. Verify Seed Data
After seeding, verify the data was created:
-- Check specialists
SELECT name, "organizationId" FROM specialists WHERE "organizationId" = 'stress-test-org';
-- Check vector stores and chunk counts
SELECT vs.name, COUNT(fc.id) as chunk_count
FROM vector_store vs
LEFT JOIN file_metadata fm ON fm.vector_store_id = vs.id
LEFT JOIN file_chunks fc ON fc.file_id = fm.id
WHERE vs.organization_id = 'stress-test-org'
GROUP BY vs.id, vs.name;
-- Check MCP plugins
SELECT name, "displayName", "pluginType" FROM mcp_plugins WHERE name LIKE '%-stress';
3. Start Services
# Terminal 1: Start backend
cd packages/backend && pnpm run dev
# Terminal 2: Start admin
cd packages/admin && pnpm run dev
# Terminal 3 (optional): Start mobile
cd packages/mobile-app && pnpm run start
Test Categories
1. Telemetry Tests
Test 1.1: Basic Telemetry Capture
Objective: Verify telemetry is captured for simple queries
Steps:
- Open Admin Dashboard
- Navigate to a specialist test page
- Send a simple query: "What is the refund policy?"
- Check the Telemetry tab/section
Expected Results:
- [ ] Trace ID is generated and displayed
- [ ] TRIGGER node shows the original query
- [ ] SPECIALIST node shows which specialist was selected
- [ ] Query timestamp is recorded
- [ ] Response time is captured
Verification Query:
SELECT * FROM routing_telemetry
WHERE "organizationId" = 'stress-test-org'
ORDER BY "createdAt" DESC LIMIT 5;
Test 1.2: Telemetry with Tool Execution
Objective: Verify telemetry captures tool calls
Steps:
- Navigate to Technical Support specialist (has tools)
- Send query: "What is 25 * 47?"
- Check telemetry trace
Expected Results:
- [ ] ACTION node appears for tool call
- [ ] Tool name (calculate) is recorded
- [ ] Tool input parameters are logged
- [ ] Tool output is captured
- [ ] Execution duration is measured
Test 1.3: Telemetry Under Load
Objective: Verify telemetry performance with many concurrent requests
Steps:
- Open 3-5 browser tabs with specialist test pages
- Send queries simultaneously from all tabs
- Check each trace is independent
Expected Results:
- [ ] Each request gets unique trace ID
- [ ] No cross-contamination between traces
- [ ] Latency remains under 2 seconds
- [ ] All traces are queryable
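The uniqueness and latency checks above can be scripted once the trace IDs and response times from each tab have been noted down. The following is a minimal sketch, not part of the project: the `ObservedTrace` shape and the `checkTraces` helper are hypothetical, and only the 2-second limit comes from the expected results.

```typescript
// Hypothetical record of one request observed during the test.
interface ObservedTrace {
  traceId: string;
  latencyMs: number;
}

interface LoadCheckResult {
  duplicateTraceIds: string[]; // IDs seen more than once
  slowTraceIds: string[];      // requests at or above the latency limit
}

// Flags repeated trace IDs and requests that did not stay under the limit.
function checkTraces(traces: ObservedTrace[], maxLatencyMs = 2000): LoadCheckResult {
  const counts: { [traceId: string]: number } = {};
  const duplicateTraceIds: string[] = [];
  const slowTraceIds: string[] = [];
  for (const trace of traces) {
    counts[trace.traceId] = (counts[trace.traceId] || 0) + 1;
    // Report each duplicated ID once, on its second appearance.
    if (counts[trace.traceId] === 2) duplicateTraceIds.push(trace.traceId);
    if (trace.latencyMs >= maxLatencyMs) slowTraceIds.push(trace.traceId);
  }
  return { duplicateTraceIds, slowTraceIds };
}
```

Both arrays should be empty for the test to pass.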
Test 1.4: Telemetry Data Sanitization
Objective: Verify sensitive data is not stored in telemetry
Steps:
- Send query containing email: "My email is test@example.com, what's my balance?"
- Check telemetry data
Expected Results:
- [ ] Email addresses are redacted or masked
- [ ] No PII stored in plain text
- [ ] Query is truncated if too long (>500 chars)
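To make the expected stored value concrete, here is a minimal sketch of the kind of sanitization this test looks for. It is an illustration under assumptions, not the backend's actual implementation: the `sanitizeQuery` name, the `[REDACTED_EMAIL]` marker, and the email pattern are all hypothetical. Only the 500-character limit comes from the expected results.

```typescript
// Simple email pattern for illustration; real PII detection is usually broader.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Redacts email addresses first, then truncates, so that a cut-off
// address fragment cannot survive at the end of the stored text.
function sanitizeQuery(query: string, maxLength = 500): string {
  const redacted = query.replace(EMAIL_PATTERN, "[REDACTED_EMAIL]");
  return redacted.length > maxLength ? redacted.slice(0, maxLength) : redacted;
}

// The query from the steps above:
const stored = sanitizeQuery("My email is test@example.com, what's my balance?");
// stored === "My email is [REDACTED_EMAIL], what's my balance?"
```

When inspecting telemetry, the stored query should resemble `stored` and never contain the original address.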
2. Specialist Handover Tests
Test 2.1: Simple Handover
Objective: Verify handover works when query matches different specialist
Steps:
- Start conversation with Billing Specialist
- Send: "I need help with billing" (should stay with billing)
- Then send: "Actually, I have a technical error with the API"
- Observe handover to Technical Support
Expected Results:
- [ ] Handover message displayed in user's language
- [ ] Technical Support specialist now active
- [ ] Original conversation context preserved
- [ ] New specialist responds appropriately
Test 2.2: Explicit Handover Request
Objective: Verify LLM correctly uses handover tool
Steps:
- Chat with Technical Support
- Send: "I don't need technical help anymore, I want to talk about my invoice"
- Observe handover
Expected Results:
- [ ] LLM calls the `handover_to_specialist` tool
- [ ] Handover reason is correctly extracted
- [ ] Billing Specialist selected
- [ ] Handover message localized correctly (EN/PT-BR)
Test 2.3: Handover Chain Prevention
Objective: Verify circular handovers are prevented
Steps:
- Start with Billing Specialist
- Ask billing questions
- Request handover to Technical Support
- Immediately ask: "I want to go back to billing"
- Observe behavior
Expected Results:
- [ ] System prevents immediate bounce-back
- [ ] Previous specialists are tracked
- [ ] User can still access billing after some interaction
- [ ] No infinite loop occurs
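The behavior under test can be pictured as a small guard that tracks previous specialists and blocks an immediate return. The sketch below is one possible reading of the expected results, not the router's real logic: the `HandoverState` shape, the `canHandover` helper, and the two-turn cooldown are assumptions made for illustration.

```typescript
// Hypothetical conversation state tracked between turns.
interface HandoverState {
  current: string;            // specialist currently active
  previous: string[];         // specialists already visited, oldest first
  turnsSinceHandover: number; // user turns handled since the last handover
}

// Allows a handover unless it targets the active specialist, or bounces
// straight back to the previous one before minTurns turns have passed.
function canHandover(state: HandoverState, target: string, minTurns = 2): boolean {
  if (target === state.current) return false;
  const last = state.previous[state.previous.length - 1];
  if (target === last && state.turnsSinceHandover < minTurns) return false;
  return true;
}
```

Under these assumptions, asking to go back to billing right after the handover is refused, while the same request after a couple of turns with Technical Support is allowed.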
Test 2.4: Handover with No Suitable Specialist
Objective: Verify graceful handling when no specialist matches
Steps:
- Start conversation
- Ask about something completely unrelated: "Tell me about quantum physics"
- Observe response
Expected Results:
- [ ] Handover fails gracefully
- [ ] General assistant provides response
- [ ] Error message displayed (localized)
- [ ] Conversation continues normally
Test 2.5: Handover Telemetry
Objective: Verify handover events are captured in telemetry
Steps:
- Perform a successful handover (Test 2.1)
- Check telemetry data
Expected Results:
- [ ] Handover action recorded in trace
- [ ] Source specialist ID logged
- [ ] Target specialist ID logged
- [ ] Handover reason captured
- [ ] `handoverOccurred: true` in routing_telemetry
3. Vector Store Query Tests
Test 3.1: Basic Semantic Search
Objective: Verify vector store queries return relevant results
Steps:
- Navigate to Billing Specialist (linked to Billing Documentation)
- Ask: "How do I get a refund?"
- Check response includes knowledge base content
Expected Results:
- [ ] Response mentions 30-day refund policy
- [ ] Content from Billing Documentation used
- [ ] Semantic similarity score > 0.7
- [ ] Response is coherent and relevant
Test 3.2: Multi-Vector Store Query
Objective: Verify queries across multiple vector stores
Steps:
- Use a specialist linked to multiple knowledge bases
- Ask a question spanning multiple topics
- Check responses include content from both
Expected Results:
- [ ] Results from multiple vector stores
- [ ] Re-ranking applied correctly
- [ ] Most relevant chunks prioritized
- [ ] No duplicate content
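For the "no duplicate content" check, it helps to know what a merge across stores might look like. The sketch below is an assumption-laden illustration: the `RetrievedChunk` shape and `mergeAndDedupe` helper are hypothetical, and treating chunks as duplicates when their trimmed, lower-cased text matches is only one possible rule.

```typescript
// Hypothetical shape of one retrieved chunk.
interface RetrievedChunk {
  storeId: string;
  content: string;
  score: number;
}

// Merges per-store results, keeps the highest-scoring copy of each
// duplicated text, and returns the chunks ordered by score (best first).
function mergeAndDedupe(resultsPerStore: RetrievedChunk[][]): RetrievedChunk[] {
  const best: { [key: string]: RetrievedChunk } = {};
  for (const results of resultsPerStore) {
    for (const chunk of results) {
      const key = chunk.content.trim().toLowerCase();
      const existing = best[key];
      if (existing === undefined || chunk.score > existing.score) best[key] = chunk;
    }
  }
  return Object.keys(best)
    .map((key) => best[key])
    .sort((a, b) => b.score - a.score);
}
```

If the same passage exists in two knowledge bases, the response should draw on it once, not twice.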
Test 3.3: Query Performance Under Load
Objective: Verify vector queries perform well with large datasets
Prerequisite: Seed the database with --documents=100000 or higher
Steps:
- Measure query time for simple query
- Send 10 sequential queries
- Record average response time
Expected Results:
- [ ] Embedding generation latency < 500ms
- [ ] Vector search < 100ms (PostgreSQL pgvector)
- [ ] Total response time < 3 seconds
- [ ] No timeout errors
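Once the ten response times have been recorded, the average and the 3-second budget can be checked with a few lines. This is a minimal sketch; the `summarizeLatencies` helper is hypothetical and expects the durations in milliseconds as measured by the tester.

```typescript
interface LatencySummary {
  averageMs: number;
  maxMs: number;
  overBudget: number; // how many runs reached or exceeded the budget
}

// Summarizes recorded response times against a total-response budget.
function summarizeLatencies(durationsMs: number[], budgetMs = 3000): LatencySummary {
  if (durationsMs.length === 0) throw new Error("no measurements recorded");
  let total = 0;
  let maxMs = 0;
  let overBudget = 0;
  for (const d of durationsMs) {
    total += d;
    if (d > maxMs) maxMs = d;
    if (d >= budgetMs) overBudget++;
  }
  return { averageMs: total / durationsMs.length, maxMs, overBudget };
}
```

A passing run has `overBudget` equal to 0; the average is the figure to record in the execution log notes.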
Test 3.4: Min Score Filtering
Objective: Verify low-relevance results are filtered
Steps:
- Ask a question unrelated to any knowledge base
- Example: "What's the weather like today?"
- Check that irrelevant chunks are not included
Expected Results:
- [ ] No chunks returned with score < 0.7
- [ ] Response generated without hallucinating content
- [ ] System handles no-match gracefully
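The filter being verified amounts to a threshold on the similarity score. A minimal sketch follows, assuming scores in the range 0 to 1; the `ScoredChunk` shape and the `filterByMinScore` helper are hypothetical names.

```typescript
// Hypothetical shape of a scored search result.
interface ScoredChunk {
  content: string;
  score: number; // similarity score, assumed to lie in [0, 1]
}

// Keeps only chunks at or above the minimum score (0.7 in this guide).
function filterByMinScore(chunks: ScoredChunk[], minScore = 0.7): ScoredChunk[] {
  return chunks.filter((chunk) => chunk.score >= minScore);
}
```

For an unrelated question such as the weather example, the filtered list is expected to be empty, and the response should then be generated without knowledge base context.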
Test 3.5: Re-Ranking Algorithm
Objective: Verify multi-factor re-ranking works correctly
Steps:
- Ask question with multiple relevant chunks
- Check ordering of results
- Verify that recent chunks and chunks early in a document are ranked appropriately
Expected Results:
- [ ] Semantic score weighted 65%
- [ ] Vector store order weighted 15%
- [ ] Chunk position weighted 10%
- [ ] Recency weighted 10%
- [ ] Final ranking is sensible
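The weights above combine into a single score per chunk. The sketch below shows the arithmetic as a weighted sum, assuming each factor has already been normalized to the range 0 to 1. How the backend actually normalizes store order, chunk position, and recency is not stated here, so the `RankFactors` shape and both helpers are hypothetical.

```typescript
// Hypothetical per-chunk factors, each assumed normalized to [0, 1].
interface RankFactors {
  id: string;
  semantic: number;   // semantic similarity
  storeOrder: number; // priority of the vector store
  position: number;   // position of the chunk within its document
  recency: number;    // how recent the source document is
}

// Weights from the expected results: 65% / 15% / 10% / 10%.
const WEIGHTS = { semantic: 0.65, storeOrder: 0.15, position: 0.1, recency: 0.1 };

function finalScore(f: RankFactors): number {
  return (
    WEIGHTS.semantic * f.semantic +
    WEIGHTS.storeOrder * f.storeOrder +
    WEIGHTS.position * f.position +
    WEIGHTS.recency * f.recency
  );
}

// Returns a new array ordered by final score, best first.
function rerank(chunks: RankFactors[]): RankFactors[] {
  return chunks.slice().sort((a, b) => finalScore(b) - finalScore(a));
}
```

As a worked example, a chunk with factors (0.9, 1.0, 0.5, 0.2) scores 0.585 + 0.15 + 0.05 + 0.02 = 0.805, while one with (0.8, 0.5, 1.0, 1.0) scores 0.52 + 0.075 + 0.1 + 0.1 = 0.795, so the first ranks higher despite being older.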
4. Tool Call Tests
Test 4.1: REST API Tool - GET Request
Objective: Verify REST GET tool execution
Steps:
- Navigate to Technical Support (has Weather API)
- Ask: "What's the weather in London?"
- Observe tool execution
Expected Results:
- [ ] Tool `get_current_weather` called
- [ ] GET request sent to API
- [ ] Response parsed correctly
- [ ] Weather info displayed to user
Test 4.2: REST API Tool - POST Request
Objective: Verify REST POST tool execution
Steps:
- Navigate to specialist with Task Manager plugin
- Ask: "Create a task called 'Test task from LangChain'"
- Observe tool execution
Expected Results:
- [ ] Tool `create_task` called
- [ ] POST request sent with correct body
- [ ] Response confirms creation
- [ ] Task info returned to user
Test 4.3: Builtin Tool Execution
Objective: Verify builtin (stdio) tools work
Steps:
- Navigate to specialist with Calculator plugin
- Ask: "Calculate 123 * 456 + 789"
- Observe tool execution
Expected Results:
- [ ] Tool `calculate` called
- [ ] Expression parsed correctly
- [ ] Result: 56877 returned
- [ ] User sees the calculation result
Test 4.4: Multiple Tool Calls in Sequence
Objective: Verify multiple tools can be called in one conversation
Steps:
- Ask: "What's 25 * 4?"
- Wait for response
- Ask: "Now multiply that result by 2"
- Observe both tool calls
Expected Results:
- [ ] First tool returns 100
- [ ] Second tool returns 200
- [ ] Context maintained between calls
- [ ] Both results displayed correctly
Test 4.5: Tool Error Handling
Objective: Verify tool errors are handled gracefully
Steps:
- Ask: "Get weather for 'invalid_city_xyz_123'"
- Observe error handling
Expected Results:
- [ ] API error captured
- [ ] Error message in ToolMessage
- [ ] LLM provides helpful response
- [ ] Conversation continues normally
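The pattern being checked is that a failing tool produces a message for the LLM instead of an unhandled exception. The following sketch illustrates the idea with a plain object; it does not use LangChain's `ToolMessage` class, and the `ToolResultMessage` shape and `runToolSafely` helper are hypothetical.

```typescript
// Hypothetical plain-object stand-in for a tool result message.
interface ToolResultMessage {
  role: "tool";
  toolName: string;
  content: string;
  isError: boolean;
}

// Runs a tool call and converts any thrown error into a message,
// so the conversation loop can continue instead of crashing.
function runToolSafely(toolName: string, call: () => string): ToolResultMessage {
  try {
    return { role: "tool", toolName, content: call(), isError: false };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { role: "tool", toolName, content: "Error: " + reason, isError: true };
  }
}
```

With the invalid city from the steps above, the message content would carry the API error text, and the LLM is expected to turn it into a helpful reply.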
Test 4.6: Tool Iteration Limit
Objective: Verify tool loop doesn't run infinitely
Steps:
- Craft a query that might cause repeated tool calls
- Example: "Keep calculating until you reach infinity"
- Observe iteration limit behavior
Expected Results:
- [ ] Maximum 10 iterations (default)
- [ ] `iterationLimitReached: true` in response
- [ ] User receives partial response
- [ ] No infinite loop
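A bounded tool loop of the kind this test exercises can be sketched as follows. The code is illustrative only and does not reflect the LangChain agent internals: the `StepResult` and `LoopResult` shapes and the `runToolLoop` helper are hypothetical, and the step function stands in for one LLM-plus-tool round trip. The default of 10 iterations and the `iterationLimitReached` flag come from the expected results.

```typescript
// Hypothetical result of one LLM + tool round trip.
interface StepResult {
  done: boolean;  // true when the model produced a final answer
  output: string; // latest (possibly partial) response text
}

interface LoopResult {
  output: string;
  iterations: number;
  iterationLimitReached: boolean;
}

// Repeats the step until it reports completion or the limit is hit.
function runToolLoop(step: (iteration: number) => StepResult, maxIterations = 10): LoopResult {
  let output = "";
  for (let i = 1; i <= maxIterations; i++) {
    const result = step(i);
    output = result.output;
    if (result.done) return { output, iterations: i, iterationLimitReached: false };
  }
  // Limit hit: return whatever partial output the last step produced.
  return { output, iterations: maxIterations, iterationLimitReached: true };
}
```

A query that never converges should end after the tenth iteration with the flag set and the last partial output returned to the user.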
Test 4.7: OAuth Required Tool
Objective: Verify OAuth flow is triggered when needed
Prerequisite: Have a plugin requiring OAuth configured
Steps:
- Try to use a tool requiring OAuth
- Observe OAuth prompt
Expected Results:
- [ ] `oauthRequired` event triggered
- [ ] Provider info returned
- [ ] Auth URL provided
- [ ] Required scopes listed
Test 4.8: Tool Telemetry
Objective: Verify tool executions appear in telemetry
Steps:
- Execute any tool (Test 4.1)
- Check telemetry trace
Expected Results:
- [ ] ACTION node for tool execution
- [ ] Tool name recorded
- [ ] Input parameters logged
- [ ] Output captured
- [ ] Duration measured
Performance Benchmarks
Baseline Metrics (with 100k chunks)
| Operation | Expected | Acceptable | Critical |
|---|---|---|---|
| Simple query (no tools) | < 1s | < 2s | > 5s |
| Query with vector search | < 2s | < 3s | > 6s |
| Single tool call | < 3s | < 5s | > 10s |
| Multiple tool calls (3) | < 8s | < 12s | > 20s |
| Handover | < 2s | < 4s | > 8s |
| Telemetry write | < 50ms | < 100ms | > 500ms |
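Measured durations can be mapped onto these columns with a small helper. This is a sketch under one assumption: the table leaves the range between "Acceptable" and "Critical" unnamed, so the code labels it `degraded`, which is a hypothetical name. The thresholds are copied from the table; the `rate` helper and the `BASELINES` keys are illustrative.

```typescript
type Rating = "expected" | "acceptable" | "degraded" | "critical";

interface Thresholds {
  expectedMs: number;   // below this: expected
  acceptableMs: number; // below this: acceptable
  criticalMs: number;   // above this: critical
}

// Thresholds from the baseline table, in milliseconds.
const BASELINES = {
  simpleQuery: { expectedMs: 1000, acceptableMs: 2000, criticalMs: 5000 },
  vectorSearchQuery: { expectedMs: 2000, acceptableMs: 3000, criticalMs: 6000 },
  singleToolCall: { expectedMs: 3000, acceptableMs: 5000, criticalMs: 10000 },
  multipleToolCalls: { expectedMs: 8000, acceptableMs: 12000, criticalMs: 20000 },
  handover: { expectedMs: 2000, acceptableMs: 4000, criticalMs: 8000 },
  telemetryWrite: { expectedMs: 50, acceptableMs: 100, criticalMs: 500 },
};

// Classifies one measured duration against a row of the table.
function rate(durationMs: number, t: Thresholds): Rating {
  if (durationMs < t.expectedMs) return "expected";
  if (durationMs < t.acceptableMs) return "acceptable";
  if (durationMs > t.criticalMs) return "critical";
  return "degraded"; // between acceptable and critical (label is an assumption)
}
```

For example, a simple query that takes 1.5 seconds rates as acceptable, and a telemetry write of 600 ms rates as critical.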
Load Testing Checklist
- [ ] 10 concurrent users: Response times < 3s
- [ ] 50 concurrent users: Response times < 5s
- [ ] 100 concurrent users: Response times < 10s
- [ ] Memory usage stable over 30 minutes
- [ ] No connection pool exhaustion
- [ ] Telemetry data complete for all requests
Test Execution Log
| Test ID | Date | Tester | Result | Notes |
|---|---|---|---|---|
| 1.1 | | | [ ] Pass / [ ] Fail | |
| 1.2 | | | [ ] Pass / [ ] Fail | |
| 1.3 | | | [ ] Pass / [ ] Fail | |
| 1.4 | | | [ ] Pass / [ ] Fail | |
| 2.1 | | | [ ] Pass / [ ] Fail | |
| 2.2 | | | [ ] Pass / [ ] Fail | |
| 2.3 | | | [ ] Pass / [ ] Fail | |
| 2.4 | | | [ ] Pass / [ ] Fail | |
| 2.5 | | | [ ] Pass / [ ] Fail | |
| 3.1 | | | [ ] Pass / [ ] Fail | |
| 3.2 | | | [ ] Pass / [ ] Fail | |
| 3.3 | | | [ ] Pass / [ ] Fail | |
| 3.4 | | | [ ] Pass / [ ] Fail | |
| 3.5 | | | [ ] Pass / [ ] Fail | |
| 4.1 | | | [ ] Pass / [ ] Fail | |
| 4.2 | | | [ ] Pass / [ ] Fail | |
| 4.3 | | | [ ] Pass / [ ] Fail | |
| 4.4 | | | [ ] Pass / [ ] Fail | |
| 4.5 | | | [ ] Pass / [ ] Fail | |
| 4.6 | | | [ ] Pass / [ ] Fail | |
| 4.7 | | | [ ] Pass / [ ] Fail | |
| 4.8 | | | [ ] Pass / [ ] Fail | |
Troubleshooting
Common Issues
- "No enabled LLM integration found"
  - Run: `pnpm ts-node src/scripts/seed-llm.ts`
  - Enable an LLM integration in admin
- "Vector store not found"
  - Check organization ID matches
  - Verify seed script completed successfully
- "Tool not found"
  - Check plugin is installed for organization
  - Verify specialist has plugin associated
- Slow query times
  - Check PostgreSQL pgvector indexes
  - Monitor connection pool usage
  - Check embedding service availability
- Handover not triggering
  - Verify specialists have distinct routing metadata
  - Check routing embeddings are generated
  - Review router service logs
Useful Debug Commands
# Check backend logs
tail -f packages/backend/logs/*.log
-- Monitor PostgreSQL queries (run in psql; requires the pg_stat_statements extension.
-- On PostgreSQL 13 and later the column is named mean_exec_time instead of mean_time.)
SELECT query, calls, mean_time FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 10;
-- Check vector store indexes (run in psql)
SELECT indexname, pg_size_pretty(pg_relation_size(indexname::regclass))
FROM pg_indexes WHERE tablename = 'file_chunks';
Sign-off
| Role | Name | Date | Signature |
|---|---|---|---|
| Developer | | | |
| QA | | | |
| Product Owner | | | |