Manual Testing Guide: Feature 014 - LangChain Migration

This document provides manual test cases for validating the LangChain migration. Run them using the Mobile App and the Admin Specialist Test screen.

Prerequisites

1. Database Setup

Run the stress test seeding script to create test data:

cd packages/backend

# Option 1: Light test (10k documents, 100k chunks) - Recommended for initial testing
pnpm ts-node src/scripts/seed-stress-test-data.ts --documents=10000 --chunk-sampling=100

# Option 2: Medium test (100k documents, 100k chunks) - Good for performance testing
pnpm ts-node src/scripts/seed-stress-test-data.ts --documents=100000 --chunk-sampling=10

# Option 3: Full scale test (1M documents, 3M chunks) - Production simulation
# Requires ~20GB storage for embeddings, takes 30-60 minutes to seed
pnpm ts-node src/scripts/seed-stress-test-data.ts --documents=1000000 --chunk-sampling=100 --chunks-per-doc=3 --vector-stores=10 --specialists=20

# Clean up when done
pnpm ts-node src/scripts/seed-stress-test-data.ts --clean

2. Verify Seed Data

After seeding, verify the data was created:

-- Check specialists
SELECT name, "organizationId" FROM specialists WHERE "organizationId" = 'stress-test-org';

-- Check vector stores and chunk counts
SELECT vs.name, COUNT(fc.id) as chunk_count
FROM vector_store vs
LEFT JOIN file_metadata fm ON fm.vector_store_id = vs.id
LEFT JOIN file_chunks fc ON fc.file_id = fm.id
WHERE vs.organization_id = 'stress-test-org'
GROUP BY vs.id, vs.name;

-- Check MCP plugins
SELECT name, "displayName", "pluginType" FROM mcp_plugins WHERE name LIKE '%-stress';

3. Start Services

# Terminal 1: Start backend
cd packages/backend && pnpm run dev

# Terminal 2: Start admin
cd packages/admin && pnpm run dev

# Terminal 3 (optional): Start mobile
cd packages/mobile-app && pnpm run start

1. Telemetry Tests

Test 1.1: Basic Telemetry Capture

Objective: Verify telemetry is captured for simple queries

Steps:

Open Admin Dashboard
Navigate to a specialist test page
Send a simple query: "What is the refund policy?"
Check the Telemetry tab/section

Expected Results:

[ ] Trace ID is generated and displayed
[ ] TRIGGER node shows the original query
[ ] SPECIALIST node shows which specialist was selected
[ ] Query timestamp is recorded
[ ] Response time is captured

Verification Query:

SELECT * FROM routing_telemetry
WHERE "organizationId" = 'stress-test-org'
ORDER BY "createdAt" DESC LIMIT 5;

Test 1.2: Telemetry with Tool Execution

Objective: Verify telemetry captures tool calls

Steps:

Navigate to Technical Support specialist (has tools)
Send query: "What is 25 * 47?"
Check telemetry trace

Expected Results:

[ ] ACTION node appears for tool call
[ ] Tool name (calculate) is recorded
[ ] Tool input parameters are logged
[ ] Tool output is captured
[ ] Execution duration is measured

Test 1.3: Telemetry Under Load

Objective: Verify telemetry performance with many concurrent requests

Steps:

Open 3-5 browser tabs with specialist test pages
Send queries simultaneously from all tabs
Check that each trace is independent

Expected Results:

[ ] Each request gets unique trace ID
[ ] No cross-contamination between traces
[ ] Latency remains under 2 seconds
[ ] All traces are queryable
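
Once the trace ID and latency from each tab have been noted, the independence and latency checks above can be scripted. The sketch below is only an illustration: `TraceResult`, `checkLoadResults`, and the sample values are assumptions, not part of the backend. Only the 2-second limit comes from the expected results.

```typescript
// Hypothetical shape of one collected result; field names are assumptions.
interface TraceResult {
  traceId: string;
  latencyMs: number;
}

interface LoadCheck {
  duplicateTraceIds: string[]; // trace IDs that appeared more than once
  slowTraceIds: string[];      // traces at or above the latency limit
  passed: boolean;
}

const MAX_LATENCY_MS = 2000; // "Latency remains under 2 seconds"

function checkLoadResults(results: TraceResult[]): LoadCheck {
  const seen = new Set<string>();
  const duplicateTraceIds: string[] = [];
  const slowTraceIds: string[] = [];

  for (const result of results) {
    if (seen.has(result.traceId)) {
      duplicateTraceIds.push(result.traceId);
    }
    seen.add(result.traceId);
    if (result.latencyMs >= MAX_LATENCY_MS) {
      slowTraceIds.push(result.traceId);
    }
  }

  return {
    duplicateTraceIds,
    slowTraceIds,
    passed: duplicateTraceIds.length === 0 && slowTraceIds.length === 0,
  };
}

// Made-up values: the second tab reuses a trace ID and the third is too slow.
const report = checkLoadResults([
  { traceId: "trace-a", latencyMs: 850 },
  { traceId: "trace-a", latencyMs: 900 },
  { traceId: "trace-b", latencyMs: 2400 },
]);
console.log(report);
```

A duplicate trace ID points to cross-contamination between requests, so a run with any entry in `duplicateTraceIds` should be logged as a failure even when latency is fine.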

Test 1.4: Telemetry Data Sanitization

Objective: Verify sensitive data is not stored in telemetry

Steps:

Send query containing email: "My email is test@example.com, what's my balance?"
Check telemetry data

Expected Results:

[ ] Email addresses are redacted or masked
[ ] No PII stored in plain text
[ ] Query is truncated if too long (>500 chars)
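
For reference when reading the stored telemetry, the behaviour under test can be sketched as a redact-then-truncate step. This is a minimal sketch under stated assumptions: `sanitizeQuery`, the `[REDACTED_EMAIL]` marker, and the regular expression are hypothetical, and the actual sanitizer may mask values differently. Only the 500-character limit comes from the expected results.

```typescript
// Hypothetical sanitizer; the marker text and pattern are assumptions.
const MAX_QUERY_LENGTH = 500; // truncation threshold from the expected results

// Deliberately simple email pattern; real PII detection needs more than one regex.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function sanitizeQuery(query: string): string {
  // Redact first so a truncated email fragment can never be stored.
  const redacted = query.replace(EMAIL_PATTERN, "[REDACTED_EMAIL]");
  return redacted.length > MAX_QUERY_LENGTH
    ? redacted.slice(0, MAX_QUERY_LENGTH)
    : redacted;
}

console.log(sanitizeQuery("My email is test@example.com, what's my balance?"));
// prints: My email is [REDACTED_EMAIL], what's my balance?
```

When checking the stored row, search for the literal address `test@example.com`; any match means the redaction step was skipped.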

2. Specialist Handover Tests

Test 2.1: Simple Handover

Objective: Verify handover works when a query matches a different specialist

Steps:

Start conversation with Billing Specialist
Send: "I need help with billing" (should stay with billing)
Then send: "Actually, I have a technical error with the API"
Observe handover to Technical Support

Expected Results:

[ ] Handover message displayed in user's language
[ ] Technical Support specialist now active
[ ] Original conversation context preserved
[ ] New specialist responds appropriately

Test 2.2: Explicit Handover Request

Objective: Verify LLM correctly uses handover tool

Steps:

Chat with Technical Support
Send: "I don't need technical help anymore, I want to talk about my invoice"
Observe handover

Expected Results:

[ ] LLM calls handover_to_specialist tool
[ ] Handover reason is correctly extracted
[ ] Billing Specialist selected
[ ] Handover message localized correctly (EN/PT-BR)

Test 2.3: Handover Chain Prevention

Objective: Verify circular handovers are prevented

Steps:

Start with Billing Specialist
Ask billing questions
Request handover to Technical Support
Immediately ask: "I want to go back to billing"
Observe behavior

Expected Results:

[ ] System prevents immediate bounce-back
[ ] Previous specialists are tracked
[ ] User can still access billing after some interaction
[ ] No infinite loop occurs
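
One way to picture the expected behaviour is a small state machine that remembers previous specialists and blocks a return until some interaction has happened. The sketch below is hypothetical: the type names, the `MIN_TURNS_BEFORE_RETURN` value, and the turn-counting rule are assumptions made for illustration, and the actual router may use a different rule.

```typescript
// Hypothetical cool-down: user turns required before returning to the previous specialist.
const MIN_TURNS_BEFORE_RETURN = 2;

interface HandoverState {
  activeSpecialist: string;
  turnsWithActive: number;       // user turns handled since the last handover
  previousSpecialists: string[]; // most recent last
}

function canHandOver(state: HandoverState, target: string): boolean {
  if (target === state.activeSpecialist) {
    return false; // already with this specialist
  }
  const previous = state.previousSpecialists;
  const cameFromTarget =
    previous.length > 0 && previous[previous.length - 1] === target;
  // Block an immediate bounce-back, but allow a return after some interaction.
  return !(cameFromTarget && state.turnsWithActive < MIN_TURNS_BEFORE_RETURN);
}

function handOver(state: HandoverState, target: string): HandoverState {
  if (!canHandOver(state, target)) {
    return state; // stay with the current specialist
  }
  return {
    activeSpecialist: target,
    turnsWithActive: 0,
    previousSpecialists: state.previousSpecialists.concat(state.activeSpecialist),
  };
}

const afterFirstHandover = handOver(
  { activeSpecialist: "billing", turnsWithActive: 3, previousSpecialists: [] },
  "technical-support"
);
console.log(canHandOver(afterFirstHandover, "billing")); // false: immediate bounce-back
```

Under these assumptions the request in step 4 is refused, while the same request after two more turns with Technical Support would succeed. This matches the third expected result.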

Test 2.4: Handover with No Suitable Specialist

Objective: Verify graceful handling when no specialist matches

Steps:

Start conversation
Ask about something completely unrelated: "Tell me about quantum physics"
Observe response

Expected Results:

[ ] Handover fails gracefully
[ ] General assistant provides response
[ ] Error message displayed (localized)
[ ] Conversation continues normally

Test 2.5: Handover Telemetry

Objective: Verify handover events are captured in telemetry

Steps:

Perform a successful handover (Test 2.1)
Check telemetry data

Expected Results:

[ ] Handover action recorded in trace
[ ] Source specialist ID logged
[ ] Target specialist ID logged
[ ] Handover reason captured
[ ] handoverOccurred: true in routing_telemetry

3. Vector Store Query Tests

Test 3.1: Basic Semantic Search

Objective: Verify vector store queries return relevant results

Steps:

Navigate to Billing Specialist (linked to Billing Documentation)
Ask: "How do I get a refund?"
Check response includes knowledge base content

Expected Results:

[ ] Response mentions 30-day refund policy
[ ] Content from Billing Documentation used
[ ] Semantic similarity score > 0.7
[ ] Response is coherent and relevant

Test 3.2: Multi-Vector Store Query

Objective: Verify queries across multiple vector stores

Steps:

Use a specialist linked to multiple knowledge bases
Ask a question spanning multiple topics
Check that the response includes content from each knowledge base

Expected Results:

[ ] Results from multiple vector stores
[ ] Re-ranking applied correctly
[ ] Most relevant chunks prioritized
[ ] No duplicate content

Test 3.3: Query Performance Under Load

Objective: Verify vector queries perform well with large datasets

Prerequisite: Seed the database with --documents=100000 or higher (see Database Setup)

Steps:

Measure the response time for a simple query
Send 10 sequential queries
Record average response time

Expected Results:

[ ] Query embedding generation < 500ms
[ ] Vector search < 100ms (PostgreSQL pgvector)
[ ] Total response time < 3 seconds
[ ] No timeout errors
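
The averages for the ten recorded queries can be computed with a small helper. This is a sketch only: the `QueryTiming` shape and the split into phases are assumptions about how the timings were recorded. The three limits are the ones listed in the expected results.

```typescript
// Hypothetical per-query timing record; how each phase is measured is an assumption.
interface QueryTiming {
  embeddingMs: number;
  vectorSearchMs: number;
  totalMs: number;
}

// Limits taken from the expected results of this test.
const LIMITS: QueryTiming = { embeddingMs: 500, vectorSearchMs: 100, totalMs: 3000 };

function average(values: number[]): number {
  if (values.length === 0) {
    throw new Error("no timings recorded");
  }
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

function summarize(timings: QueryTiming[]): { averages: QueryTiming; withinLimits: boolean } {
  const averages: QueryTiming = {
    embeddingMs: average(timings.map((t) => t.embeddingMs)),
    vectorSearchMs: average(timings.map((t) => t.vectorSearchMs)),
    totalMs: average(timings.map((t) => t.totalMs)),
  };
  const withinLimits =
    averages.embeddingMs < LIMITS.embeddingMs &&
    averages.vectorSearchMs < LIMITS.vectorSearchMs &&
    averages.totalMs < LIMITS.totalMs;
  return { averages, withinLimits };
}

// Made-up timings for two of the ten queries.
const summary = summarize([
  { embeddingMs: 400, vectorSearchMs: 80, totalMs: 2000 },
  { embeddingMs: 200, vectorSearchMs: 40, totalMs: 1000 },
]);
console.log(summary);
```

An average can hide a single slow query, so note the slowest individual timing in the execution log as well.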

Test 3.4: Min Score Filtering

Objective: Verify low-relevance results are filtered

Steps:

Ask a question unrelated to any knowledge base
Example: "What's the weather like today?"
Check that irrelevant chunks are not included

Expected Results:

[ ] No chunks returned with score < 0.7
[ ] Response generated without hallucinating content
[ ] System handles no-match gracefully
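
The filtering being verified amounts to dropping every chunk below the threshold before the prompt is built. The sketch below shows that step in isolation. `ScoredChunk` and `filterByMinScore` are hypothetical names and the sample scores are invented; only the 0.7 threshold comes from the expected results.

```typescript
// Hypothetical chunk shape; 0.7 is the minimum score from the expected results.
interface ScoredChunk {
  id: string;
  score: number; // semantic similarity, assumed to lie in [0, 1]
}

const MIN_SCORE = 0.7;

function filterByMinScore(
  chunks: ScoredChunk[],
  minScore: number = MIN_SCORE
): ScoredChunk[] {
  return chunks.filter((chunk) => chunk.score >= minScore);
}

// An off-topic query is expected to score low against every chunk, so nothing survives.
const relevant = filterByMinScore([
  { id: "billing-12", score: 0.41 },
  { id: "api-errors-3", score: 0.35 },
]);
console.log(
  relevant.length === 0
    ? "no context: answer without knowledge base content"
    : relevant
);
```

An empty result is the no-match case: the response should then come from the model alone, with no knowledge base content attributed to it.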

Test 3.5: Re-Ranking Algorithm

Objective: Verify multi-factor re-ranking works correctly

Steps:

Ask question with multiple relevant chunks
Check ordering of results
Verify that recent chunks and chunks from early positions are ranked appropriately

Expected Results:

[ ] Semantic score weighted 65%
[ ] Vector store order weighted 15%
[ ] Chunk position weighted 10%
[ ] Recency weighted 10%
[ ] Final ranking is sensible
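
To judge whether an observed ordering is sensible, it helps to compute the weighted score by hand. The sketch below applies the four weights listed above. It assumes each factor has already been normalized to [0, 1] with 1 as the best value; that normalization, the `RankFactors` shape, and the sample chunks are assumptions, not a description of the backend.

```typescript
// Weights from the expected results; everything else here is an assumption.
const WEIGHTS = { semantic: 0.65, storeOrder: 0.15, chunkPosition: 0.1, recency: 0.1 };

// Each factor is assumed to be pre-normalized to [0, 1], where 1 is best.
interface RankFactors {
  id: string;
  semantic: number;
  storeOrder: number;
  chunkPosition: number;
  recency: number;
}

function combinedScore(chunk: RankFactors): number {
  return (
    WEIGHTS.semantic * chunk.semantic +
    WEIGHTS.storeOrder * chunk.storeOrder +
    WEIGHTS.chunkPosition * chunk.chunkPosition +
    WEIGHTS.recency * chunk.recency
  );
}

function rerank(chunks: RankFactors[]): RankFactors[] {
  // Copy before sorting so the caller's array is left untouched.
  return chunks.slice().sort((a, b) => combinedScore(b) - combinedScore(a));
}

const ranked = rerank([
  { id: "old-but-close", semantic: 0.9, storeOrder: 0, chunkPosition: 0, recency: 0 },
  { id: "fresh-and-early", semantic: 0.8, storeOrder: 1, chunkPosition: 1, recency: 1 },
]);
console.log(ranked.map((chunk) => chunk.id));
```

In this made-up example the second chunk scores 0.87 against 0.585 for the first. A slightly lower semantic match can therefore outrank a closer one when the other three factors favour it.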

4. Tool Call Tests

Test 4.1: REST API Tool - GET Request

Objective: Verify REST GET tool execution

Steps:

Navigate to Technical Support (has Weather API)
Ask: "What's the weather in London?"
Observe tool execution

Expected Results:

[ ] Tool get_current_weather called
[ ] GET request sent to API
[ ] Response parsed correctly
[ ] Weather info displayed to user

Test 4.2: REST API Tool - POST Request

Objective: Verify REST POST tool execution

Steps:

Navigate to specialist with Task Manager plugin
Ask: "Create a task called 'Test task from LangChain'"
Observe tool execution

Expected Results:

[ ] Tool create_task called
[ ] POST request sent with correct body
[ ] Response confirms creation
[ ] Task info returned to user

Test 4.3: Builtin Tool Execution

Objective: Verify builtin (stdio) tools work

Steps:

Navigate to specialist with Calculator plugin
Ask: "Calculate 123 * 456 + 789"
Observe tool execution

Expected Results:

[ ] Tool calculate called
[ ] Expression parsed correctly
[ ] Result: 56877 returned
[ ] User sees the calculation result

Test 4.4: Multiple Tool Calls in Sequence

Objective: Verify multiple tools can be called in one conversation

Steps:

Ask: "What's 25 * 4?"
Wait for response
Ask: "Now multiply that result by 2"
Observe both tool calls

Expected Results:

[ ] First tool returns 100
[ ] Second tool returns 200
[ ] Context maintained between calls
[ ] Both results displayed correctly

Test 4.5: Tool Error Handling

Objective: Verify tool errors are handled gracefully

Steps:

Ask: "Get weather for 'invalid_city_xyz_123'"
Observe error handling

Expected Results:

[ ] API error captured
[ ] Error message in ToolMessage
[ ] LLM provides helpful response
[ ] Conversation continues normally
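
The pattern being verified is that a failing tool produces a message for the model instead of an unhandled exception. The sketch below illustrates the idea with a plain object. `ToolResultMessage` and `runToolSafely` are hypothetical and are not LangChain's `ToolMessage` class, and the error text is invented.

```typescript
// Hypothetical message shape, loosely modeled on a tool result message.
interface ToolResultMessage {
  toolCallId: string;
  status: "success" | "error";
  content: string;
}

function runToolSafely(toolCallId: string, tool: () => string): ToolResultMessage {
  try {
    return { toolCallId, status: "success", content: tool() };
  } catch (error) {
    // Surface the failure as message content so the model can explain it to the user.
    const reason = error instanceof Error ? error.message : String(error);
    return { toolCallId, status: "error", content: `Tool failed: ${reason}` };
  }
}

// Stand-in for the weather tool rejecting an unknown city.
const failed = runToolSafely("call-1", () => {
  throw new Error("city 'invalid_city_xyz_123' not found");
});
console.log(failed);
```

Because the error travels back as ordinary message content, the conversation state stays valid and the next user message can be processed normally.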

Test 4.6: Tool Iteration Limit

Objective: Verify the tool loop cannot run indefinitely

Steps:

Craft a query that might cause repeated tool calls
Example: "Keep calculating until you reach infinity"
Observe iteration limit behavior

Expected Results:

[ ] Maximum 10 iterations (default)
[ ] iterationLimitReached: true in response
[ ] User receives partial response
[ ] No infinite loop
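
The guard under test can be pictured as a bounded loop around the model and its tools. The sketch below is a simplified, synchronous illustration: the real loop would be asynchronous and pass full message histories. `ModelTurn`, `LoopResult`, `runToolLoop`, and the partial-response text are assumptions; the limit of 10 and the `iterationLimitReached` flag come from the expected results.

```typescript
const MAX_TOOL_ITERATIONS = 10; // default limit from the expected results

// Hypothetical, simplified types for one model turn and the loop outcome.
interface ModelTurn {
  toolCall?: { name: string; input: string };
  text?: string;
}

interface LoopResult {
  text: string;
  iterations: number; // number of tool calls executed
  iterationLimitReached: boolean;
}

function runToolLoop(
  callModel: (toolOutputs: string[]) => ModelTurn,
  runTool: (name: string, input: string) => string,
  maxIterations: number = MAX_TOOL_ITERATIONS
): LoopResult {
  const toolOutputs: string[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const turn = callModel(toolOutputs);
    if (!turn.toolCall) {
      // The model answered without asking for another tool call.
      return { text: turn.text ?? "", iterations: i, iterationLimitReached: false };
    }
    toolOutputs.push(runTool(turn.toolCall.name, turn.toolCall.input));
  }
  // Limit hit: stop and return what has been gathered so far.
  return {
    text: `Stopped after ${maxIterations} tool calls; partial results: ${toolOutputs.join(", ")}`,
    iterations: maxIterations,
    iterationLimitReached: true,
  };
}

// A model stub that never stops asking for the calculator.
const runaway = runToolLoop(
  () => ({ toolCall: { name: "calculate", input: "n + 1" } }),
  () => "1"
);
console.log(runaway.iterationLimitReached, runaway.iterations); // true 10
```

The stub stands in for the "keep calculating" prompt: it never returns plain text, so the only way out of the loop is the iteration cap.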

Test 4.7: OAuth Required Tool

Objective: Verify OAuth flow is triggered when needed

Prerequisite: Have a plugin requiring OAuth configured

Steps:

Try to use a tool requiring OAuth
Observe OAuth prompt

Expected Results:

[ ] oauthRequired event triggered
[ ] Provider info returned
[ ] Auth URL provided
[ ] Required scopes listed

Test 4.8: Tool Telemetry

Objective: Verify tool executions appear in telemetry

Steps:

Execute any tool (Test 4.1)
Check telemetry trace

Expected Results:

[ ] ACTION node for tool execution
[ ] Tool name recorded
[ ] Input parameters logged
[ ] Output captured
[ ] Duration measured

Performance Benchmarks

Baseline Metrics (with 100k chunks)

Operation	Expected	Acceptable	Critical
Simple query (no tools)	< 1s	< 2s	> 5s
Query with vector search	< 2s	< 3s	> 6s
Single tool call	< 3s	< 5s	> 10s
Multiple tool calls (3)	< 8s	< 12s	> 20s
Handover	< 2s	< 4s	> 8s
Telemetry write	< 50ms	< 100ms	> 500ms
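
When recording measurements, a small helper can map a timing onto the columns of this table. The sketch below copies the thresholds from the table; the operation keys and the `rate` function are hypothetical. The table does not name the range between "Acceptable" and "Critical", so the label "degraded" is an assumption of this sketch.

```typescript
type Rating = "expected" | "acceptable" | "degraded" | "critical";

interface Threshold {
  expectedMs: number;
  acceptableMs: number;
  criticalMs: number;
}

// Values copied from the baseline table; the operation keys are made up for this sketch.
const BASELINES: { [operation: string]: Threshold } = {
  "simple-query": { expectedMs: 1000, acceptableMs: 2000, criticalMs: 5000 },
  "vector-search": { expectedMs: 2000, acceptableMs: 3000, criticalMs: 6000 },
  "single-tool-call": { expectedMs: 3000, acceptableMs: 5000, criticalMs: 10000 },
  "multiple-tool-calls": { expectedMs: 8000, acceptableMs: 12000, criticalMs: 20000 },
  "handover": { expectedMs: 2000, acceptableMs: 4000, criticalMs: 8000 },
  "telemetry-write": { expectedMs: 50, acceptableMs: 100, criticalMs: 500 },
};

function rate(operation: string, elapsedMs: number): Rating {
  const threshold = BASELINES[operation];
  if (!threshold) {
    throw new Error(`unknown operation: ${operation}`);
  }
  if (elapsedMs < threshold.expectedMs) return "expected";
  if (elapsedMs < threshold.acceptableMs) return "acceptable";
  // The table leaves a gap between "acceptable" and "critical";
  // calling it "degraded" is an assumption of this sketch.
  if (elapsedMs > threshold.criticalMs) return "critical";
  return "degraded";
}

console.log(rate("handover", 3200));
```

A "degraded" or "critical" rating is worth a note in the execution log even when the functional test itself passes.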

Load Testing Checklist

[ ] 10 concurrent users: Response times < 3s
[ ] 50 concurrent users: Response times < 5s
[ ] 100 concurrent users: Response times < 10s
[ ] Memory usage stable over 30 minutes
[ ] No connection pool exhaustion
[ ] Telemetry data complete for all requests

Test Execution Log

Test ID	Date	Tester	Result	Notes
1.1			[ ] Pass / [ ] Fail
1.2			[ ] Pass / [ ] Fail
1.3			[ ] Pass / [ ] Fail
1.4			[ ] Pass / [ ] Fail
2.1			[ ] Pass / [ ] Fail
2.2			[ ] Pass / [ ] Fail
2.3			[ ] Pass / [ ] Fail
2.4			[ ] Pass / [ ] Fail
2.5			[ ] Pass / [ ] Fail
3.1			[ ] Pass / [ ] Fail
3.2			[ ] Pass / [ ] Fail
3.3			[ ] Pass / [ ] Fail
3.4			[ ] Pass / [ ] Fail
3.5			[ ] Pass / [ ] Fail
4.1			[ ] Pass / [ ] Fail
4.2			[ ] Pass / [ ] Fail
4.3			[ ] Pass / [ ] Fail
4.4			[ ] Pass / [ ] Fail
4.5			[ ] Pass / [ ] Fail
4.6			[ ] Pass / [ ] Fail
4.7			[ ] Pass / [ ] Fail
4.8			[ ] Pass / [ ] Fail

Troubleshooting

Common Issues

"No enabled LLM integration found"
  Run: pnpm ts-node src/scripts/seed-llm.ts
  Enable an LLM integration in admin

"Vector store not found"
  Check organization ID matches
  Verify seed script completed successfully

"Tool not found"
  Check plugin is installed for organization
  Verify specialist has plugin associated

Slow query times
  Check PostgreSQL pgvector indexes
  Monitor connection pool usage
  Check embedding service availability

Handover not triggering
  Verify specialists have distinct routing metadata
  Check routing embeddings are generated
  Review router service logs

Useful Debug Commands

# Check backend logs
tail -f packages/backend/logs/*.log

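-- Monitor PostgreSQL queries (run in psql; requires the pg_stat_statements extension)
-- On PostgreSQL 12 and earlier the column is mean_time instead of mean_exec_time
SELECT query, calls, mean_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;

-- Check vector store indexes (run in psql)
SELECT indexname, pg_size_pretty(pg_relation_size(indexname::regclass))
FROM pg_indexes WHERE tablename = 'file_chunks';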

Sign-off

Role	Name	Date	Signature
Developer
QA
Product Owner