Skip to main content

Best Practices

Optimize your Warehouse queries for performance, reliability, and maintainability.

Query Optimization

1. Select Only Needed Fields

Avoid using * in cross-database queries:

// ❌ Bad: Fetches all fields from all sources
fields: ['*']

// ✅ Good: Only fetch what you need
fields: ['u.id', 'u.name', 'o.total', 'o.status']

2. Filter Early

Apply filters on individual sources before joining:

// ✅ Good: Filter applied to joined source
join: [{
type: 'left',
source: { type: 'database', tag: 'orders-db', entity: 'orders', alias: 'o' },
on: { left: 'u.id', right: 'o.userId' },
where: { 'o.status': { $eq: 'completed' } } // Filter before join
}]

// Also filter primary source
where: { 'u.status': { $eq: 'active' } }

3. Use Appropriate Limits

Always use pagination for large datasets:

const result = await ductape.warehouse.query({
operation: 'select',
from: { type: 'database', tag: 'users-db', entity: 'users', alias: 'u' },
fields: ['u.id', 'u.name'],
limit: 50,
offset: page * 50
});

4. Leverage Aliases

Use clear, consistent aliases for readability:

// ✅ Good: Clear aliases
from: { type: 'database', tag: 'users-postgres', entity: 'users', alias: 'u' },
join: [
{ source: { tag: 'orders-mongo', entity: 'orders', alias: 'o' } },
{ source: { tag: 'products-postgres', entity: 'products', alias: 'p' } }
]

Join Performance

1. Put Smaller Dataset on Right

When joining, the Warehouse fetches the right side for each left record:

// ✅ If users is smaller than orders, use users as right
from: { type: 'database', tag: 'orders-db', entity: 'orders', alias: 'o' },
join: [{
type: 'inner',
source: { type: 'database', tag: 'users-db', entity: 'users', alias: 'u' },
on: { left: 'o.userId', right: 'u.id' }
}]

2. Limit Semantic Join Results

Control the number of vector matches:

semantic: {
embedField: 'description',
similarityThreshold: 0.8, // Higher threshold = fewer matches
topK: 5 // Limit matches per record
}

3. Avoid Deep Graph Traversals

Keep graph depth minimal:

graph: {
relationship: 'FRIENDS_WITH',
direction: 'outgoing',
minDepth: 1,
maxDepth: 2 // Keep depth low
}

Transaction Design

1. Keep Transactions Small

Minimize the number of operations per transaction:

// ✅ Good: Essential operations only
await ductape.warehouse.transaction([
{ operation: 'insert', from: { tag: 'orders-db' }, data: order },
{ operation: 'update', from: { tag: 'inventory-db' }, data: stockUpdate, where: productFilter }
]);

// ❌ Bad: Too many unrelated operations
await ductape.warehouse.transaction([
{ /* order */ },
{ /* inventory */ },
{ /* audit log */ }, // Could be async
{ /* analytics */ }, // Could be async
{ /* notifications */ } // Could be async
]);

2. Use Idempotent Operations

Design for safe retries:

// ✅ Good: Idempotent with unique key
{
operation: 'upsert',
from: { type: 'database', tag: 'events-db', entity: 'events' },
data: {
idempotencyKey: `${eventType}_${userId}_${timestamp}`,
type: eventType,
userId,
timestamp
}
}

// ✅ Good: Conditional update
{
operation: 'update',
from: { type: 'database', tag: 'inventory-db', entity: 'inventory' },
data: { quantity: { $decrement: 1 } },
where: {
productId: { $eq: productId },
quantity: { $gte: 1 } // Only if stock available
}
}

3. Handle Failures Gracefully

Always check transaction status:

const result = await ductape.warehouse.transaction(operations);

if (result.status === 'completed') {
return { success: true };
}

if (result.status === 'compensated') {
// Rolled back - inform user
return { success: false, message: 'Order could not be processed', retry: true };
}

if (result.status === 'failed') {
// Partial failure - may need manual intervention
console.error('Transaction in inconsistent state:', result);
await alertOpsTeam(result);
return { success: false, message: 'Please contact support', supportTicket: true };
}

Data Source Design

1. Use Consistent Naming

// ✅ Good: Clear, consistent naming
'users-postgres'
'orders-mongodb'
'social-neo4j'
'embeddings-pinecone'

// ❌ Bad: Inconsistent
'userDB'
'OrdersMongo'
'neo4j-graph'

2. Plan Join Keys

Ensure join keys are indexed and of compatible types:

// In PostgreSQL users table: id INTEGER PRIMARY KEY
// In MongoDB orders collection: userId field should be indexed
// Ensure types match: both INTEGER or both STRING

join: [{
on: { left: 'u.id', right: 'o.userId' } // Both should be same type
}]

3. Document Entity Relationships

/**
* Data Sources:
* - users-postgres: Main user data (users, profiles)
* - users.id -> orders.userId
* - users.id -> Person.userId (graph)
*
* - orders-mongo: Order transactions
* - orders.productId -> products.id
*
* - social-neo4j: Social graph
* - Person -[FRIENDS_WITH]-> Person
* - Person -[FOLLOWS]-> Person
*/

Error Handling

1. Validate Before Querying

function validateQuery(source: IDataSource) {
if (!source.tag) {
throw new Error('Data source tag is required');
}
if (!source.entity) {
throw new Error('Entity name is required');
}
}

2. Handle Specific Errors

try {
const result = await ductape.warehouse.query(query);
return result;
} catch (error) {
if (error.code === 'SOURCE_NOT_FOUND') {
throw new Error(`Data source "${query.from.tag}" not found`);
}
if (error.code === 'ENTITY_NOT_FOUND') {
throw new Error(`Entity "${query.from.entity}" does not exist`);
}
if (error.code === 'QUERY_TIMEOUT') {
// Maybe retry with smaller limit
return retryWithLimit(query, query.limit / 2);
}
throw error;
}

3. Log Query Metadata

const result = await ductape.warehouse.query(query);

logger.info('Query executed', {
operation: query.operation,
sources: result.metadata.sourcesQueried,
executionTime: result.metadata.executionTime,
rowsReturned: result.data.length,
cached: result.metadata.cached
});

Security

1. Validate User Input

Never directly use user input in queries:

// ❌ Bad: Potential injection
where: { 'u.id': { $eq: req.params.userId } }

// ✅ Good: Validate and sanitize
const userId = parseInt(req.params.userId, 10);
if (isNaN(userId) || userId <= 0) {
throw new Error('Invalid user ID');
}
where: { 'u.id': { $eq: userId } }

2. Use Parameterized Values

The Warehouse automatically parameterizes values, but validate types:

// Ensure types match expected
const query = {
where: {
'u.email': { $eq: String(email) },
'u.age': { $gte: Number(minAge) }
}
};

3. Limit Data Exposure

Only return necessary fields:

// ❌ Bad: Returns sensitive data
fields: ['u.*']

// ✅ Good: Explicit safe fields
fields: ['u.id', 'u.name', 'u.publicEmail', 'u.avatar']
// Exclude: passwordHash, internalNotes, etc.

Monitoring

1. Track Slow Queries

const startTime = Date.now();
const result = await ductape.warehouse.query(query);
const duration = result.metadata.executionTime;

if (duration > 1000) { // More than 1 second
logger.warn('Slow query detected', {
query: JSON.stringify(query),
duration,
sourcesQueried: result.metadata.sourcesQueried
});
}

2. Monitor Transaction Health

const metrics = {
transactionsTotal: 0,
transactionsCompleted: 0,
transactionsCompensated: 0,
transactionsFailed: 0
};

const result = await ductape.warehouse.transaction(operations);
metrics.transactionsTotal++;
metrics[`transactions${result.status.charAt(0).toUpperCase() + result.status.slice(1)}`]++;

Summary

CategoryBest Practice
QueriesSelect specific fields, filter early, use limits
JoinsSmaller dataset on right, limit semantic matches
TransactionsKeep small, use idempotent operations, handle failures
Data SourcesConsistent naming, plan join keys, document relationships
ErrorsValidate input, handle specific errors, log metadata
SecurityValidate user input, limit data exposure
MonitoringTrack slow queries, monitor transaction health