Skip to main content

Cross-Database Joins

The Warehouse enables you to join data across different data sources - databases, graphs, and vector stores - using a unified query interface.

Overview

Cross-database joins allow you to combine data from:

  • PostgreSQL + MongoDB
  • MySQL + Neo4j
  • Any database + Any vector store
  • Graph databases + Traditional databases

The Warehouse handles the complexity of fetching data from different sources and joining them in memory using optimized algorithms.

Basic Join Syntax

const result = await ductape.warehouse.query({
operation: 'select',
from: {
type: 'database',
tag: 'primary-db',
entity: 'users',
alias: 'u'
},
fields: ['u.name', 'o.total', 'o.status'],
join: [{
type: 'left',
source: {
type: 'database',
tag: 'orders-db',
entity: 'orders',
alias: 'o'
},
on: { left: 'u.id', right: 'o.userId' }
}],
limit: 100
});

Join Types

Inner Join

Returns only records that have matching values in both sources:

join: [{
type: 'inner',
source: { type: 'database', tag: 'orders-db', entity: 'orders', alias: 'o' },
on: { left: 'u.id', right: 'o.userId' }
}]

Left Join

Returns all records from the left source, and matched records from the right:

join: [{
type: 'left',
source: { type: 'database', tag: 'orders-db', entity: 'orders', alias: 'o' },
on: { left: 'u.id', right: 'o.userId' }
}]

Right Join

Returns all records from the right source, and matched records from the left:

join: [{
type: 'right',
source: { type: 'database', tag: 'orders-db', entity: 'orders', alias: 'o' },
on: { left: 'u.id', right: 'o.userId' }
}]

Database-to-Database Joins

Join data between any two databases, even if they're different types:

// PostgreSQL users joined with MongoDB orders
const result = await ductape.warehouse.query({
operation: 'select',
from: {
type: 'database',
tag: 'users-postgres',
entity: 'users',
alias: 'u'
},
fields: ['u.id', 'u.name', 'u.email', 'o.orderId', 'o.total', 'o.createdAt'],
join: [{
type: 'left',
source: {
type: 'database',
tag: 'orders-mongo',
entity: 'orders',
alias: 'o'
},
on: { left: 'u.id', right: 'o.userId' }
}],
where: { 'u.status': { $eq: 'active' } },
orderBy: [{ field: 'o.createdAt', order: 'DESC' }],
limit: 50
});

Database-to-Graph Joins

Join relational data with graph traversals:

// Get users and their social connections from Neo4j
const result = await ductape.warehouse.query({
operation: 'select',
from: {
type: 'database',
tag: 'users-postgres',
entity: 'users',
alias: 'u'
},
fields: [
'u.id',
'u.name',
'friends.name as friendName',
'friends.connectionDate'
],
join: [{
type: 'left',
source: {
type: 'graph',
tag: 'social-neo4j',
entity: 'Person',
alias: 'friends'
},
graph: {
relationship: 'FRIENDS_WITH',
direction: 'both',
minDepth: 1,
maxDepth: 1
},
on: { left: 'u.id', right: 'friends.userId' }
}],
where: { 'u.status': { $eq: 'active' } }
});

Graph Join Configuration

OptionTypeDescription
relationshipstringThe relationship type to traverse
direction'outgoing' | 'incoming' | 'both'Direction of the relationship
minDepthnumberMinimum traversal depth (default: 1)
maxDepthnumberMaximum traversal depth (default: 1)

Multiple Joins

Chain multiple joins to combine data from several sources:

const result = await ductape.warehouse.query({
operation: 'select',
from: {
type: 'database',
tag: 'users-postgres',
entity: 'users',
alias: 'u'
},
fields: [
'u.name',
'u.email',
'o.total',
'p.name as productName',
'r.rating'
],
join: [
// First join: users -> orders
{
type: 'left',
source: {
type: 'database',
tag: 'orders-mongo',
entity: 'orders',
alias: 'o'
},
on: { left: 'u.id', right: 'o.userId' }
},
// Second join: orders -> products
{
type: 'inner',
source: {
type: 'database',
tag: 'products-postgres',
entity: 'products',
alias: 'p'
},
on: { left: 'o.productId', right: 'p.id' }
},
// Third join: products -> reviews
{
type: 'left',
source: {
type: 'database',
tag: 'reviews-mongo',
entity: 'reviews',
alias: 'r'
},
on: { left: 'p.id', right: 'r.productId' }
}
],
where: { 'o.status': { $eq: 'completed' } },
limit: 100
});

Filtering Joined Data

Apply filters to specific sources or across all joined data:

const result = await ductape.warehouse.query({
operation: 'select',
from: {
type: 'database',
tag: 'users-postgres',
entity: 'users',
alias: 'u'
},
fields: ['u.name', 'o.total'],
join: [{
type: 'inner',
source: {
type: 'database',
tag: 'orders-mongo',
entity: 'orders',
alias: 'o'
},
on: { left: 'u.id', right: 'o.userId' },
// Filter on the joined source
where: { 'o.status': { $eq: 'completed' } }
}],
// Filter on the combined result
where: {
$and: [
{ 'u.status': { $eq: 'active' } },
{ 'o.total': { $gte: 100 } }
]
}
});

Join Strategies

The Warehouse automatically selects the optimal join strategy based on data size:

StrategyBest ForDescription
Hash JoinLarge datasetsBuilds hash table for faster lookups
Nested LoopSmall right datasetSimple iteration, low memory
Sort-MergePre-sorted dataEfficient for ordered data

The strategy is selected automatically, but you can observe which was used in the result metadata:

const result = await ductape.warehouse.query({...});
console.log(result.metadata.sourceStats);
// Shows execution stats including join strategy used

Performance Tips

  1. Use aliases - Always provide aliases for cleaner field references
  2. Filter early - Apply where clauses on individual sources when possible
  3. Limit results - Use limit/offset to paginate large result sets
  4. Select specific fields - Don't use * for large joins
  5. Consider cardinality - Put the smaller dataset on the right side when possible

Next Steps