Difference Between Neo4j Database and Neo4j GDS
A useful way to think about the relationship is:
- Neo4j Database is responsible for storing and managing graph data.
- Neo4j Graph Data Science (GDS) is responsible for running graph analytics and graph algorithms efficiently.
This is loosely similar to the relationship between PostgreSQL and Spark, although the comparison is not exact. Unlike Spark, GDS is typically not a separate processing cluster: it is a library, installed as a plugin, that runs inside the Neo4j server and uses specialized in-memory graph structures for computation.
Why Is GDS Necessary?
Graph algorithms such as PageRank, Weakly Connected Components, and Node Similarity repeatedly traverse the graph and access neighboring nodes.
If every algorithmic operation had to repeatedly interact with Neo4j’s transactional storage layer, execution would be significantly slower. The storage engine is optimized for:
- ACID transactions
- durability and recovery
- indexing and constraints
- concurrent updates
These requirements are different from the needs of large-scale graph analytics.
For this reason, GDS introduces the concept of a Graph Projection (often called an in-memory graph).
For example:
CALL gds.graph.project(
'cases',
'Case',
'*'
)
This command creates an in-memory graph projection named cases.
Importantly, it does not create new business data. Instead, it reads selected nodes and relationships from the Neo4j database and transforms them into a graph representation optimized for algorithm execution.
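As a hedged illustration of the point that projection only reads existing data, the variant below restricts the projection to one relationship type and returns how many nodes and relationships were copied into memory. It assumes GDS 2.x (where the procedure is named gds.graph.project) and that relationships of type SIMILAR_TO exist between Case nodes. The graph name casesSimilar is a hypothetical name chosen so that it does not collide with cases.

```cypher
// Sketch: project only Case nodes and SIMILAR_TO relationships.
// 'casesSimilar' is a hypothetical graph name used for illustration.
CALL gds.graph.project(
  'casesSimilar',
  'Case',
  'SIMILAR_TO'
)
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount
```

The returned counts describe the in-memory copy. The call does not create or modify any nodes or relationships in the database.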
What Happens During Graph Projection?
Conceptually:
Neo4j Database
|
v
Graph Projection
|
v
Graph Algorithms
The database stores graph data in a format optimized for transactional operations.
For example:
Case1 --SIMILAR_TO--> Case2
Case1 --SIMILAR_TO--> Case3
The storage layer must support:
- transactions
- durability
- recovery
- indexing
- constraints
- concurrent access
The projected graph, however, is optimized for analytics.
Conceptually, if the relationships are treated as undirected, it may resemble:
Node 1 -> [2, 3]
Node 2 -> [1]
Node 3 -> [1]
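The symmetric neighbor lists above correspond to treating SIMILAR_TO as undirected. A minimal sketch of how that could be requested at projection time follows, assuming GDS 2.x and the hypothetical graph name casesUndirected:

```cypher
// Sketch: project SIMILAR_TO so each relationship is visible from both endpoints.
// With the default orientation ('NATURAL'), only the source node (Case1 here)
// would list neighbors.
CALL gds.graph.project(
  'casesUndirected',
  'Case',
  {SIMILAR_TO: {orientation: 'UNDIRECTED'}}
)
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount
```

Whether to project as undirected depends on the algorithm. Some algorithms use relationship direction, while others expect an undirected graph.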
Internally, GDS uses compact graph representations such as:
- adjacency structures
- compressed indexes
- memory-efficient property storage
- specialized data layouts for graph traversal
These structures allow graph algorithms to access connectivity information much more efficiently than repeatedly querying the transactional database layer.
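Because these structures are held in memory, it can help to estimate their footprint before projecting a large graph. The sketch below assumes GDS 2.x and uses the same node label and relationship selector as the earlier example. It only reports an estimate and does not create a projection.

```cypher
// Sketch: estimate the memory needed to project Case nodes with all relationship types.
// Nothing is projected; the call only returns an estimate and the element counts.
CALL gds.graph.project.estimate('Case', '*')
YIELD requiredMemory, nodeCount, relationshipCount
RETURN requiredMemory, nodeCount, relationshipCount
```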
Where Are They Stored?
Neo4j Database
The Neo4j database is persistent.
Its data is stored on disk and survives database restarts, server reboots, and system shutdowns.
Neo4j also uses memory for caching, but the authoritative copy of the graph is the one in persistent storage.
GDS Graph Projection
A GDS graph projection exists only in memory.
CALL gds.graph.project(...)
creates an in-memory analytical representation of part of the graph.
If Neo4j is restarted, all graph projections are removed and must be projected again.
The original database data remains unchanged.
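Since projections occupy memory until they are dropped or the server restarts, it is common to inspect and release them explicitly. The following is a sketch assuming GDS 2.x and a projection named cases as in the earlier example. The two statements are meant to be run one at a time.

```cypher
// List the projections currently held in memory, with their approximate size.
CALL gds.graph.list()
YIELD graphName, nodeCount, relationshipCount, memoryUsage
RETURN graphName, nodeCount, relationshipCount, memoryUsage;

// Release the 'cases' projection when it is no longer needed.
// The second argument (failIfMissing = false) avoids an error if it does not exist.
CALL gds.graph.drop('cases', false)
YIELD graphName
RETURN graphName;
```

Dropping a projection frees the in-memory copy only. The nodes and relationships stored in the database are not affected.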
Key Takeaway
CALL gds.graph.project(...) does not create new graph data.
Its purpose is to transform selected data from the Neo4j database into an in-memory graph structure optimized for graph algorithms.
Therefore:
- Neo4j Database = transactional storage layer
- GDS Graph Projection = analytical in-memory graph representation
- GDS Algorithms operate on the graph projection, not directly on the transactional database
This separation allows graph algorithms to run significantly faster while leaving the persistent database unchanged, unless results are explicitly written back using an algorithm's write mode.
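As a sketch of how the execution mode decides whether the database is touched, the two calls below run against the cases projection. They assume GDS 2.x and are meant to be run one at a time. The property name componentId is a hypothetical choice for illustration.

```cypher
// Stream mode: results are returned to the client; the database is not modified.
CALL gds.pageRank.stream('cases')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId) AS caseNode, score
ORDER BY score DESC
LIMIT 10;

// Write mode: results are written back to the database as a node property.
// 'componentId' is a hypothetical property name chosen for this sketch.
CALL gds.wcc.write('cases', {writeProperty: 'componentId'})
YIELD componentCount, nodePropertiesWritten
RETURN componentCount, nodePropertiesWritten;
```

In both cases the algorithm itself reads from the projection. Only the write-mode call stores anything in the persistent database afterwards.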