LiaX

Neo4j GDS vs Neo4j Database

05 Jun 2026

Difference Between Neo4j Database and Neo4j GDS

A useful way to think about the relationship is:

  • Neo4j Database is responsible for storing and managing graph data.
  • Neo4j Graph Data Science (GDS) is responsible for running graph analytics and graph algorithms efficiently.

This is loosely similar to the relationship between PostgreSQL and Spark, although the comparison is not exact. Unlike Spark, GDS is not a separate cluster or service: it is installed as a plugin library and runs inside the Neo4j server process, where it builds specialized in-memory graph structures for computation.

Why Is GDS Necessary?

Graph algorithms such as HDBSCAN, PageRank, Weakly Connected Components, and Node Similarity repeatedly traverse the graph and access neighboring nodes.
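To make that access pattern concrete, here is a minimal PageRank sketch in Python over a plain adjacency list. It is a textbook power-iteration variant, not GDS code, and the damping factor of 0.85 and 20 iterations are common defaults assumed here. Every iteration walks every node's neighbor list again, so the same connectivity data is read once per iteration; that repeated, read-only access is the workload GDS is built for. The Case1 to Case3 edges mirror the SIMILAR_TO example used later in this post.

```python
def pagerank(adjacency: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 20) -> dict[str, float]:
    """Toy PageRank by power iteration over an in-memory adjacency list."""
    nodes = list(adjacency)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the "teleport" share of the total score.
        new_scores = {node: (1.0 - damping) / n for node in nodes}
        for node, neighbors in adjacency.items():
            if neighbors:
                # Each iteration re-reads every neighbor list: the repeated traversal.
                share = damping * scores[node] / len(neighbors)
                for neighbor in neighbors:
                    new_scores[neighbor] += share
            else:
                # Dangling node (no outgoing edges): spread its score over all nodes.
                for other in nodes:
                    new_scores[other] += damping * scores[node] / n
        scores = new_scores
    return scores


graph = {"Case1": ["Case2", "Case3"], "Case2": [], "Case3": []}
ranks = pagerank(graph)
print({node: round(score, 3) for node, score in ranks.items()})
```

Case2 and Case3 end up with equal, higher scores than Case1 because both receive score from Case1 while nothing points to Case1.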

If every algorithmic operation had to repeatedly interact with Neo4j’s transactional storage layer, execution would be significantly slower. The storage engine is optimized for:

  • ACID transactions
  • durability and recovery
  • indexing and constraints
  • concurrent updates

Large-scale graph analytics has different needs: it mostly requires fast, repeated, read-only access to the graph's connectivity and a few properties.

For this reason, GDS introduces the concept of a Graph Projection (often called an in-memory graph).

For example:

CALL gds.graph.project(
  'cases',
  'Case',
  '*'
)

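This command creates an in-memory graph projection named cases. It contains every node labeled Case and every relationship between those nodes, whatever its type ('*' matches all relationship types).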

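Importantly, it does not create new business data. Instead, it reads selected nodes and relationships from the Neo4j database and transforms them into a graph representation optimized for algorithm execution. The projection is a snapshot: changes made to the database afterwards are not reflected in it until the graph is projected again.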
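The projection step can be modeled roughly as follows. This is a toy Python model, not GDS internals: the NodeRecord and RelRecord types, the project function, and the sample IDs and the Judge label are hypothetical. It keeps only nodes carrying the requested label, keeps relationships of any type (the '*' wildcard) whose two endpoints both survived the node filter, and remaps the original node IDs to a dense 0..n-1 range so neighbors can be stored as plain integer lists. The input records are only read, never modified.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NodeRecord:            # hypothetical stand-in for a stored node
    id: int
    labels: frozenset[str]


@dataclass(frozen=True)
class RelRecord:             # hypothetical stand-in for a stored relationship
    source: int
    target: int
    type: str


def project(nodes, rels, label, rel_type="*"):
    """Copy the selected subgraph into dense integer adjacency lists."""
    # Map original node IDs to dense internal IDs 0..n-1.
    id_map = {}
    for node in nodes:
        if label in node.labels:
            id_map[node.id] = len(id_map)
    adjacency = [[] for _ in id_map]
    for rel in rels:
        type_ok = rel_type == "*" or rel.type == rel_type
        # A relationship is kept only if both of its endpoints were projected.
        if type_ok and rel.source in id_map and rel.target in id_map:
            adjacency[id_map[rel.source]].append(id_map[rel.target])
    return id_map, adjacency


nodes = [NodeRecord(101, frozenset({"Case"})), NodeRecord(205, frozenset({"Case"})),
         NodeRecord(310, frozenset({"Case"})), NodeRecord(400, frozenset({"Judge"}))]
rels = [RelRecord(101, 205, "SIMILAR_TO"), RelRecord(101, 310, "SIMILAR_TO"),
        RelRecord(400, 101, "PRESIDED_OVER")]
id_map, adjacency = project(nodes, rels, "Case")
print(id_map)      # {101: 0, 205: 1, 310: 2}
print(adjacency)   # [[1, 2], [], []]
```

The PRESIDED_OVER relationship is dropped because its source node is not labeled Case, which mirrors how a label filter narrows what ends up in the projection.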

What Happens During Graph Projection?

Conceptually:

Neo4j Database
        |
        v
Graph Projection
        |
        v
Graph Algorithms

The database stores graph data in a format optimized for transactional operations.

For example:

Case1 --SIMILAR_TO--> Case2
Case1 --SIMILAR_TO--> Case3

The storage layer must support:

  • transactions
  • durability
  • recovery
  • indexing
  • constraints
  • concurrent access

The projected graph, however, is optimized for analytics.

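Conceptually it may resemble the following. This assumes the SIMILAR_TO relationships are projected as undirected (orientation: 'UNDIRECTED'); with the default orientation, 'NATURAL', only Node 1 would have outgoing neighbors.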

Node 1 -> [2, 3]
Node 2 -> [1]
Node 3 -> [1]
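To see where the entries for Node 2 and Node 3 come from, here is a toy Python model of the three orientations GDS accepts in a relationship projection: NATURAL (the default, keeping the stored direction), REVERSE, and UNDIRECTED. Only the undirected variant reproduces the three lists above. The build_adjacency function and the integer IDs are illustrative assumptions, not GDS code.

```python
def build_adjacency(edges, orientation="NATURAL"):
    """Toy model of projecting (source, target) pairs with a GDS-style orientation."""
    if orientation not in ("NATURAL", "REVERSE", "UNDIRECTED"):
        raise ValueError(f"unknown orientation: {orientation}")
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, [])
        adjacency.setdefault(target, [])
        if orientation in ("NATURAL", "UNDIRECTED"):
            adjacency[source].append(target)   # stored direction
        if orientation in ("REVERSE", "UNDIRECTED"):
            adjacency[target].append(source)   # opposite direction
    return adjacency


similar_to = [(1, 2), (1, 3)]   # Case1 -> Case2, Case1 -> Case3
print(build_adjacency(similar_to))                 # {1: [2, 3], 2: [], 3: []}
print(build_adjacency(similar_to, "UNDIRECTED"))   # {1: [2, 3], 2: [1], 3: [1]}
```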

Internally, GDS uses compact graph representations such as:

  • adjacency lists of neighbor IDs, stored in compressed form
  • a mapping from Neo4j node IDs to dense internal IDs
  • memory-efficient property storage
  • specialized data layouts for graph traversal

These structures allow graph algorithms to access connectivity information much more efficiently than repeatedly querying the transactional database layer.
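One standard layout for adjacency structures is compressed sparse row (CSR): all neighbor IDs are concatenated into one array, and a second array of offsets records where each node's slice begins, so a neighbor lookup is a single contiguous slice. The sketch below builds CSR from adjacency lists and also delta-encodes a sorted neighbor list into variable-length bytes, a common compression technique for sorted IDs. It illustrates the general ideas only; the function names are hypothetical and GDS's actual layout and encoding are not reproduced here.

```python
def to_csr(adjacency):
    """Flatten adjacency lists into CSR offsets + targets arrays."""
    offsets, targets = [0], []
    for neighbors in adjacency:
        targets.extend(sorted(neighbors))
        offsets.append(len(targets))
    return offsets, targets


def neighbors_of(offsets, targets, node):
    # One contiguous slice per node: no per-relationship lookups.
    return targets[offsets[node]:offsets[node + 1]]


def encode_varint_deltas(sorted_ids):
    """Delta-encode ascending IDs, then store each delta in 7-bit varint groups."""
    out, previous = bytearray(), 0
    for value in sorted_ids:
        delta = value - previous
        previous = value
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)   # low 7 bits + continuation flag
            delta >>= 7
        out.append(delta)
    return bytes(out)


def decode_varint_deltas(data):
    """Reverse encode_varint_deltas: rebuild deltas, then prefix-sum them."""
    ids, previous, value, shift = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            previous += value
            ids.append(previous)
            value, shift = 0, 0
    return ids


adjacency = [[2, 1], [0], [0]]            # the undirected Case example, 0-based
offsets, targets = to_csr(adjacency)
print(offsets, targets)                   # [0, 2, 3, 4] [1, 2, 0, 0]
print(neighbors_of(offsets, targets, 0))  # [1, 2]

big_ids = [1_000_000, 1_000_003, 1_000_010]
packed = encode_varint_deltas(big_ids)
print(len(packed))                        # 5
assert decode_varint_deltas(packed) == big_ids
```

The three large IDs fit in 5 bytes instead of 24 as 64-bit integers, because after the first value only the small gaps (3 and 7) are stored.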

Where Are They Stored?

Neo4j Database

The Neo4j database is persistent.

Its data is stored on disk and survives database restarts, server reboots, and system shutdowns.

Neo4j also uses memory for caching (the page cache), but the authoritative copy of the graph is the one in persistent storage.

GDS Graph Projection

A GDS graph projection exists only in memory.

CALL gds.graph.project(...)

creates an in-memory analytical representation of part of the graph.

If Neo4j is restarted, all graph projections are removed and must be projected again.

The original database data remains unchanged.
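This lifecycle can be modeled with a toy server whose records are persisted to a file while its projection catalog is an ordinary in-memory dictionary. The ToyServer class is a hypothetical Python illustration, not the GDS API; in GDS itself the catalog is managed with procedures such as gds.graph.list, gds.graph.exists and gds.graph.drop. A simulated restart creates a fresh server object, so the projection is gone, while the stored records read back unchanged.

```python
import json
import os
import tempfile


class ToyServer:
    """Hypothetical model: persistent records on disk, projections in memory."""

    def __init__(self, store_path):
        self.store_path = store_path
        self.catalog = {}                    # in-memory only; never written to disk

    def load_records(self):
        with open(self.store_path) as f:     # the authoritative, persistent copy
            return json.load(f)

    def project(self, name):
        adjacency = {}
        for source, target in self.load_records()["edges"]:
            adjacency.setdefault(source, []).append(target)
        self.catalog[name] = adjacency       # reads the store, never writes it
        return adjacency


store = os.path.join(tempfile.mkdtemp(), "store.json")
with open(store, "w") as f:
    json.dump({"edges": [["Case1", "Case2"], ["Case1", "Case3"]]}, f)

server = ToyServer(store)
server.project("cases")
print("cases" in server.catalog)   # True

server = ToyServer(store)          # simulated restart: new process, empty catalog
print("cases" in server.catalog)   # False
print(server.load_records())       # the stored edges are unchanged
```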

Key Takeaway

CALL gds.graph.project(...) does not create new graph data.

Its purpose is to transform selected data from the Neo4j database into an in-memory graph structure optimized for graph algorithms.

Therefore:

  • Neo4j Database = transactional storage layer
  • GDS Graph Projection = analytical in-memory graph representation
  • GDS Algorithms operate on the graph projection, not directly on the transactional database

This separation allows graph algorithms to run significantly faster while leaving the persistent database unchanged, unless an algorithm is explicitly run in write mode to store its results back in the database.
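A toy end-to-end sketch of that separation in Python: it keeps a list of "database" edges, derives a separate undirected projection from it, and runs Weakly Connected Components on the projection alone, here computed with union-find (one common way to compute WCC, not necessarily how GDS does it). The edge values and function names are hypothetical; the point is that the algorithm touches only the projected lists, and the original records compare equal before and after.

```python
import copy


def project_undirected(edges):
    """Build an undirected adjacency dict; the input list is only read."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
        adjacency.setdefault(target, []).append(source)
    return adjacency


def weakly_connected_components(adjacency):
    """Union-find over the projected adjacency lists; returns node -> component root."""
    parent = {node: node for node in adjacency}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]   # path halving
            node = parent[node]
        return node

    for node, neighbors in adjacency.items():
        for neighbor in neighbors:
            root_a, root_b = find(node), find(neighbor)
            if root_a != root_b:
                parent[root_b] = root_a           # merge the two components
    return {node: find(node) for node in adjacency}


database_edges = [("Case1", "Case2"), ("Case1", "Case3"), ("Case4", "Case5")]
snapshot = copy.deepcopy(database_edges)

projection = project_undirected(database_edges)   # analytical copy
components = weakly_connected_components(projection)
print(components)
print(database_edges == snapshot)   # True: the stored data was never modified
```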