feat(storage): delete whole row on vertex removal to reduce super-node tombstones #4902

Open
porunov wants to merge 1 commit into JanusGraph:master from porunov:feature/cql-super-node-delete-optimization
porunov (Member) commented Jun 26, 2026

Removing a vertex previously issued one column-level delete per incident edge/property on the vertex's storage row. For super-nodes this produced thousands of cell tombstones on a single Cassandra partition, leading to read timeouts and TombstoneOverwhelmingException.

When a vertex is fully removed, JanusGraph now deletes its entire storage row in a single backend operation: on CQL a partition-level DELETE ... WHERE key = ? (one partition tombstone) instead of N per-column deletes. Because the per-column delete entries are no longer serialized, this also reduces client-side memory, batching, and network cost for large removals.
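
To make the difference in statement shape concrete, here is a minimal sketch that only builds CQL text; it does not talk to a cluster and is not the PR's code. The table name `edgestore`, the clustering column name `column1`, and the class and method names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DeleteShapeSketch {
    // Assumed table name, for illustration only.
    static final String TABLE = "edgestore";

    /** Previous shape: one column-level DELETE (one cell tombstone) per incident edge/property. */
    static List<String> perColumnDeletes(String key, List<String> columns) {
        List<String> statements = new ArrayList<>();
        for (String column : columns) {
            // "column1" is an assumed clustering column name.
            statements.add("DELETE FROM " + TABLE
                + " WHERE key = " + key + " AND column1 = " + column);
        }
        return statements;
    }

    /** New shape: a single partition-level DELETE (one partition tombstone). */
    static List<String> wholeRowDelete(String key) {
        return Collections.singletonList("DELETE FROM " + TABLE + " WHERE key = " + key);
    }

    public static void main(String[] args) {
        List<String> columns = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            columns.add("0x0" + i);
        }
        // Statement count grows with vertex degree in the old shape...
        System.out.println(perColumnDeletes("0xAB", columns).size());
        // ...and stays constant in the new one.
        System.out.println(wholeRowDelete("0xAB").size());
    }
}
```

The statement count, and with it the amount of data serialized and sent by the client, scales with the vertex's degree in the first method and is constant in the second.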

Implementation:

  • New whole-row-deletion flag on Mutation, plumbed through BackendTransaction.mutateEdges -> KCVSCache.mutateEntries -> CacheTransaction -> KCVMutation -> KeyColumnValueStoreManager.mutateMany.
  • New StoreFeatures.hasOptimizedWholeRowDeletion() capability (default false). The graph skips building per-column deletions only when the backend advertises the capability, so backends without it keep the previous per-column behavior (no data loss).
  • Commit-time detection in StandardJanusGraph.prepareCommitAddRelationMutations, driven by StandardJanusGraphTx tracking the canonical ids of fully-removed vertices. Only the edge store is affected; the index store still deletes per-column (its rows are shared across vertices). Partitioned vertices wipe every representative row via canonical-id normalization.
  • Backend support: CQL/Scylla (partition delete), in-memory (drop key), HBase (column-family delete). BerkeleyJE and others fall back to per-column deletes.
  • Controlled by the new storage.drop-whole-row-on-vertex-removal option (default true).
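
The gating described in the bullets above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the PR's implementation: `Features`, `RowMutation`, and `planRemoval` are hypothetical stand-ins, and only the method name `hasOptimizedWholeRowDeletion()` is taken from the description.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class WholeRowDeletionGate {
    /** Hypothetical stand-in for the capability check on StoreFeatures. */
    interface Features {
        boolean hasOptimizedWholeRowDeletion();
    }

    /** Hypothetical, simplified row mutation: explicit column deletions or a whole-row flag. */
    static final class RowMutation {
        final List<String> columnDeletions;
        final boolean wholeRowDeletion;

        RowMutation(List<String> columnDeletions, boolean wholeRowDeletion) {
            this.columnDeletions = columnDeletions;
            this.wholeRowDeletion = wholeRowDeletion;
        }
    }

    /**
     * Skip per-column deletions only when the option is on, the vertex is fully
     * removed, and the backend advertises the capability. In every other case
     * fall back to the per-column deletions so no data is left behind.
     */
    static RowMutation planRemoval(boolean optionEnabled, Features features,
                                   boolean vertexFullyRemoved, List<String> incidentColumns) {
        if (optionEnabled && vertexFullyRemoved && features.hasOptimizedWholeRowDeletion()) {
            return new RowMutation(Collections.<String>emptyList(), true);
        }
        return new RowMutation(incidentColumns, false);
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("edge-1", "edge-2", "prop-1");
        Features supporting = () -> true;     // e.g. a backend that advertises the capability
        Features nonSupporting = () -> false; // e.g. a backend that falls back
        System.out.println(planRemoval(true, supporting, true, columns).wholeRowDeletion);
        System.out.println(planRemoval(true, nonSupporting, true, columns).columnDeletions.size());
    }
}
```

In this model, setting `storage.drop-whole-row-on-vertex-removal` to false corresponds to passing `optionEnabled = false`, which always yields the per-column path.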

Tests cover the Mutation flag, the write-path plumbing, each backend, and graph-level integration (super-node removal, flag-off fallback, partial-edge removal, and partitioned-vertex removal). Documented in docs/changelog.md (1.2.0) and docs/storage-backend/cassandra.md.


Thank you for contributing to JanusGraph!

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there an issue associated with this PR? Is it referenced in the commit message?
  • Does your PR body contain #xyz where xyz is the issue number you are trying to resolve?
  • Has your PR been rebased against the latest commit within the target branch (typically master)?
  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you written and/or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE.txt file, including the main LICENSE.txt file in the root of this repository?
  • If applicable, have you updated the NOTICE.txt file, including the main NOTICE.txt file found in the root of this repository?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Copilot AI left a comment

Pull request overview

This PR introduces an optimization to reduce tombstone and client-side overhead when fully removing vertices (especially super-nodes) by allowing storage backends to perform a single whole-row/partition delete instead of issuing per-column deletes.

Changes:

  • Add a whole-row-deletion flag to mutations and plumb it through the edge-store write path to backend mutateMany.
  • Track fully removed vertices in StandardJanusGraphTx and, when the option is enabled and the backend supports it, skip per-column deletion serialization and issue a whole-row delete for the vertex’s edge-store row.
  • Add backend capability advertisement + implementations (CQL, HBase, InMemory) and add tests/docs covering the new behavior.

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| janusgraph-inmemory/src/test/java/org/janusgraph/graphdb/inmemory/WholeRowDeletionOptimizationTest.java | Integration test asserting whole-row deletion is used for super-node removal when enabled. |
| janusgraph-inmemory/src/test/java/org/janusgraph/graphdb/inmemory/RemovedVertexTrackingTest.java | Test verifying transaction-level tracking of fully removed vertices. |
| janusgraph-inmemory/src/test/java/org/janusgraph/graphdb/inmemory/PartitionedVertexWholeRowDeletionTest.java | Integration test for whole-row deletion behavior on partitioned vertex removal. |
| janusgraph-inmemory/src/test/java/org/janusgraph/diskstorage/inmemory/WholeRowDeletionCapturingStoreManager.java | Test-only store manager capturing mutateMany calls for assertions. |
| janusgraph-inmemory/src/test/java/org/janusgraph/diskstorage/inmemory/InMemoryWholeRowDeletionTest.java | Unit tests for in-memory backend whole-row delete semantics. |
| janusgraph-inmemory/src/main/java/org/janusgraph/diskstorage/inmemory/InMemoryStoreManager.java | Advertises optimized whole-row deletion and implements it in mutateMany. |
| janusgraph-inmemory/src/main/java/org/janusgraph/diskstorage/inmemory/InMemoryKeyColumnValueStore.java | Adds deleteRow used by in-memory whole-row deletion. |
| janusgraph-hbase/src/test/java/org/janusgraph/diskstorage/hbase/HBaseStoreManagerMutationTest.java | Test ensuring HBase uses family-level delete for whole-row deletion mutations. |
| janusgraph-hbase/src/main/java/org/janusgraph/diskstorage/hbase/HBaseStoreManager.java | Advertises feature and emits HBase family delete when whole-row deletion is flagged. |
| janusgraph-cql/src/test/java/org/janusgraph/diskstorage/cql/CQLWholeRowDeletionTest.java | Test ensuring CQL backend advertises and executes partition-level deletes. |
| janusgraph-cql/src/main/java/org/janusgraph/diskstorage/cql/function/mutate/AbstractCQLMutateManyFunction.java | Emits DELETE ... WHERE key=? when whole-row deletion is requested. |
| janusgraph-cql/src/main/java/org/janusgraph/diskstorage/cql/CQLKeyColumnValueStore.java | Adds prepared statement + API for row/partition delete. |
| janusgraph-cql/src/main/java/org/janusgraph/diskstorage/cql/builder/CQLStoreFeaturesBuilder.java | Advertises optimized whole-row deletion for CQL store features. |
| janusgraph-core/src/test/java/org/janusgraph/diskstorage/MutationTest.java | Tests new mutation flag behavior (emptiness/merge/total mutation count). |
| janusgraph-core/src/test/java/org/janusgraph/diskstorage/keycolumnvalue/StandardStoreFeaturesTest.java | Tests default/enable/copy behavior for the new StoreFeatures capability. |
| janusgraph-core/src/test/java/org/janusgraph/diskstorage/keycolumnvalue/cache/CacheTransactionWholeRowDeletionTest.java | Ensures whole-row deletion flag reaches backend mutateMany. |
| janusgraph-core/src/main/java/org/janusgraph/graphdb/vertices/StandardVertex.java | Records vertex removal in the transaction for commit-time detection. |
| janusgraph-core/src/main/java/org/janusgraph/graphdb/transaction/StandardJanusGraphTx.java | Adds removed-vertex canonical-id tracking and query method for commit-time checks. |
| janusgraph-core/src/main/java/org/janusgraph/graphdb/database/StandardJanusGraph.java | Commit-time decision to issue whole-row delete and skip per-column deletions for fully removed vertices. |
| janusgraph-core/src/main/java/org/janusgraph/graphdb/configuration/GraphDatabaseConfiguration.java | Adds storage.drop-whole-row-on-vertex-removal config option and wiring. |
| janusgraph-core/src/main/java/org/janusgraph/diskstorage/Mutation.java | Adds whole-row deletion flag and merges/count semantics. |
| janusgraph-core/src/main/java/org/janusgraph/diskstorage/keycolumnvalue/StoreFeatures.java | Adds hasOptimizedWholeRowDeletion() capability contract. |
| janusgraph-core/src/main/java/org/janusgraph/diskstorage/keycolumnvalue/StandardStoreFeatures.java | Implements new feature flag in store features + builder. |
| janusgraph-core/src/main/java/org/janusgraph/diskstorage/keycolumnvalue/cache/KCVSCache.java | Plumbs whole-row deletion flag into cache transaction mutations. |
| janusgraph-core/src/main/java/org/janusgraph/diskstorage/keycolumnvalue/cache/CacheTransaction.java | Persists whole-row deletion flag into KCVMutation sent to backend. |
| janusgraph-core/src/main/java/org/janusgraph/diskstorage/BackendTransaction.java | Adds overload to carry whole-row deletion flag to edge-store mutation calls. |
| docs/storage-backend/cassandra.md | Documents super-node deletion using partition-level delete and the config toggle. |
| docs/configs/janusgraph-cfg.md | Documents the new storage.drop-whole-row-on-vertex-removal option. |
| docs/changelog.md | Adds changelog entry describing whole-row deletion optimization and how to disable it. |

…e tombstones

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
porunov force-pushed the feature/cql-super-node-delete-optimization branch from 5080a38 to d860de1 on June 27, 2026 01:46