feat(storage): delete whole row on vertex removal to reduce super-node tombstones #4902
Open
porunov wants to merge 1 commit into
Pull request overview
This PR introduces an optimization to reduce tombstone and client-side overhead when fully removing vertices (especially super-nodes) by allowing storage backends to perform a single whole-row/partition delete instead of issuing per-column deletes.
Changes:
- Add a whole-row-deletion flag to mutations and plumb it through the edge-store write path to backend `mutateMany`.
- Track fully removed vertices in `StandardJanusGraphTx` and, when enabled + supported by the backend, skip per-column deletion serialization and issue a whole-row delete for the vertex’s edge-store row.
- Add backend capability advertisement + implementations (CQL, HBase, InMemory) and add tests/docs covering the new behavior.
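The commit-time decision described in the list above can be sketched roughly as follows. This is a simplified model, not the PR's code: `CommitPlanSketch`, `RowPlan`, and `plan` are hypothetical names, vertex ids are plain `Long` values, and partitioned-vertex canonical-id normalization is not modelled. It only illustrates that a whole-row delete is chosen when the option is enabled, the backend advertises support, and the vertex was fully removed; in every other case the per-column deletions are kept.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model for illustration only; none of these names exist in JanusGraph.
final class CommitPlanSketch {

    /** What would be sent to the backend for one edge-store row. */
    static final class RowPlan {
        final boolean wholeRowDelete;        // true: a single row/partition delete
        final List<String> columnDeletions;  // per-column deletes; empty when wholeRowDelete is true

        RowPlan(boolean wholeRowDelete, List<String> columnDeletions) {
            this.wholeRowDelete = wholeRowDelete;
            this.columnDeletions = columnDeletions;
        }
    }

    static Map<Long, RowPlan> plan(Map<Long, List<String>> deletedColumnsByVertex,
                                   Set<Long> fullyRemovedVertexIds,
                                   boolean optionEnabled,
                                   boolean backendSupportsWholeRowDelete) {
        // Both the config option and the backend capability must be present.
        boolean optimize = optionEnabled && backendSupportsWholeRowDelete;
        Map<Long, RowPlan> plans = new LinkedHashMap<>();
        for (Map.Entry<Long, List<String>> e : deletedColumnsByVertex.entrySet()) {
            if (optimize && fullyRemovedVertexIds.contains(e.getKey())) {
                // Fully removed vertex: skip building per-column deletions entirely.
                plans.put(e.getKey(), new RowPlan(true, new ArrayList<>()));
            } else {
                // Partial removal or unsupported backend: previous per-column behavior.
                plans.put(e.getKey(), new RowPlan(false, new ArrayList<>(e.getValue())));
            }
        }
        return plans;
    }

    public static void main(String[] args) {
        Map<Long, List<String>> deleted = new LinkedHashMap<>();
        deleted.put(1L, List.of("e1", "e2", "e3")); // vertex 1 is fully removed
        deleted.put(2L, List.of("e4"));             // vertex 2 only lost one edge
        plan(deleted, Set.of(1L), true, true).forEach((id, p) ->
            System.out.println(id + " wholeRow=" + p.wholeRowDelete
                + " columns=" + p.columnDeletions));
    }
}
```

Keeping the fallback branch identical to the old path is what lets backends without the capability behave exactly as before.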
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| janusgraph-inmemory/src/test/java/org/janusgraph/graphdb/inmemory/WholeRowDeletionOptimizationTest.java | Integration test asserting whole-row deletion is used for super-node removal when enabled. |
| janusgraph-inmemory/src/test/java/org/janusgraph/graphdb/inmemory/RemovedVertexTrackingTest.java | Test verifying transaction-level tracking of fully removed vertices. |
| janusgraph-inmemory/src/test/java/org/janusgraph/graphdb/inmemory/PartitionedVertexWholeRowDeletionTest.java | Integration test for whole-row deletion behavior on partitioned vertex removal. |
| janusgraph-inmemory/src/test/java/org/janusgraph/diskstorage/inmemory/WholeRowDeletionCapturingStoreManager.java | Test-only store manager capturing mutateMany calls for assertions. |
| janusgraph-inmemory/src/test/java/org/janusgraph/diskstorage/inmemory/InMemoryWholeRowDeletionTest.java | Unit tests for in-memory backend whole-row delete semantics. |
| janusgraph-inmemory/src/main/java/org/janusgraph/diskstorage/inmemory/InMemoryStoreManager.java | Advertises optimized whole-row deletion and implements it in mutateMany. |
| janusgraph-inmemory/src/main/java/org/janusgraph/diskstorage/inmemory/InMemoryKeyColumnValueStore.java | Adds deleteRow used by in-memory whole-row deletion. |
| janusgraph-hbase/src/test/java/org/janusgraph/diskstorage/hbase/HBaseStoreManagerMutationTest.java | Test ensuring HBase uses family-level delete for whole-row deletion mutations. |
| janusgraph-hbase/src/main/java/org/janusgraph/diskstorage/hbase/HBaseStoreManager.java | Advertises feature and emits HBase family delete when whole-row deletion is flagged. |
| janusgraph-cql/src/test/java/org/janusgraph/diskstorage/cql/CQLWholeRowDeletionTest.java | Test ensuring CQL backend advertises and executes partition-level deletes. |
| janusgraph-cql/src/main/java/org/janusgraph/diskstorage/cql/function/mutate/AbstractCQLMutateManyFunction.java | Emits DELETE ... WHERE key=? when whole-row deletion is requested. |
| janusgraph-cql/src/main/java/org/janusgraph/diskstorage/cql/CQLKeyColumnValueStore.java | Adds prepared statement + API for row/partition delete. |
| janusgraph-cql/src/main/java/org/janusgraph/diskstorage/cql/builder/CQLStoreFeaturesBuilder.java | Advertises optimized whole-row deletion for CQL store features. |
| janusgraph-core/src/test/java/org/janusgraph/diskstorage/MutationTest.java | Tests new mutation flag behavior (emptiness/merge/total mutation count). |
| janusgraph-core/src/test/java/org/janusgraph/diskstorage/keycolumnvalue/StandardStoreFeaturesTest.java | Tests default/enable/copy behavior for the new StoreFeatures capability. |
| janusgraph-core/src/test/java/org/janusgraph/diskstorage/keycolumnvalue/cache/CacheTransactionWholeRowDeletionTest.java | Ensures whole-row deletion flag reaches backend mutateMany. |
| janusgraph-core/src/main/java/org/janusgraph/graphdb/vertices/StandardVertex.java | Records vertex removal in the transaction for commit-time detection. |
| janusgraph-core/src/main/java/org/janusgraph/graphdb/transaction/StandardJanusGraphTx.java | Adds removed-vertex canonical-id tracking and query method for commit-time checks. |
| janusgraph-core/src/main/java/org/janusgraph/graphdb/database/StandardJanusGraph.java | Commit-time decision to issue whole-row delete and skip per-column deletions for fully removed vertices. |
| janusgraph-core/src/main/java/org/janusgraph/graphdb/configuration/GraphDatabaseConfiguration.java | Adds storage.drop-whole-row-on-vertex-removal config option and wiring. |
| janusgraph-core/src/main/java/org/janusgraph/diskstorage/Mutation.java | Adds whole-row deletion flag and merges/count semantics. |
| janusgraph-core/src/main/java/org/janusgraph/diskstorage/keycolumnvalue/StoreFeatures.java | Adds hasOptimizedWholeRowDeletion() capability contract. |
| janusgraph-core/src/main/java/org/janusgraph/diskstorage/keycolumnvalue/StandardStoreFeatures.java | Implements new feature flag in store features + builder. |
| janusgraph-core/src/main/java/org/janusgraph/diskstorage/keycolumnvalue/cache/KCVSCache.java | Plumbs whole-row deletion flag into cache transaction mutations. |
| janusgraph-core/src/main/java/org/janusgraph/diskstorage/keycolumnvalue/cache/CacheTransaction.java | Persists whole-row deletion flag into KCVMutation sent to backend. |
| janusgraph-core/src/main/java/org/janusgraph/diskstorage/BackendTransaction.java | Adds overload to carry whole-row deletion flag to edge-store mutation calls. |
| docs/storage-backend/cassandra.md | Documents super-node deletion using partition-level delete and the config toggle. |
| docs/configs/janusgraph-cfg.md | Documents the new storage.drop-whole-row-on-vertex-removal option. |
| docs/changelog.md | Adds changelog entry describing whole-row deletion optimization and how to disable it. |
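As a rough illustration of the toggle listed in the table, the sketch below reads `storage.drop-whole-row-on-vertex-removal` from properties-style text and applies the default of true stated in the commit message. The class `DropWholeRowConfigSketch` and its methods are hypothetical and use only `java.util.Properties`; how JanusGraph itself parses and wires the option is not shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration; JanusGraph's own configuration classes are not used.
final class DropWholeRowConfigSketch {

    // Option name taken from the PR; the default of true is also stated there.
    static final String OPTION = "storage.drop-whole-row-on-vertex-removal";

    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    static boolean isEnabled(Properties props) {
        // A missing key falls back to the documented default (true).
        return Boolean.parseBoolean(props.getProperty(OPTION, "true"));
    }

    public static void main(String[] args) {
        Properties optedOut = parse("storage.backend=cql\n" + OPTION + "=false\n");
        System.out.println("explicit false -> " + isEnabled(optedOut));
        System.out.println("unset          -> " + isEnabled(parse("storage.backend=cql\n")));
    }
}
```

Setting the option to false is the opt-out path: vertex removal then goes back to per-column deletes on every backend.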
…e tombstones

Removing a vertex previously issued one column-level delete per incident edge/property on the vertex's storage row. For super-nodes this produced thousands of cell tombstones on a single Cassandra partition, leading to read timeouts and TombstoneOverwhelmingException.

When a vertex is fully removed, JanusGraph now deletes its entire storage row in a single backend operation: on CQL a partition-level `DELETE ... WHERE key = ?` (one partition tombstone) instead of N per-column deletes. Because the per-column delete entries are no longer serialized, this also reduces client-side memory, batching, and network cost for large removals.

Implementation:
- New whole-row-deletion flag on `Mutation`, plumbed through `BackendTransaction.mutateEdges` -> `KCVSCache.mutateEntries` -> `CacheTransaction` -> `KCVMutation` -> `KeyColumnValueStoreManager.mutateMany`.
- New `StoreFeatures.hasOptimizedWholeRowDeletion()` capability (default false). The graph skips building per-column deletions only when the backend advertises the capability, so backends without it keep the previous per-column behavior (no data loss).
- Commit-time detection in `StandardJanusGraph.prepareCommitAddRelationMutations`, driven by `StandardJanusGraphTx` tracking the canonical ids of fully-removed vertices. Only the edge store is affected; the index store still deletes per-column (its rows are shared across vertices). Partitioned vertices wipe every representative row via canonical-id normalization.
- Backend support: CQL/Scylla (partition delete), in-memory (drop key), HBase (column-family delete). BerkeleyJE and others fall back to per-column deletes.
- Controlled by the new `storage.drop-whole-row-on-vertex-removal` option (default true).

Tests cover the Mutation flag, the write-path plumbing, each backend, and graph-level integration (super-node removal, flag-off fallback, partial-edge removal, and partitioned-vertex removal).

Documented in docs/changelog.md (1.2.0) and docs/storage-backend/cassandra.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
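To make the "N per-column deletes versus one partition tombstone" comparison in the commit message concrete, here is a toy in-memory model. It is a sketch under stated assumptions, not the PR's in-memory backend or Cassandra's storage engine: `TombstoneModelSketch` and its methods are hypothetical, and every delete issued is simply counted as one delete marker.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model for illustration only; it does not reproduce any real backend's internals.
final class TombstoneModelSketch {

    private final Map<String, TreeMap<String, String>> rows = new HashMap<>();
    int cellTombstones = 0; // one marker per column-level delete issued
    int rowTombstones = 0;  // one marker per whole-row (partition-level) delete issued

    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    void deleteColumn(String rowKey, String column) {
        TreeMap<String, String> row = rows.get(rowKey);
        if (row != null) {
            row.remove(column);
            if (row.isEmpty()) {
                rows.remove(rowKey);
            }
        }
        cellTombstones++;
    }

    void deleteRow(String rowKey) {
        rows.remove(rowKey);
        rowTombstones++;
    }

    boolean hasRow(String rowKey) {
        return rows.containsKey(rowKey);
    }

    /** Builds a vertex row with `degree` columns, then removes it with the chosen strategy. */
    static TombstoneModelSketch removeSuperNode(int degree, boolean wholeRow) {
        TombstoneModelSketch store = new TombstoneModelSketch();
        for (int i = 0; i < degree; i++) {
            store.put("v1", "edge-" + i, "payload");
        }
        if (wholeRow) {
            store.deleteRow("v1");
        } else {
            for (int i = 0; i < degree; i++) {
                store.deleteColumn("v1", "edge-" + i);
            }
        }
        return store;
    }

    public static void main(String[] args) {
        TombstoneModelSketch perColumn = removeSuperNode(5000, false);
        TombstoneModelSketch wholeRow = removeSuperNode(5000, true);
        System.out.println("per-column: cells=" + perColumn.cellTombstones
            + " rows=" + perColumn.rowTombstones);
        System.out.println("whole-row:  cells=" + wholeRow.cellTombstones
            + " rows=" + wholeRow.rowTombstones);
    }
}
```

Both strategies leave the row empty; only the number of delete markers written differs, which is the effect the change targets for super-nodes.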
5080a38 to d860de1