feat: Bring Your Own Spark - SparkApplication #6550
Draft
aniketpalu wants to merge 1 commit into
…aterialization

Adds a new batch compute engine that submits materialization jobs as SparkApplication custom resources (CRs) via the Kubeflow Spark Operator. One 'feast materialize' call creates one SparkApplication that processes all feature views using distributed Spark, rather than running in-process on the Feast server.

Key changes:
- Refactor materialize()/materialize_incremental() to pass all tasks to the engine in a single batch call instead of looping per feature view. Existing engines are unaffected (the base class loops over tasks internally via _materialize_one).
- Add a public get_provider() method on FeatureStore.
- New spark_application engine: config, compute, job, driver script, Dockerfile.
- 12 unit tests covering config, validation, CR structure, state mapping, timeout, cleanup, and job naming.
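The batch-call refactor described in the commit message can be pictured roughly as follows. This is only a sketch of the pattern, not the PR's code: `Task`, `BaseEngine`, `InProcessEngine`, `BatchEngine`, and `materialize_all` are hypothetical names, and only `_materialize_one` is taken from the commit message.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for one materialization task (one feature view)."""
    feature_view: str


class BaseEngine(ABC):
    """Hypothetical base class: the batch entry point loops per task by default."""

    def materialize(self, tasks: List[Task]) -> List[str]:
        # Default: one job per task, so engines that only implement
        # _materialize_one keep their existing behavior.
        return [self._materialize_one(task) for task in tasks]

    @abstractmethod
    def _materialize_one(self, task: Task) -> str:
        ...


class InProcessEngine(BaseEngine):
    """Existing-style engine: unchanged, relies on the base-class loop."""

    def _materialize_one(self, task: Task) -> str:
        return f"job:{task.feature_view}"


class BatchEngine(BaseEngine):
    """Batch-style engine: overrides materialize to submit one job for all tasks."""

    def materialize(self, tasks: List[Task]) -> List[str]:
        names = ",".join(t.feature_view for t in tasks)
        return [f"batch-job:{names}"]

    def _materialize_one(self, task: Task) -> str:
        # A single task is just a batch of one.
        return self.materialize([task])[0]


def materialize_all(engine: BaseEngine, feature_views: List[str]) -> List[str]:
    """Caller side: build every task first, then make one engine call."""
    tasks = [Task(name) for name in feature_views]
    return engine.materialize(tasks)
```

With this shape the caller no longer decides how work is split: the in-process engine still produces one job per feature view, while the batch engine produces a single job covering all of them.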
    from tqdm import tqdm

    fv_name = task_info["feature_view"]
    logger.info(f"Thread started: {fv_name}")
Comment on lines +112 to +113
    f"Starting materialization: {total} feature views, "
    f"concurrency={concurrency}"
    succeeded, failed = 0, 0
    for i, task in enumerate(tasks, 1):
        fv_name = task["feature_view"]
        logger.info(f"[{i}/{total}] Materializing: {fv_name}")
        try:
            name, elapsed = _materialize_one_fv(spark, feast_config, task)
            succeeded += 1
            logger.info(f"[{i}/{total}] Completed: {name} ({elapsed:.1f}s)")

            logger.info(f"[{i}/{total}] Completed: {name} ({elapsed:.1f}s)")
        except Exception:
            failed += 1
            logger.exception(f"[{i}/{total}] Failed: {fv_name}")

            logger.info(f"Completed: {name} ({elapsed:.1f}s)")
        except Exception:
            failed += 1
            logger.exception(f"Failed: {fv_name}")
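For readers who want to try the sequential loop from the diff fragments above in isolation, here is a minimal self-contained sketch. It assumes the per-feature-view work can be passed in as a plain callable; `run_all` and `flaky` are hypothetical names, and the real driver calls `_materialize_one_fv(spark, feast_config, task)` instead.

```python
import logging
import time
from typing import Callable, Dict, List, Tuple

logger = logging.getLogger("materialize_sketch")


def run_all(
    tasks: List[Dict[str, str]],
    materialize_fn: Callable[[Dict[str, str]], None],
) -> Tuple[int, int]:
    """Run every task in order; one failure does not stop the rest.

    Returns (succeeded, failed) counts.
    """
    total = len(tasks)
    succeeded, failed = 0, 0
    for i, task in enumerate(tasks, 1):
        fv_name = task["feature_view"]
        logger.info("[%d/%d] Materializing: %s", i, total, fv_name)
        start = time.monotonic()
        try:
            materialize_fn(task)
        except Exception:
            failed += 1
            # logger.exception records the traceback along with the message.
            logger.exception("[%d/%d] Failed: %s", i, total, fv_name)
        else:
            succeeded += 1
            elapsed = time.monotonic() - start
            logger.info("[%d/%d] Completed: %s (%.1fs)", i, total, fv_name, elapsed)
    return succeeded, failed


def flaky(task: Dict[str, str]) -> None:
    """Hypothetical worker that fails for one specific feature view."""
    if task["feature_view"] == "bad_fv":
        raise ValueError("simulated failure")
```

Moving the success bookkeeping into `else` keeps the `try` body limited to the call that can actually fail, which is a stylistic choice of this sketch rather than something the PR prescribes.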
What this PR does / why we need it:
Core changes
Upstream (feature_store.py):
New engine (spark_application/):
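The commit message mentions CR structure, state mapping, and job naming for the new engine. A rough sketch of what those pieces could look like is shown here; it is not the PR's implementation. `JobStatus`, `map_state`, `job_name`, and `build_manifest` are hypothetical, the Spark version and resource sizes are placeholder values, and the mapping assumes the operator reports its state under `status.applicationState.state`.

```python
import re
from enum import Enum
from typing import Any, Dict


class JobStatus(Enum):
    """Hypothetical engine-side status; not Feast's actual enum."""
    WAITING = "waiting"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    ERROR = "error"


# Assumed mapping from SparkApplication state strings to the sketch's statuses.
_STATE_MAP = {
    "": JobStatus.WAITING,  # newly created, not yet picked up
    "SUBMITTED": JobStatus.WAITING,
    "RUNNING": JobStatus.RUNNING,
    "COMPLETED": JobStatus.SUCCEEDED,
    "FAILED": JobStatus.ERROR,
    "SUBMISSION_FAILED": JobStatus.ERROR,
}


def map_state(state: str) -> JobStatus:
    # Unlisted states are treated as "still waiting" in this sketch.
    return _STATE_MAP.get(state, JobStatus.WAITING)


def job_name(project: str, suffix: str, max_len: int = 63) -> str:
    """Build a lowercase, dash-separated name that fits Kubernetes naming rules."""
    raw = f"feast-{project}-{suffix}".lower()
    name = re.sub(r"[^a-z0-9-]+", "-", raw)
    return name[:max_len].strip("-")


def build_manifest(name: str, namespace: str, image: str, main_file: str) -> Dict[str, Any]:
    """Return a minimal SparkApplication manifest as a plain dict."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Python",
            "mode": "cluster",
            "image": image,
            "mainApplicationFile": main_file,
            "sparkVersion": "3.5.0",  # placeholder value
            "restartPolicy": {"type": "Never"},
            "driver": {"cores": 1, "memory": "1g"},  # placeholder sizing
            "executor": {"instances": 2, "cores": 1, "memory": "1g"},
        },
    }
```

A dict like this could be handed to a Kubernetes client for creation as a custom object; how the PR actually submits and polls the resource is not shown in this excerpt.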
Design decisions
Validated on
Test plan
Which issue(s) this PR fixes:
Checks
git commit -s)
Testing Strategy
Misc