Kafka/long lived producer #83
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -92,51 +92,61 @@ def _retry_with_backoff(func, max_retries=3, initial_delay=1.0, backoff_multipli | |
| raise last_exception | ||
|
|
||
|
|
||
| _kafka_producer = None | ||
|
|
||
|
|
||
| def _get_kafka_producer(): | ||
| """Get or create a long-lived Kafka producer. | ||
|
|
||
| The producer is created lazily on first use and reused for subsequent | ||
| calls to avoid repeated TCP/TLS/SASL handshakes. | ||
| """ | ||
| global _kafka_producer | ||
| if _kafka_producer is None: | ||
| from kafka import KafkaProducer | ||
|
|
||
| _kafka_producer = KafkaProducer( | ||
| bootstrap_servers=conf.messaging_broker_urls, | ||
| compression_type=conf.messaging_kafka_compression_type, | ||
| security_protocol=conf.messaging_kafka_security_protocol, | ||
| sasl_mechanism=conf.messaging_kafka_sasl_mechanism, | ||
| sasl_plain_username=conf.messaging_kafka_username, | ||
| sasl_plain_password=conf.messaging_kafka_password, | ||
| value_serializer=lambda v: json.dumps(v).encode("utf-8"), | ||
| ) | ||
| return _kafka_producer | ||
|
|
||
|
|
||
| def _kafka_send_msg(msgs): | ||
| """Send messages to Kafka with retry logic. | ||
|
|
||
| Uses a persistent producer that is reused across calls. On failure, | ||
| the producer is closed and recreated on the next retry attempt. | ||
|
|
||
| :param list[dict] msgs: List of messages to be sent. | ||
| :raises Exception: If Kafka operations fail after retries | ||
| """ | ||
| from kafka import KafkaProducer | ||
|
|
||
| def _send(): | ||
| """Inner function to send messages (will be retried on failure)""" | ||
| config = { | ||
| "bootstrap_servers": conf.messaging_broker_urls, | ||
| "compression_type": conf.messaging_kafka_compression_type, | ||
| "security_protocol": conf.messaging_kafka_security_protocol, | ||
| "sasl_mechanism": conf.messaging_kafka_sasl_mechanism, | ||
| "sasl_plain_username": conf.messaging_kafka_username, | ||
| "sasl_plain_password": conf.messaging_kafka_password, | ||
| "value_serializer": lambda v: json.dumps(v).encode("utf-8"), | ||
| } | ||
|
|
||
| producer = None | ||
| global _kafka_producer | ||
| try: | ||
| producer = KafkaProducer(**config) | ||
|
|
||
| # Send all messages first, then flush once for better performance | ||
| producer = _get_kafka_producer() | ||
| for msg in msgs: | ||
| event = msg.get("event", "event") | ||
| topic = "%s%s" % (conf.messaging_topic_prefix, event) | ||
| producer.send(topic, msg) | ||
|
|
||
| # Single flush for all messages - more efficient than flushing each message | ||
| producer.flush() | ||
|
|
||
| except Exception as e: | ||
| log.error("Failed to send messages to Kafka: %s", str(e)) | ||
| raise | ||
| finally: | ||
| # Ensure producer is always closed, even on exceptions | ||
| if producer is not None: | ||
| # Close and discard the broken producer so the next retry | ||
| # creates a fresh connection. | ||
| if _kafka_producer is not None: | ||
| try: | ||
| producer.close() | ||
| except Exception as e: | ||
| log.warning("Error closing Kafka producer: %s", str(e)) | ||
| _kafka_producer.close() | ||
| except Exception: | ||
| pass | ||
| _kafka_producer = None | ||
|
Contributor
This code seems rather fragile. There's a helper to create the producer, but here we still need to touch the global variable directly. Does KafkaProducer have some reconnection logic we could use instead?
Contributor
Author
Roger, I'll rely on Kafka's built-in reconnection and retry API instead.
||
| raise | ||
|
|
||
| # Retry the send operation with exponential backoff | ||
| _retry_with_backoff(_send) | ||
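
On the inline question about built-in reconnection, here is a minimal sketch of what leaning on the client could look like, assuming the project uses kafka-python (the diff imports `KafkaProducer` from `kafka`). The helper name `resilient_producer_kwargs` and every numeric value are hypothetical and chosen only for illustration. `retries`, `retry_backoff_ms`, `reconnect_backoff_ms` and `reconnect_backoff_max_ms` are constructor options to my knowledge of the kafka-python documentation, so they should be checked against the installed version.

```python
import json


def resilient_producer_kwargs(
    bootstrap_servers,
    retries=5,                       # illustrative value, not from the PR
    retry_backoff_ms=200,            # illustrative value, not from the PR
    reconnect_backoff_ms=50,         # illustrative value, not from the PR
    reconnect_backoff_max_ms=10000,  # illustrative value, not from the PR
):
    """Build keyword arguments for a producer that retries on its own.

    Hypothetical helper: the intent is that failed sends and dropped
    connections are handled inside the client, so the caller does not
    need to close and recreate the producer by hand.
    """
    return {
        "bootstrap_servers": bootstrap_servers,
        # How often the client resends a failed produce request.
        "retries": retries,
        # Pause between those resend attempts.
        "retry_backoff_ms": retry_backoff_ms,
        # Initial and maximum wait before reconnecting to a broker.
        "reconnect_backoff_ms": reconnect_backoff_ms,
        "reconnect_backoff_max_ms": reconnect_backoff_max_ms,
        # Same serializer as in the diff above.
        "value_serializer": lambda v: json.dumps(v).encode("utf-8"),
    }


# Intended use (requires kafka-python and a reachable broker):
#   from kafka import KafkaProducer
#   producer = KafkaProducer(**resilient_producer_kwargs(conf.messaging_broker_urls))
```

With settings along these lines, `_get_kafka_producer()` could stay the only place that touches the module-level producer.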
Does this actually ever raise any exceptions? It returns a Future immediately, so I would not expect any network issues to appear as exceptions. Adding back the `flush()` might help.

`flush()` was dropped to let Kafka handle batching. CTS sends few messages, and `linger_ms=0` means they're sent almost immediately anyway.

But you're right, without `flush()`, `send()` just returns a Future and delivery errors are never raised. The error recovery code would never trigger. I'll add it back.

The built-in batching is a good point though. I didn't think about that. Maybe it's the error handling that should be removed?
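
One way to keep the client's batching and still see delivery failures, sketched under assumptions: keep the Future that each `send()` returns and call `get()` on it after the flush, since `get()` re-raises the delivery error for that record. As far as I can tell, `flush()` on its own only waits for outstanding requests and raises on timeout. `send_and_confirm` and its parameters are hypothetical names, not part of this PR. The producer is only assumed to expose `send()`, `flush(timeout=...)` and futures with `get(timeout=...)`, as kafka-python's `KafkaProducer` does.

```python
def send_and_confirm(producer, msgs, topic_prefix, timeout=10.0):
    """Queue all messages, flush once, then surface per-record errors.

    Hypothetical helper. `producer` is any object with the
    send()/flush() interface of kafka-python's KafkaProducer.
    """
    futures = []
    for msg in msgs:
        # Same topic naming as in the diff: prefix + event name.
        event = msg.get("event", "event")
        topic = "%s%s" % (topic_prefix, event)
        # send() only enqueues the record and returns a future.
        futures.append(producer.send(topic, msg))

    # One flush for the whole batch, so the client can still batch records.
    producer.flush(timeout=timeout)

    # get() returns the record metadata or re-raises the delivery error,
    # which is what lets an outer retry wrapper notice the failure.
    return [future.get(timeout=timeout) for future in futures]
```

Called inside `_send()`, a helper like this would let `_retry_with_backoff` react to real delivery errors without closing the shared producer.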