Skip to content
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions .github/trigger_files/beam_PostCommit_Python.json
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
{
"comment": "Modify this file in a trivial way to cause this test suite to run.",
"pr": "37345",
"modification": 49
}
"modification": 50
}
11 changes: 8 additions & 3 deletions sdks/python/apache_beam/io/gcp/bigquery_file_loads.py
Original file line number Diff line number Diff line change
Expand Up @@ -170,6 +170,13 @@ def _bq_uuid(seed=None):
else:
return str(hashlib.md5(seed.encode('utf8')).hexdigest())

def _bq_uuid_list(list):
checksum = hashlib.sha256()
for item in list:
checksum.update(item.encode('utf-8'))
# separator
checksum.update(b'\x00')
return checksum.hexdigest()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Avoid using the built-in name list as a parameter name, as it shadows the built-in list type. It is better to use a name like items or lst to prevent potential bugs and maintain readability.

Suggested change
def _bq_uuid_list(list):
checksum = hashlib.sha256()
for item in list:
checksum.update(item.encode('utf-8'))
# separator
checksum.update(b'\x00')
return checksum.hexdigest()
def _bq_uuid_list(items):
checksum = hashlib.sha256()
for item in items:
checksum.update(item.encode('utf-8'))
# separator
checksum.update(b'\x00')
return checksum.hexdigest()
References
  1. Avoid shadowing built-in names like 'list' to prevent naming conflicts and maintain code clarity. (link)


class _ShardDestinations(beam.DoFn):
"""Adds a shard number to the key of the KV element.
Expand Down Expand Up @@ -589,9 +596,7 @@ def process(
write_disposition = 'WRITE_APPEND'
wait_for_job = False

chunk_job_name = copy_job_name_base
if len(chunks) > 1:
chunk_job_name = f"{copy_job_name_base}_{i}"
chunk_job_name = '%s_%s' % (copy_job_name_base, _bq_uuid_list(chunk))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Use f-strings instead of % formatting for consistency with the rest of the codebase and modern Python standards (PEP 498).

Suggested change
chunk_job_name = '%s_%s' % (copy_job_name_base, _bq_uuid_list(chunk))
chunk_job_name = f"{copy_job_name_base}_{_bq_uuid_list(chunk)}"
References
  1. Use f-strings (PEP 498) for string formatting to maintain consistency and readability. (link)

@Abacn Abacn Jun 25, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to be careful here. I remember bq load job name being deterministic is intended to avoid duplicate data on retry.

@Abacn Abacn Jun 25, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a commit causing these test failures (#38983 ?), consider revert it for the current release then fix forward, since release already cut yesterday and there is a risk potential problems not revealed if fix-forward with cherry-picks


_LOGGER.info(
"Triggering copy job %s from %s to %s (write_disposition: %s)",
Expand Down
Loading