Refactor do_connect for better logging by jamesmyatt · Pull Request #504 · fsspec/adlfs

jamesmyatt · 2025-07-17T14:17:50Z

Also consistent ordering of credentials throughout.

martindurant · 2025-07-17T14:20:39Z

There are now a few PRs waiting in this repo - please do ping me when you think any of them is ready to be merged.

jamesmyatt · 2025-07-19T16:12:43Z

I think the test failures are due to Azurite: I've started a discussion here #505

kyleknap · 2025-07-21T23:49:10Z

Hi @jamesmyatt. Thanks for the PR! Could you elaborate on the primary motivation for the PR? Is it just to be able to better trace how credentials are being resolved when using adlfs? Or is it more to consolidate some of the client creation logic to make it more manageable for subsequent adlfs feature work?

The main reason I ask is that there is another in-flight PR, #501, where we are adding some additional client configuration and doing some refactoring to at least consolidate how client keyword arguments are set. There seems to be some overlap in trying to refactor/consolidate these, so I'm just trying to figure out the best approach to incorporating both. I do agree there are some opportunities to consolidate logic even beyond what is done in #501 that would be beneficial for future development.

jamesmyatt · 2025-07-22T11:00:18Z

Mostly it's to get better traceability/transparency over what credentials ADLFS is using to connect to storage, and to match the docstrings with the code. For example, I was surprised when it used a connection string from os.environ when I wasn't expecting it to, and I thought that better logging was the solution.

TBH, I prefer this refactoring to #501, especially since it also eliminates more duplication, and it should be easy enough to combine the updates. But if the logging is clear, then I'll be happy since I'm not a maintainer here. So any of the following actions are OK for me:

Continue with #501 (adlfs user agent); I'll update this once #501 is merged, and you can re-review and choose.
You merge this branch into #501, close this one, and continue with the combined #501.
You add logging in #501 and close this one.

kyleknap · 2025-07-22T16:58:37Z

@jamesmyatt Thanks for the explanation! That makes sense to me.

I do like the direction of this PR's refactoring. I see the refactoring here as the next step, building on what is being done in #501. The other PR did not go to that length of refactoring, in order to minimize the changes needed for the functionality while still cleaning up the logic.

In terms of the options, I prefer this one:

Continue with #501, I'll update this once #501 is merged, you can re-review and choose.

Specifically, I see the changes in #501 as a stepping stone to these changes. For example, one thing we realized was that there were not really any tests that leveraged non-connection-string setups. So #501 has been setting up some of the scaffolding needed to assert how clients are constructed, which we can leverage for cases in this PR to ensure the refactoring maintains behavior.

That being said, while progress toward merging in #501 is being made, I can take a first review of this PR to help minimize the number of update cycles needed to get this PR merged.

jamesmyatt · 2025-07-22T17:17:38Z

That sounds great. Thanks

kyleknap

Looks good. Just had some suggestions. We merged #501 and I recently created #507 to improve the test coverage for client creation. I'd recommend waiting for #507 to be merged and then rebasing/updating off it to help identify functional differences in the refactoring.

kyleknap · 2025-07-24T20:45:55Z

        self.max_concurrency = max_concurrency

+    @property
+    def account_url(self) -> str:


Technically, to keep the existing behavior, we will need to check whether the connection string is set and, if so, return None. We should add that check because, if a connection_string is provided, an account_name would not be, and we could end up encoding None into the account URL.

While it is possible to piece together an account url from a connection string, that would be quite a bit of logic for the purpose of this PR.

I'll update it to return None
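
A minimal sketch of the check discussed above, not the adlfs implementation. The class name FileSystemSketch and the default host pattern are assumptions made for illustration; only account_url, account_name and connection_string come from the discussion.

```python
from typing import Optional


class FileSystemSketch:
    """Hypothetical stand-in for the file system class (illustration only)."""

    def __init__(
        self,
        account_name: Optional[str] = None,
        connection_string: Optional[str] = None,
    ):
        self.account_name = account_name
        self.connection_string = connection_string

    @property
    def account_url(self) -> Optional[str]:
        # With a connection string, account_name is usually not provided,
        # so return None rather than formatting None into the URL.
        if self.connection_string is not None or self.account_name is None:
            return None
        # Assumed default host pattern for a blob endpoint.
        return f"https://{self.account_name}.blob.core.windows.net"
```

Note that the return annotation becomes Optional[str] rather than str once None is a possible result.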

kyleknap · 2025-07-24T20:51:22Z

        except Exception as e:
            raise ValueError(f"unable to connect to account for {e}") from e

+    def _get_service_client(self) -> AIOBlobServiceClient:


Could we hoist this internal helper method to the module level instead of making it a method on the class? Mainly, that helps avoid accessing internal methods of one class from another class.

Not really. It uses a lot of AzureBlobFileSystem attributes, so it needs the AzureBlobFileSystem object, and might as well just be a method. So I'd prefer to make it a public method (create_service_client), if it needs to be called from other classes.

The other option is to make a public method that prepares the arguments (e.g. get_client_kwargs), and then calls a module-level helper to actually create the AIOBlobServiceClient.
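
A rough sketch of that second option, under stated assumptions: FileSystemSketch, create_service_client and the client_factory parameter are hypothetical names, and dict stands in for the real client class so the example runs without the Azure packages. Only get_client_kwargs and the "_location_mode" key come from the discussion and the diff.

```python
from typing import Any, Callable, Dict, Optional


def create_service_client(
    client_factory: Callable[..., Any], client_kwargs: Dict[str, Any]
) -> Any:
    """Module-level helper (hypothetical): build a client from prepared kwargs."""
    return client_factory(**client_kwargs)


class FileSystemSketch:
    """Hypothetical stand-in for AzureBlobFileSystem with only the relevant attributes."""

    def __init__(
        self,
        account_url: str,
        credential: Optional[Any] = None,
        location_mode: str = "primary",
    ):
        self.account_url = account_url
        self.credential = credential
        self.location_mode = location_mode

    def get_client_kwargs(self) -> Dict[str, Any]:
        # Public method: collects what the client needs without creating
        # the client or mutating the file system.
        return {
            "account_url": self.account_url,
            "credential": self.credential,
            "_location_mode": self.location_mode,
        }


# `dict` is a placeholder for the real client class in this sketch.
fs = FileSystemSketch("https://acct.blob.core.windows.net", credential="key")
client = create_service_client(dict, fs.get_client_kwargs())
```

With this split, other classes would only call the public get_client_kwargs and the module-level helper, so no private method is accessed across classes.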

kyleknap · 2025-07-24T20:52:40Z

+                    if not self.sas_token.startswith("?"):
+                        self.sas_token = f"?{self.sas_token}"


It would be interesting if we could hoist this logic to the __init__. Mainly, going over this method now, this is the last place where any mutation is happening on the file system. So if we move it, it gives the nice property that this method does not mutate state.

I agree this shouldn't be mutating self, but I think this is actually just a hack for the way that the overall URL is constructed (i.e. query part). What is expected to be the value of sas_token passed to the constructor? The azure docs say that the "?" delimiter is not part of the SAS token: https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview?toc=/azure/storage/blobs/toc.json&bc=/azure/storage/blobs/breadcrumb/toc.json#sas-token.

The URL can be reconstructed with urllib.parse I think.
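
One way the urllib.parse idea could look, as a sketch only: append_sas_token is a hypothetical helper name, and it assumes the SAS token may arrive with or without a leading "?". It builds the URL without mutating any state.

```python
from urllib.parse import urlsplit, urlunsplit


def append_sas_token(account_url: str, sas_token: str) -> str:
    """Return account_url with sas_token as (part of) its query string."""
    # Accept tokens with or without the leading "?" delimiter.
    token = sas_token.lstrip("?")
    parts = urlsplit(account_url)
    # Preserve any query string already present on the URL.
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit(parts._replace(query=query))
```

Because the function returns a new string, self.sas_token would not need to be rewritten inside the connection method.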

kyleknap · 2025-07-24T20:59:20Z

+                        self.sas_token = f"?{self.sas_token}"
+                    kwargs["account_url"] = f'{kwargs["account_url"]}{self.sas_token}'
+                else:
+                    logger.info("Connect using anonymous login")


In the previous logic, the location_mode was never used in anonymous mode, meaning that if a value of secondary was set, adlfs would still use the primary endpoint. I'd lean toward retaining that behavior for now since the PR is more about refactoring and logging. But I'm open to respecting location_mode for anonymous connections if it is supported for GRS accounts; I would need to double check to confirm whether it is.

kyleknap · 2025-07-24T21:06:24Z

+            creds = ["credential", "sync_credential", "account_key"]
+            for name in creds:
+                if (cred := getattr(self, name, None)) is not None:
+                    logger.info("Connect using %s", name.replace("_", " "))


For these logging statements, let's set the level to debug. Mainly, debug is the prevalent level used in the codebase, and these statements are intended for debugging which credentials are used.
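
For illustration, the loop from the diff with the level changed to debug might look like the sketch below. The function name resolve_credential, the logger name, and the use of SimpleNamespace as a stand-in object are assumptions; the credential names and the log message come from the diff.

```python
import logging
from types import SimpleNamespace
from typing import Any, Optional, Sequence

logger = logging.getLogger("adlfs_sketch")  # hypothetical logger name


def resolve_credential(
    obj: Any,
    names: Sequence[str] = ("credential", "sync_credential", "account_key"),
) -> Optional[Any]:
    """Return the first credential attribute that is set, logging which one."""
    for name in names:
        if (cred := getattr(obj, name, None)) is not None:
            # Debug level, matching the prevalent level in the codebase.
            logger.debug("Connect using %s", name.replace("_", " "))
            return cred
    return None


# Stand-in object with only an account key set.
fs_like = SimpleNamespace(credential=None, account_key="key")
chosen = resolve_credential(fs_like)
```

A user who wants to see which credential was picked could enable the output with logging.basicConfig(level=logging.DEBUG).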

kyleknap · 2025-07-24T21:41:52Z

+                "credential": None,
+                "_location_mode": self.location_mode,
+            }
+            creds = ["credential", "sync_credential", "account_key"]


Any ideas on why the file system's client and the file-object's client were out of sync in terms of credential resolution here in the first place? I need to go through the git history on how these became out of sync, but I'd probably lean toward even removing sync_credential from that list so that:

We have complete parity with the do_connect() method. Typically this is the method that most people will use to authenticate since by default even in the AzureBlobFile class, it will just reuse the client from the file system unless connect_client() is explicitly called.

While it is technically possible to provide a sync credential to an async client, it does not seem like a practice that we should continue; instead, async clients should use async credentials.

I'll look more into this and circle back on what we do here.

Looks like this is the only place that sync_credential is used, so I'd be tempted to remove it entirely. What do you think?

I also think that this method should use exactly the same logic as do_connect. Otherwise it will only cause confusion. To me, it looks like a copy-and-paste that has not been properly updated.

Actually, I'm fairly sure sync_credentials can be removed: #551

There's also the question of why connect_client is creating a new service client, when the associated fs object should already have one.

jamesmyatt · 2026-06-15T18:49:16Z

Sorry for the delay coming back to this.

I've opened #551 to simplify the credentials. There are a few comments from @kyleknap to resolve, and then I can update this PR. I still think that extracting a method that creates the service client along with the associated logging is worthwhile.

TBH, I suspect that the whole way that adlfs handles credentials could do with being reviewed alongside the adlfs docs and the Azure docs.

jamesmyatt added 2 commits July 17, 2025 15:10

Refactor do_connect for better logging

1b781a0

And consistent ordering of credentials

Fix docstrings

7f59456

kyleknap mentioned this pull request Jul 24, 2025

Introduce connect client tests #507

Merged

kyleknap reviewed Jul 24, 2025

jamesmyatt marked this pull request as draft June 15, 2026 09:58

jamesmyatt mentioned this pull request Jun 15, 2026

Remove sync credentials #551

Open
