feat: implemented caching and etag support for Go#6985
Conversation
|
💬 Discussion in Slack: #pr-review-infisical-6985-feat-implemented-caching-and-etag-support-for-go Posted by Review Police — reviews, comments, new commits, and CI failures will stream into this channel. |
|
| Filename | Overview |
|---|---|
| backend-go/internal/services/secrets/secretcache/secretcache.go | New cache service for secrets with ETag support; ETag field key is missing PermissionRulesHash, which can cause stale 304 responses after role permission changes. |
| backend-go/internal/server/middlewares/etag.go | New ETag middleware; missing If-None-Match: * wildcard handling and comma-separated ETag list support per RFC 7232. |
| backend-go/internal/server/api/secrets/secret/handler.go | New secrets handler with clean dependency injection and ETag/cache integration. |
| backend-go/internal/server/api/secrets/secret/list_secrets.go | Core list-secrets logic with permission filtering and cache integration; PersonalOverridesPriority deduplication key can collide on hyphenated secret names. |
| backend-go/internal/libs/cache/hash.go | New SHA-256 URL-safe base64 hash utility; clean port of the TypeScript equivalent. |
| backend-go/internal/libs/jitter/jitter.go | Simple jitter utility for cache TTL spreading using math/rand/v2. |
| backend-go/internal/services/permission/permission.go | New GetPermissionFingerprint and PermissionRulesHash helpers added for cache keying. |
| backend-go/internal/keystore/keystore.go | HashGet, HashSet, and PgGetIntItem added to KeyStore interface and implementation. |
| backend/src/services/project-env/project-env-service.ts | Adds invalidateSecretCacheByProjectId calls on environment mutations to keep the Go cache consistent. |
Comments Outside Diff (1)
-
backend-go/internal/server/api/secrets/secret/list_secrets.go, line 209-228 (link)PersonalOverridesPrioritydeduplication key can collideThe map key is built as
sec.Secret.Key + "-" + sec.Secret.FolderID.String(). Because secret keys can contain hyphens, this could collide for certain key/folderID combinations. Usingsec.Secret.Key + "\x00" + sec.Secret.FolderID.String()(a separator never valid in a key), or a composite struct as the map key, eliminates the ambiguity.
Reviews (1): Last reviewed commit: "feat: reverted standalone change made an..." | Re-trigger Greptile
adilsitos
left a comment
There was a problem hiding this comment.
some comments. I think the most imported one would be how the cache is being calculated
| cleanup := func() { | ||
| s.KMS.Close() | ||
| } |
There was a problem hiding this comment.
I think this can become too big if we start adding new cleanup. Do you see any problem if this is a method for the Service struct where we can call a cleanup operation? This way, on the main.go we can call this from the service instance.
| client redis.UniversalClient | ||
| type keyStore struct { | ||
| redis redis.UniversalClient | ||
| db pg.DB |
There was a problem hiding this comment.
nit: Here we define db, but this is actually Postgres right? Maybe we could have it as postgres instead of db, since the interface supports any db, so if in the future we need another db I believe it is more straightforward
There was a problem hiding this comment.
Not sure about another database. I used pg.DB because of two reasons.
- This is something used everywhere in the app. So most of them will get it as pg itself.
- I feel it would be strange to call is
pg.Postgres
| err := k.db.Replica().QueryRow(ctx, ` | ||
| SELECT "integerValue" | ||
| FROM key_value_store | ||
| WHERE key = @key | ||
| AND ("expiresAt" IS NULL OR "expiresAt" > NOW()) |
There was a problem hiding this comment.
question: one pattern that I see a lot on our knex queries is that if we receive a transaction we hit the primary database instead of the replica.
I saw the usage of this and it seems taht we are always hitting the replica (pgGetIntItem on keystore.ts) but the increment (pgIncrementBy.ts can receive a transaction, which will bump the wrong one). In my opinion, this should always query the main database, so we know that a cache was invalidated.
There was a problem hiding this comment.
I believe that we are going to use this to save items in memory right? Not sure if it is necessary, but since we are using go we could have a go routine to check the expiry items, and invalidate them. This would be lightweight, and we could have a low frequency of when this can be done.
| @@ -49,13 +58,21 @@ func (m *MemoryKeyStore) GetItem(_ context.Context, key string) (string, error) | |||
| func (m *MemoryKeyStore) SetExpiry(_ context.Context, key string, expiry time.Duration) (bool, error) { | |||
There was a problem hiding this comment.
it feels confusing to me that the setexpiry is the only method that applies changes on both hashes and items. Maybe we could have it with a different name? Or have an expiry for each field, one for hashes the other for items.
| resp := NewListSecretsRawV3ResponseData(&ListSecretsRawV3Response{ | ||
| Secrets: []SecretRaw{}, | ||
| }) |
There was a problem hiding this comment.
This is also returning the empty body, shouldn't we return only the headers with the 304 status code?
There was a problem hiding this comment.
Regarding the cache problems. The cache being created by node is different thatn the one created by go, that is why I faced those problems. Not sure if this is intended, but maybe they should be the same? Just something we can keep in mind
There was a problem hiding this comment.
from claude: reason for the difference between the cachekey on nodejs and go:
The two never agree, so a client moving between the Go and Node backends
(or Go reading an ETag Node wrote) will never get a 304, and neither
will hit the other's cached payload.
1. requestParamsHash diverges — this breaks the ETag field and the cache
key
This is the most likely thing you're actually seeing. The ETag is stored
in Redis under field actorId:permissionFingerprint:requestParamsHash.
That hash is computed completely differently:
Go (list_secrets.go:102) passes a map[string]any to GenerateHash →
json.Marshal. Go's json.Marshal sorts map keys alphabetically. Node's
JSON.stringify uses insertion order. So even with identical keys/values
the JSON strings differ → different SHA → different field.
But the keys aren't even identical:
Go map: environment, path, recursive, includeImports,
expandSecretReferences, viewSecretValue, personalOverridesBehavior,
tagSlugs, metadataFilter
Node object: environment, path, recursive, includeImports,
expandSecretReferences, expandPersonalOverrides,
personalOverridesBehavior, secretImportReferencesBehavior,
viewSecretValue, throwOnMissingReadValuePermission, ...params
| func GenerateHash(data any) string { | ||
| jsonBytes, err := json.Marshal(data) | ||
| if err != nil { | ||
| return "" | ||
| } | ||
| return HashBytes(jsonBytes) | ||
| } |
There was a problem hiding this comment.
Maybe we should return the error here? Or maybe even panic this.
| if args.ActorType == auth.ActorTypeIdentity { | ||
| apActorCondition = `membership."actorIdentityId" = addlPriv."actorIdentityId"` |
There was a problem hiding this comment.
got this with the help of claude, but it seems to make sense. We had a similar problem with secret access, where we were not checking users inside projects though a group.
"summary": "GetPermissionFingerprint joins additional_privileges on
a column-to-column predicate (membership.actorUserId =
addlPriv.actorUserId) instead of the literal @actorID used by the fixed
getPermission query — so it misses additional-privilege changes for
actors whose only project access is via a group, leaving the ETag
fingerprint unchanged.",
"failure_scenario": "A user whose project access is solely through a
group is granted (or has revoked) an additional privilege. No secret
write occurs, so invalidateSecretCacheByProjectId is never called and
the ETag hash is not deleted. Group-derived membership rows have
actorUserId = NULL, so this join drops the privilege from the
fingerprint and it stays identical. On the next ListSecrets with
If-None-Match, the 304 fast-path (secretcache.go:86-96) — which keys
purely on actorID:fingerprint:paramsHash — matches the stored ETag and
returns 304 Not Modified. The client keeps a stale secret list
reflecting the old privilege for up to the 15m ETag TTL / UTC-day
boundary. The sibling getPermission query (line 598-602) was
deliberately changed to the literal form for exactly this case, and the
TS reference (permission-dal.ts:976) also uses the literal actorId — the
fingerprint port did not get the fix."
Context
This PR implements the following things in Go Sidecar
There cache keys will have a mismatch between computed for request as the serialization is different for both. This is good as it won't cause conflict.
Closes PLATFOR-386 and PLATFOR-473
Screenshots
Steps to verify the change
.envType
Checklist
type(scope): short description(scope is optional, e.g.,fix: prevent crash on syncorfix(api): handle null response).