events collection could be more efficient

We might be able to squeeze a lot more performance out of our API calls in GitHub events collection.

Sonnet 4.6 has identified that we are not being efficient for large repos:

> events.py already has a good bulk path (BulkGithubEventCollection), but it falls back to ThoroughGithubEventCollection when a repo has >300 pages of events. In that fallback:
> 
> - Lines 312–333: loops every issue in the DB → individual GET .../issues/{issue_number}/events
> - Lines 375–396: same per PR
>
> The trigger condition (lines 49–60, the 300-page cutoff) means your largest, most active repos hit the worst path. GraphQL timelineItems would solve this.


https://github.com/chaoss/CollectOSS/blob/96adf3a4d68725db21622673ee6613693c0f5ace/collectoss/tasks/github/events.py#L312-L321

This feeds the same `extract_issue_event_data` function as the Bulk collection path, so it seems unlikely that we are getting different or more thorough data from the call-by-call method (if we were, why would we be skipping it for repos with fewer events?).
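
For discussion, here is a rough sketch of what the GraphQL `timelineItems` route suggested in the quote could look like. It is not CollectOSS code: `ISSUE_TIMELINE_QUERY`, `iter_issue_events`, and the injected `post` callable are hypothetical names, and the three event types selected are only illustrative, not the set we actually store. The point is that one request returns events for a whole page of issues, instead of one GET per issue.

```python
from typing import Any, Callable, Dict, Iterator, Optional, Tuple

# Hypothetical query: fetches a page of issues together with their timeline
# events in a single request. The event types listed are illustrative only.
ISSUE_TIMELINE_QUERY = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    issues(first: 50, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes {
        number
        timelineItems(first: 100, itemTypes: [CLOSED_EVENT, REOPENED_EVENT, LABELED_EVENT]) {
          pageInfo { hasNextPage }
          nodes {
            __typename
            ... on ClosedEvent { createdAt actor { login } }
            ... on ReopenedEvent { createdAt actor { login } }
            ... on LabeledEvent { createdAt actor { login } label { name } }
          }
        }
      }
    }
  }
}
"""

Post = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_issue_events(owner: str, name: str, post: Post) -> Iterator[Tuple[int, Dict[str, Any]]]:
    """Yield (issue_number, event) pairs, one request per page of issues.

    `post` is an assumed helper that sends the payload to the GraphQL
    endpoint and returns the decoded JSON response.
    """
    cursor: Optional[str] = None
    while True:
        payload = {
            "query": ISSUE_TIMELINE_QUERY,
            "variables": {"owner": owner, "name": name, "cursor": cursor},
        }
        issues = post(payload)["data"]["repository"]["issues"]
        for issue in issues["nodes"]:
            # Issues with more than 100 timeline items would need a follow-up
            # query for the remainder; this sketch does not handle that case.
            for event in issue["timelineItems"]["nodes"]:
                yield issue["number"], event
        if not issues["pageInfo"]["hasNextPage"]:
            return
        cursor = issues["pageInfo"]["endCursor"]
```

In practice `post` would wrap an authenticated POST to `https://api.github.com/graphql`; it is passed in here so the paging logic can be exercised without network access. Whether the resulting event nodes can be mapped onto what `extract_issue_event_data` expects is an open question this sketch does not answer.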

(Note: this issue is very similar to several others. Be careful to make sure you are talking about the same issue in the comments.)

events collection could be more efficient #422
