-
Notifications
You must be signed in to change notification settings - Fork 5
Feat: Embedded HTTP API Server #7 #9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
This is general structure, don't approve till all functionality added |
fe0510e to
01c3d64
Compare
Signed-off-by: Denis Tu <dmarshaltu@Deniss-MacBook-Air.local> Signed-off-by: Denis Tu <dmarshaltu@gmail.com>
Signed-off-by: Denis Tu <dmarshaltu@Deniss-MacBook-Air.local> Signed-off-by: Denis Tu <dmarshaltu@gmail.com>
Signed-off-by: Denis Tu <dmarshaltu@Deniss-MacBook-Air.local> Signed-off-by: Denis Tu <dmarshaltu@gmail.com>
Signed-off-by: Denis Tu <dmarshaltu@gmail.com>
Signed-off-by: Denis Tu <dmarshaltu@gmail.com>
EItanya
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is super awesome, thanks so much for getting this started! This is gonna unlock so many awesome use-cases :)
I think there are a few conversations we need to have before merging, but otherwise super happy about this.
| // AgentId specifies the Kagent agent to call when this event occurs (legacy format) | ||
| // +kubebuilder:validation:Optional | ||
| AgentId string `json:"agentId,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this added back in, this format has been deprecated and has many issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess a rebase gone wrong. @DmarshalTU - we switched to AgentRef instead of AgentId to be aligned with the rest of Kagent
| deduplicationManager interfaces.DeduplicationManager, | ||
| kagentClient interfaces.KagentClient, | ||
| statusManager interfaces.StatusManager, | ||
| sreServer interface{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this an interface{} and not a concrete type?
| agentRef := types.NamespacedName{ | ||
| Name: match.Configuration.AgentRef.Name, | ||
| Namespace: agentRefNs, | ||
| // Handle both agentId (legacy) and agentRef (new) formats |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a fan of this as I mentioned above
| // Continue even if status recording fails | ||
| } | ||
|
|
||
| // Add alert to SRE server if available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a high level, why do we have a callout to the server from here, rather than the server sharing the same datasource as this component?
| @@ -0,0 +1,1590 @@ | |||
| package sre | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this specifically called sre. I think we should think about this as a general httpserver for khook rather than anything specifically for your UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apiServer would also work
| // AlertSummary represents alert statistics | ||
| type AlertSummary struct { | ||
| Total int `json:"total"` | ||
| Firing int `json:"firing"` | ||
| Resolved int `json:"resolved"` | ||
| Acknowledged int `json:"acknowledged"` | ||
| BySeverity map[string]int `json:"bySeverity"` | ||
| ByEventType map[string]int `json:"byEventType"` | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we rely on a db for this similar to kagent. We can use sqlite by default, but rely on the same Postgres instance of kagent if it's available.
|
|
||
| // convertAgentIDFormat converts agent ID from kagent/k8s-agent to kagent__NS__k8s_agent format | ||
| // This matches the Python identifier format used by kagent | ||
| func convertAgentIDFormat(agentRef string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have libraries for this in kagent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of general comments about this file:
- Alerting is really an entirely new feature set, I'd personally prefer that we merge a v1 of the httpserver before adding alerting.
- This file is very very large, can we split the handlers up by feature set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, would also like the api server and alerting as 2 separate features. And yes, splitting the handlers would make this easier to comprehend and change.
So basically I showed case to @antweiss my use case for khook in the ide, and unfortunately I lost the source code, so this is fast attempt to recreate it for me and share my thoughts and needs of that, so I think no need to merge at least at that state and we should discuss main guidance and code new one but keep this PR for reference. |
antweiss
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DmarshalTU in general - this is great. 2 important features many folks I'm sure can benefit from!
To summarize:
- check out @EItanya comments regarding keeping in line with the rest of kagent - it's important we keep things uniform
- lets change all
srereferences to somehting more generic like apiserver or httpserver - let's limit this PR to only adding an API server for existing types and add alerting as a separate feature (which I'm very thankful for)
- Use agentRef rather then agentId
anything I missed?
I do realize these are massive changes. But then - this is a massive feature!
| // AgentId specifies the Kagent agent to call when this event occurs (legacy format) | ||
| // +kubebuilder:validation:Optional | ||
| AgentId string `json:"agentId,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess a rebase gone wrong. @DmarshalTU - we switched to AgentRef instead of AgentId to be aligned with the rest of Kagent
| os.Exit(1) | ||
| } | ||
|
|
||
| // Start SRE-IDE server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have an issue with nomenclature - I know you built this specifically for your use case but would like to make stuff generic so other people can potentially build on them. I think just calling it "API server" will work better.
| @@ -0,0 +1,1590 @@ | |||
| package sre | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apiServer would also work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, would also like the api server and alerting as 2 separate features. And yes, splitting the handlers would make this easier to comprehend and change.
| EventTimeoutDuration = 10 * time.Minute | ||
| // NotificationSuppressionDuration is the window to suppress re-sending after success | ||
| NotificationSuppressionDuration = 10 * time.Minute | ||
| NotificationSuppressionDuration = 30 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be discussed. While the original 10 minutes may be too long, 30 seconds is IMHO too short. Maybe we should at least make this configurable by values/configmap
| } | ||
|
|
||
| // determineSeverity determines alert severity based on event type | ||
| func (s *Server) determineSeverity(eventType string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should hard code severity here. My plan for khook is to support pluggable event sources and - so potentially many more eventTypes in the future. Also - severity definitions may change for different teams/environments. I'd sugges to add an optional "severity" field on eventConfigurations in hook - then we can set the default values to what's defined here and allow to change this according to user's needs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
10000%, in fact I’d love to potentially start planning how we’d extend the API/system for multiple sources.
Feat: Embedded HTTP API Server #7 requirements. Here's what was accomplished:
Changes:
Enhanced API Implementation (FR1-FR6):
FR3: Event Statistics and Aggregation
FR4: Real-time Event Streaming
FR5: Hook Configuration Management
FR6: Health and Diagnostics
🔧 Key Features Implemented:
How to Use: