It seems that when the KAS is enabled, you connect to it via a wss://
endpoint, based on what that documentation says:
The agent server for Kubernetes is installed and available on GitLab.com at wss://kas.gitlab.com. If you use self-managed GitLab, you must install an agent server or specify an external installation.
If it is true that it is always on a wss:// endpoint, then we will
likely need to know what nginx modifications are needed to support
websockets like this.
Actually, we already have some websocket pieces set up in the nginx
configuration. I did those to make the web terminal work for k8s in our
earlier experimentation!
I'm unsure if they are sufficient, but I've enabled the kas option, so
maybe you could try it and see if it works?
I created a testgroup, and clicked on the Kubernetes menu entry, but I only see the option to use the deprecated cert-based kubernetes integration (0xacab.org runs gitlab 14.8.4):
I see that you created a KAS agent registration token, but it hasn't
connected yet. Perhaps this has to be sorted before you can connect a
cluster without a certificate?
Sorry for the delay, but I'm in the process of creating a VM -> k8s cluster -> gitlab agent, which turned out to take longer than expected. Who would have thought that?
I'll let you know once I've made it to the top of that list.
Now I have finally installed the agent and configured it with the token I got from registering it in the gitlab UI, and with the KAS server URL wss://0xacab.org (I hope this is right; wss://kas.0xacab.org didn't resolve).
I see these errors in the agent pod:
{"level":"error","time":"2022-03-21T22:10:37.069Z","msg":"Error handling a connection","mod_name":"reverse_tunnel","error":"Connect(): rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing failed to WebSocket dial: expected handshake response status code 101 but got 400\""}
{"level":"warn","time":"2022-03-21T22:11:41.625Z","msg":"GetConfiguration failed","error":"rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing failed to WebSocket dial: expected handshake response status code 101 but got 400\""}
I got another agent connected fine to gitlab.com with the same setup (only a different URL/token combination), so I'm pretty sure the issue is on 0xacab's side.
Happy to help however I can from my side.
Cool! How did you get it to connect? After I made the websocket changes, I tried a few things (e.g. made a change to the agent yaml) but couldn't see any change. Maybe you had to start it on your side?
From what I was finding, gitlab puts things on https://kas.gitlab.com; otherwise it is typically found at /-/kubernetes-agent (this is the default in the configuration). I think the demand is pretty limited right now. We have barely anyone even using the pages feature, but I suspect that is because people don't really know what is possible.
Speaking of that, I'm really curious to know what you can do with this, besides just... connecting an agent!
I'm using flux for gitOps-style deployments on my k8s cluster, and used these two resources to deploy the agent.
The only config helm values I needed to add were the kasAddress and token (which I got after registering the agent in gitlab).
---
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: GitRepository
metadata:
  name: gitlab-k8s-agent
  namespace: flux-system
spec:
  # The interval at which to check the upstream for updates
  interval: 24h
  url: https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent
  ref:
    branch: master
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: gitlab-k8s-agent-0xacab
  namespace: varac
spec:
  releaseName: gitlab-k8s-agent-0xacab
  chart:
    spec:
      chart: ./build/deployment/gitlab-agent-chart
      sourceRef:
        kind: GitRepository
        name: gitlab-k8s-agent
        namespace: flux-system
  interval: 24h
  values:
    # https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/master/build/deployment/gitlab-agent-chart/values.yaml
    config:
      kasAddress: wss://0xacab.org/-/kubernetes-agent/
      token: ...
And I want it because I'd like to enable review apps for a website project - You could still use review apps without a k8s cluster but that's what I'd like to do.
So far I can get and use the kubernetes context in my testproject (which I had been struggling with before), but I'm unable to connect to the cluster: I get a 426 error code. See https://0xacab.org/varac-projects/testproject/-/jobs/261378:
++ kubectl config get-contexts
$ kubectl config get-contexts
CURRENT   NAME                                                 CLUSTER   AUTHINFO   NAMESPACE
          varac-projects/kubernetes-agent-setup:varac-agent1   gitlab    agent:4
++ echo '$ kubectl config use-context varac-projects/kubernetes-agent-setup:varac-agent1'
++ kubectl config use-context varac-projects/kubernetes-agent-setup:varac-agent1
$ kubectl config use-context varac-projects/kubernetes-agent-setup:varac-agent1
Switched to context "varac-projects/kubernetes-agent-setup:varac-agent1".
++ echo '$ kubectl version'
++ kubectl version
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.8", GitCommit:"7061dbbf75f9f82e8ab21f9be7e8ffcaae8e0d44", GitTreeState:"clean", BuildDate:"2022-03-16T14:10:06Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
Error from server: the server responded with the status code 426 but did not return more information
I tried with the same setup on gitlab.com where I could successfully list all pods.
@micah Is there anything in the (nginx|KAS) logs that might help?
2022-03-24_13:30:38.43005 {"level":"debug","time":"2022-03-24T06:30:38.429-0700","msg":"Config: no updates","correlation_id":"01FYY14TXSRPM4X3D1CH95YSKV","grpc_service":"gitlab.agent.agent_configuration.rpc.AgentConfiguration","grpc_method":"GetConfiguration","agent_id":4,"project_id":"varac-projects/kubernetes-agent-setup","commit_id":"17665bec0494fd1c00094cdc314dfdfb153a438d"}
I already re-configured it from scratch yesterday after moving :/ (meaning I removed the agent and re-registered it). Huh, I'm a bit clueless right now. I'll set up my own runner so I can connect to the job container and poke around a bit.
I copied the kubeconfig from that container to my laptop and am able to reproduce this locally. In case you need that kubeconfig file to try yourself let me know:
❯ export KUBECONFIG=/home/varac/.kube/kas-0xacab.yml
❯ kc -v 5 version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"clean", BuildDate:"2022-03-17T03:51:43Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"linux/amd64"}
I0324 15:55:33.719972 619850 helpers.go:219] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "the server responded with the status code 426 but did not return more information",
  "details": {
    "causes": [
      {
        "reason": "UnexpectedServerResponse",
        "message": "WebSocket protocol violation: Connection header \"close\" does not contain Upgrade"
      }
    ]
  },
  "code": 426
}]
Error from server: the server responded with the status code 426 but did not return more information
This is the nginx map module which lets you create variables in Nginx’s
configuration file whose values are conditional — that is, they depend
on other variables’ values.
I don't fully understand this, but I believe it is mapping
$http_upgrade (which in this case is 'websocket') to something else:
maybe by default it does an upgrade, but if it's the empty string, it
does a close.
I also have configured proxy_set_header Connection $connection_upgrade_gitlab_ssl; but I find elsewhere that people just
have this set to proxy_set_header Connection upgrade; so I'm going to
try to set that instead.
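To make the suspected map behaviour concrete, here is a small Python model of it. This is only a sketch of the logic as guessed above (any non-empty Upgrade request header maps to "upgrade", an empty one maps to "close"); it is not nginx code, and the function names `connection_upgrade` and `proxied_connection_header` are made up for illustration.

```python
def connection_upgrade(http_upgrade: str) -> str:
    """Model of the assumed nginx map: '' -> close, anything else -> upgrade."""
    return "close" if http_upgrade == "" else "upgrade"


def proxied_connection_header(request_headers: dict) -> str:
    """Value the proxy would send upstream as the Connection header.

    Header names are compared case-insensitively; a missing Upgrade header
    is treated as the empty string, as $http_upgrade would be in nginx.
    """
    lowered = {name.lower(): value for name, value in request_headers.items()}
    return connection_upgrade(lowered.get("upgrade", ""))


if __name__ == "__main__":
    # A websocket handshake carries "Upgrade: websocket".
    print(proxied_connection_header({"Upgrade": "websocket"}))
    # A plain HTTP request has no Upgrade header at all.
    print(proxied_connection_header({"Host": "0xacab.org"}))
```

If this model is right, a request that reaches the proxy without an Upgrade header ends up with Connection: close, whereas hard-coding `proxy_set_header Connection upgrade;` would send "upgrade" regardless of what the client asked for.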
That might be good - you seem to be getting good responses from the people who are core on this specific setup! I mentioned the nginx config I was using in https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/6685#note_884011944 and asked for clarification there, but I think maybe an actual issue might be better.
RFC6455 says, "If the response lacks a |Connection| header field or the |Connection| header field doesn't contain a token that is an ASCII case-insensitive match for the value "Upgrade", the client MUST Fail the WebSocket Connection."
So something over there is setting the Connection header to 'close' which causes this to happen. This seems to not be something caused by the nginx configuration at all but something the kas-agent is doing?
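The client-side check that RFC 6455 describes can be sketched in Python as follows. This is an illustration of the rule quoted above, not the code kubectl or the agent actually runs; the function name `connection_allows_upgrade` is hypothetical.

```python
from typing import Optional


def connection_allows_upgrade(connection_header: Optional[str]) -> bool:
    """Return True if the Connection header contains an 'Upgrade' token.

    Per the quoted rule, the header is a comma-separated token list and the
    match is ASCII case-insensitive. A missing header fails the check.
    """
    if connection_header is None:
        return False
    tokens = [token.strip().lower() for token in connection_header.split(",")]
    return "upgrade" in tokens


if __name__ == "__main__":
    for value in ("Upgrade", "keep-alive, Upgrade", "close", None):
        verdict = "ok" if connection_allows_upgrade(value) else "fail the WebSocket connection"
        print(repr(value), "->", verdict)
```

With a response header of Connection: close, as in the 426 error above, this check fails, which matches the "does not contain Upgrade" message.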
If I strace that process while doing the connection, I see:
So what do we know? We know that this worked the other day, and then it stopped working for some reason. We didn't change anything over here, so I'm wondering if there is something you did that is making it not work right.
It never worked completely. In the first phase I had authorization issues, which resulted in my testproject CI container not having any kubeconfig file; those are solved now.
I thought that with this the kubectl cluster access was working, but it isn't. So there was never a state where everything worked :/
One thing I can think of is you had tried to make the test project public, but it didn't solve the problem, but then you moved it to your area, and perhaps that being private is the problem?
Can you try and set this up again in the test project?
The new group https://0xacab.org/varac-projects is public, and so are the two projects it contains, so there's nothing to improve here :/
I also removed the agent and re-registered it again, without success. So I'll create an upstream issue now, and thanks again for bearing with me!
Yeah, I updated that gitlab.com ticket with this information. I don't
know why it is redirecting to the sign-in page, but it usually does that
when there is no proper authentication.