Official documentation: apache/skywalking-kubernetes: Apache SkyWalking Kubernetes Deployment Helm Chart (github.com)
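1. Getting to know the SkyWalking components
- SkyWalking OAP Server: the core component of the SkyWalking analysis platform.
  - It receives and processes the data sent by SkyWalking agents and writes it to backend storage (such as Elasticsearch or MySQL).
  - It also pre-processes, aggregates, and compresses the data to make processing and storage more efficient.
- SkyWalking UI: the web UI of the SkyWalking analysis platform.
  - It displays the analysis results: application performance metrics, call traces, topology maps, and so on.
  - It lets users quickly locate performance problems and bottlenecks in their applications.
- SkyWalking ES Init: the tool that initializes Elasticsearch. In this guide it is a Job that runs the OAP image with -Dmode=init.
  - It creates the Elasticsearch indices and mappings needed to store and query SkyWalking data.
- Elasticsearch: an open-source distributed search and analytics engine, used here to store SkyWalking data.
  - The OAP Server writes its data to Elasticsearch and queries and analyzes it through Elasticsearch.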
2. Configuration and deployment
1. Deploying with Kubernetes controllers (plain manifests)
Deploy Elasticsearch (cluster)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es
  namespace: devops
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      imagePullSecrets:
        - name: harborsecret
      initContainers:
        # Elasticsearch requires vm.max_map_count >= 262144 on the node
        - name: increase-vm-max-map
          image: busybox:latest
          command: ["sysctl", "-w", "vm.max_map_count=262144"]
          securityContext:
            privileged: true
        # Note: a ulimit set here applies only to this init container's shell,
        # not to the Elasticsearch container
        - name: increase-fd-ulimit
          image: busybox:latest
          command: ["sh", "-c", "ulimit -n 65536"]
          securityContext:
            privileged: true
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:7.6.2
          ports:
            - name: rest
              containerPort: 9200
            - name: inter
              containerPort: 9300
          resources:
            limits:
              cpu: 1000m
            requests:
              cpu: 1000m
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
          env:
            - name: cluster.name
              value: k8s-logs
            - name: node.name
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            # StatefulSet pod names: <name>-0, <name>-1, <name>-2
            - name: cluster.initial_master_nodes
              value: "es-0,es-1,es-2"
            # Ignored by Elasticsearch 7.x; kept from the original manifest
            - name: discovery.zen.minimum_master_nodes
              value: "2"
            - name: discovery.seed_hosts
              value: "elasticsearch"
            # Xms and Xmx must be equal, otherwise the heap size bootstrap check fails
            - name: ES_JAVA_OPTS
              value: "-Xms2048m -Xmx2048m"
            - name: network.host
              value: "0.0.0.0"
  volumeClaimTemplates:
    - metadata:
        name: data
        labels:
          app: elasticsearch
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: nfs-storage
        resources:
          requests:
            storage: 50Gi
---
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch
  namespace: devops
  labels:
    app: elasticsearch
spec:
  selector:
    app: elasticsearch
  clusterIP: None
  ports:
    - port: 9200
      name: rest
    - port: 9300
      name: inter-node
---
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch-client
  namespace: devops
  labels:
    app: elasticsearch
spec:
  selector:
    app: elasticsearch
  ports:
    - port: 9200
      name: rest
    - port: 9300
      name: inter-node
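To confirm that the three nodes actually formed one cluster, a check along the following lines can be run once the manifests are applied. This is a minimal sketch: the function name `es_cluster_health` is hypothetical, and it assumes `kubectl` is configured for the cluster and that `curl` is available inside the Elasticsearch image.

```shell
# Wait until every replica of the "es" StatefulSet is ready,
# then ask the first node for the cluster health report.
es_cluster_health() {
  ns="${1:-devops}"
  kubectl -n "$ns" rollout status statefulset/es --timeout=600s || return 1
  kubectl -n "$ns" exec es-0 -- curl -s 'http://localhost:9200/_cluster/health?pretty'
}

# Usage (requires cluster access):
#   es_cluster_health devops
```

A healthy cluster should report `number_of_nodes` as 3 and a `status` of `green` or `yellow`.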
Deploy Elasticsearch (single node)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: devops
spec:
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
  serviceName: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      initContainers:
        # Kernel settings and volume ownership are prepared in a privileged init container.
        # The Elasticsearch container runs as a non-root user, so a postStart hook in that
        # container cannot run sysctl, and a ulimit set in a hook does not reach the ES process.
        - name: prepare-node
          image: busybox:latest
          command:
            - /bin/sh
            - -c
            - |
              sysctl -w vm.max_map_count=262144
              # Assumption: the official image runs Elasticsearch as uid 1000, gid 0
              chown -R 1000:0 /usr/share/elasticsearch/data
          securityContext:
            privileged: true
          volumeMounts:
            - name: elasticsearch-data
              mountPath: /usr/share/elasticsearch/data
      containers:
        - name: elasticsearch
          image: elasticsearch:7.17.4
          imagePullPolicy: IfNotPresent
          env:
            - name: ES_JAVA_OPTS
              value: -Xms2048m -Xmx2048m
            - name: node.data
              value: "true"
            - name: node.master
              value: "true"
            - name: path.data
              value: /usr/share/elasticsearch/data
            # Custom cluster name
            - name: cluster.name
              value: es-cluster
            # Node name, taken from metadata.name
            - name: node.name
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            # Nodes from which the master is elected when the cluster is first bootstrapped.
            # The value is metadata.name plus an ordinal, starting at 0.
            - name: cluster.initial_master_nodes
              value: "elasticsearch-0"
            # Ignored by Elasticsearch 7.x; kept from the original manifest
            - name: discovery.zen.minimum_master_nodes
              value: "1"
            # Addresses used for node discovery; the value should cover all master-eligible nodes.
            # If it is a domain name that resolves to several IP addresses, ES tries all of them.
            - name: discovery.seed_hosts
              value: "elasticsearch"
          ports:
            - containerPort: 9200
              name: 9200tcp2
              protocol: TCP
            - containerPort: 9300
              name: 9300tcp2
              protocol: TCP
          resources:
            limits:
              cpu: "2"
              memory: 4Gi
            requests:
              cpu: "1"
              memory: 2Gi
          # Mount the data directory
          volumeMounts:
            - name: elasticsearch-data
              mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        # Must match volumeMounts.name in the containers
        name: elasticsearch-data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: nfs-storage
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: devops
spec:
  clusterIP: None
  ports:
    - name: elasticsearch-in
      port: 9300
      protocol: TCP
      targetPort: 9300
    - name: elasticsearch-out
      port: 9200
      protocol: TCP
      targetPort: 9200
  selector:
    app: elasticsearch
  type: ClusterIP
Deploy RBAC
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: skywalking
  name: skywalking-oap
  namespace: devops
---
# ClusterRole and ClusterRoleBinding are cluster-scoped, so they take no namespace
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: skywalking
  labels:
    app: skywalking
rules:
  - apiGroups: [""]
    resources: ["pods", "endpoints", "services", "nodes"]
    verbs: ["get", "watch", "list"]
  # Deployments and ReplicaSets are served from the "apps" group on current clusters;
  # "extensions" is kept for older ones
  - apiGroups: ["apps", "extensions"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: skywalking
  labels:
    app: skywalking
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: skywalking
subjects:
  - kind: ServiceAccount
    name: skywalking-oap
    namespace: devops
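Whether the binding took effect can be checked by impersonating the ServiceAccount. The sketch below wraps `kubectl auth can-i`; the helper name `oap_can_i` is hypothetical, and it assumes the current kubectl user is allowed to impersonate service accounts.

```shell
# Ask the API server whether the skywalking-oap ServiceAccount
# may perform the given verb on the given resource.
oap_can_i() {
  verb="$1"; resource="$2"; ns="${3:-devops}"
  kubectl -n "$ns" auth can-i "$verb" "$resource" \
    --as="system:serviceaccount:${ns}:skywalking-oap"
}

# Usage (requires cluster access); each call prints "yes" or "no":
#   oap_can_i list pods
#   oap_can_i watch endpoints
```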
Data initialization Job
apiVersion: batch/v1
kind: Job
metadata:
  name: "skywalking-es-init"
  namespace: devops
  labels:
    app: skywalking-job
spec:
  template:
    metadata:
      name: "skywalking-es-init"
      labels:
        app: skywalking-job
    spec:
      serviceAccountName: skywalking-oap
      restartPolicy: Never
      initContainers:
        # Wait up to 60 x 5s for Elasticsearch to accept connections on port 9200
        - name: wait-for-elasticsearch
          image: busybox:1.30
          imagePullPolicy: IfNotPresent
          command: ['sh', '-c', 'for i in $(seq 1 60); do nc -z -w3 elasticsearch 9200 && exit 0 || sleep 5; done; exit 1']
      containers:
        - name: oap
          image: skywalking.docker.scarf.sh/apache/skywalking-oap-server:8.9.0
          imagePullPolicy: IfNotPresent
          env:
            # -Dmode=init creates the indices and then exits
            - name: JAVA_OPTS
              value: "-Xmx2g -Xms2g -Dmode=init"
            - name: SW_STORAGE
              value: elasticsearch
            - name: SW_STORAGE_ES_CLUSTER_NODES
              value: "elasticsearch:9200"
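Because the OAP Deployment that follows runs in no-init mode, it is worth making sure this Job has completed before deploying it. A minimal sketch, assuming `kubectl` access to the cluster; the function name `wait_es_init` is hypothetical.

```shell
# Block until the init Job completes; on failure or timeout, print its recent logs.
wait_es_init() {
  ns="${1:-devops}"
  kubectl -n "$ns" wait --for=condition=complete job/skywalking-es-init --timeout=600s \
    || { kubectl -n "$ns" logs job/skywalking-es-init --tail=50; return 1; }
}

# Usage (requires cluster access):
#   wait_es_init devops
```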
Deploy OAP
---
apiVersion: v1
kind: Service
metadata:
  name: oap-svc
  namespace: devops
  labels:
    app: oap
spec:
  type: ClusterIP
  ports:
    - port: 11800
      name: grpc
    - port: 12800
      name: rest
  selector:
    app: oap
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: oap
  name: oap
  namespace: devops
spec:
  replicas: 1
  selector:
    matchLabels:
      app: oap
  template:
    metadata:
      labels:
        app: oap
    spec:
      serviceAccountName: skywalking-oap
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    # Must match the pod label above so that OAP replicas spread across nodes
                    app: oap
      initContainers:
        - name: wait-for-elasticsearch
          image: busybox:1.30
          imagePullPolicy: IfNotPresent
          command: ['sh', '-c', 'for i in $(seq 1 60); do nc -z -w3 elasticsearch.devops.svc 9200 && exit 0 || sleep 5; done; exit 1']
      containers:
        - name: oap
          image: skywalking.docker.scarf.sh/apache/skywalking-oap-server:8.9.0
          imagePullPolicy: IfNotPresent
          livenessProbe:
            tcpSocket:
              port: 12800
            initialDelaySeconds: 15
            periodSeconds: 20
          readinessProbe:
            tcpSocket:
              port: 12800
            initialDelaySeconds: 15
            periodSeconds: 20
          ports:
            - containerPort: 11800
              name: grpc
            - containerPort: 12800
              name: rest
          env:
            - name: JAVA_OPTS
              value: "-Dmode=no-init -Xmx2g -Xms2g"
            - name: SW_CLUSTER
              value: kubernetes
            # Namespace and label selector that OAP uses to discover its peer pods;
            # they must match this Deployment's namespace and pod labels
            - name: SW_CLUSTER_K8S_NAMESPACE
              value: "devops"
            - name: SW_CLUSTER_K8S_LABEL
              value: "app=oap"
            # TTL of record data (traces, logs), in days
            - name: SW_CORE_RECORD_DATA_TTL
              value: "2"
            # TTL of metrics data, in days
            - name: SW_CORE_METRICS_DATA_TTL
              value: "2"
            - name: SKYWALKING_COLLECTOR_UID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.uid
            - name: SW_STORAGE
              value: elasticsearch
            - name: SW_STORAGE_ES_CLUSTER_NODES
              value: "elasticsearch.devops.svc:9200"
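With SW_CLUSTER set to kubernetes, the OAP finds its cluster peers through SW_CLUSTER_K8S_NAMESPACE and SW_CLUSTER_K8S_LABEL, so that namespace and selector should return the OAP pods themselves. The sketch below runs the same lookup by hand; the helper name `oap_peers` is hypothetical and `kubectl` access is assumed.

```shell
# List the pods that a namespace + label selector pair resolves to.
# Pass the same values that the OAP container receives in
# SW_CLUSTER_K8S_NAMESPACE and SW_CLUSTER_K8S_LABEL.
oap_peers() {
  ns="$1"; selector="$2"
  kubectl -n "$ns" get pods -l "$selector" -o wide
}

# Usage (requires cluster access):
#   oap_peers devops app=oap
```

If the command lists no pods, the OAP instances cannot see each other and the two values need adjusting.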
Deploy the UI
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: ui
  name: ui-svc
  namespace: devops
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  selector:
    app: ui
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ui
  namespace: devops
  labels:
    app: ui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ui
  template:
    metadata:
      labels:
        app: ui
    spec:
      containers:
        - name: ui
          image: skywalking.docker.scarf.sh/apache/skywalking-ui:8.9.0
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
              name: page
          env:
            - name: SW_OAP_ADDRESS
              value: http://oap-svc:12800 # must match the name of the OAP Service
Deploy the Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: devops
  name: skywalking-ingress
spec:
  ingressClassName: nginx
  rules:
    - host: skywalking.kubernets.cn
      http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: ui-svc
                port:
                  # The Service port of ui-svc (80), not the container port (8080)
                  number: 80
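Before DNS for the host is in place, the route can be tested by sending the Host header straight to the ingress controller. A minimal sketch: the helper name `ingress_status` is hypothetical, and the address passed in is whatever IP or hostname the NGINX ingress controller is reachable on in your environment.

```shell
# Print the HTTP status code returned for the UI host via a given ingress address.
ingress_status() {
  ingress_addr="$1"; host="${2:-skywalking.kubernets.cn}"
  curl -s -o /dev/null -w '%{http_code}\n' -H "Host: ${host}" "http://${ingress_addr}/"
}

# Usage (replace the placeholder with your ingress controller address):
#   ingress_status <ingress-address>
```

A 200 indicates the request reached the UI; a 503 from the controller usually means the backend Service name or port does not resolve to ready pods.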
2. Deploying with Helm
Set environment variables
# Name of the Helm release
export SKYWALKING_RELEASE_NAME=skywalking
# Kubernetes namespace that SkyWalking is installed into
export SKYWALKING_RELEASE_NAMESPACE=devops
# Name of the Helm repository
export REPO=skywalking
Add the Helm repository
$ helm repo add ${REPO} https://apache.jfrog.io/artifactory/skywalking-helm
Pull the SkyWalking chart and unpack it
$ helm pull ${REPO}/skywalking --untar
Edit values.yaml
elasticsearch:
  ...
  config:
    host: elasticsearch.devops.svc
    password:
    port:
      http: 9200
    user:
  enabled: false # whether to deploy the Elasticsearch bundled with the chart
  persistence:
    annotations: {}
    enabled: true # whether to enable data persistence
  ...
oap:
  antiAffinity: soft
  dynamicConfigEnabled: false
  env: null
  envoy:
    als:
      enabled: false
  image:
    pullPolicy: IfNotPresent
    repository: skywalking.docker.scarf.sh/apache/skywalking-oap-server
    tag: 8.9.0
  storageType: elasticsearch
  ....
ui:
  image:
    pullPolicy: IfNotPresent
    repository: skywalking.docker.scarf.sh/apache/skywalking-ui
    tag: 8.9.0
Install and upgrade
## Install
$ helm install skywalking skywalking -n devops --values ./skywalking/values.yaml
## Upgrade
$ helm upgrade skywalking skywalking -n devops --values ./skywalking/values.yaml
## Uninstall
$ helm uninstall skywalking -n devops
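The install and upgrade commands can also be folded into one idempotent step that reuses the environment variables exported earlier. This is a sketch rather than the chart's documented procedure: the function name `deploy_skywalking` is hypothetical, and it assumes the chart was unpacked to ./skywalking by `helm pull --untar`.

```shell
# Install the release if it does not exist yet, otherwise upgrade it in place.
# Falls back to the names used in this guide when the variables are not set.
deploy_skywalking() {
  release="${SKYWALKING_RELEASE_NAME:-skywalking}"
  ns="${SKYWALKING_RELEASE_NAMESPACE:-devops}"
  helm upgrade --install "$release" ./skywalking \
    -n "$ns" --create-namespace \
    --values ./skywalking/values.yaml
}

# Usage (requires cluster access):
#   deploy_skywalking
#   helm status "${SKYWALKING_RELEASE_NAME:-skywalking}" -n "${SKYWALKING_RELEASE_NAMESPACE:-devops}"
```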
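Create the Ingress for the Helm-based deployment
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: devops
  name: skywalking-ingress
spec:
  ingressClassName: nginx
  rules:
    - host: skywalking.kubernets.cn
      http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: skywalking-ui
                port:
                  # Must equal the port exposed by the skywalking-ui Service;
                  # check with: kubectl -n devops get svc skywalking-ui
                  number: 8080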
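3. A Secret for access control on the SkyWalking UI
The SkyWalking UI has no access control by default. The basic auth scheme of the NGINX Ingress controller, shown below, puts external authentication in front of the service.
Important: there is one pitfall with basic auth. According to the official documentation and my own testing, the file that htpasswd generates to hold the username and password must be named auth before the Secret is created from it. Otherwise, whatever you do afterwards, requests still end in a 503. This differs from configuring basic auth on a plain nginx: the controller looks for a key named auth in the Secret, and kubectl create secret --from-file uses the file name as the key.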
## Generate the auth file
$ htpasswd -c auth admin
New password:
Re-type new password:
Adding password for user admin
## Create the Secret from the auth file
$ kubectl -n devops create secret generic ui-auth --from-file=auth
secret/ui-auth created
## Reconfigure the Ingress route
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: devops
  name: skywalking-ingress
  annotations:
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: ui-auth
    nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required - admin'
spec:
  ingressClassName: nginx
  rules:
    - host: skywalking.kubernets.cn
      http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: skywalking-ui
                port:
                  # Must equal the port exposed by the skywalking-ui Service
                  number: 8080
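The htpasswd step above prompts for the password interactively. For scripted setups, an equivalent line can be produced with openssl. This is a sketch under stated assumptions: the helper name `make_auth_line` is hypothetical, `openssl` must be installed, and the password passed as an argument is visible in the local process list while the command runs.

```shell
# Emit one htpasswd-compatible line ("user:hash") on stdout,
# hashing the password with the Apache MD5 (apr1) scheme.
make_auth_line() {
  user="$1"; password="$2"
  printf '%s:%s\n' "$user" "$(openssl passwd -apr1 "$password")"
}

# Usage (requires cluster access; the file must still be named "auth"):
#   make_auth_line admin 'change-me' > auth
#   kubectl -n devops create secret generic ui-auth --from-file=auth
#   curl -s -o /dev/null -w '%{http_code}\n' -u admin:change-me http://skywalking.kubernets.cn/
```

Without credentials the same curl call should return 401, and with them the status of the UI itself.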
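4. Summary
Deploying SkyWalking on Kubernetes brings the following benefits:
- 1. Elastic scaling: Kubernetes is a popular container orchestration platform with built-in scaling. When load rises, the number of SkyWalking service instances can be increased by adjusting the replica count to meet the larger monitoring demand.
- 2. Failure recovery: Kubernetes provides high availability and recovery mechanisms. If a SkyWalking service instance fails or crashes, Kubernetes restarts it or replaces it with a new instance, which keeps the monitoring service running.
- 3. Resource management: Kubernetes has strong resource management. Resource requests and limits allocate suitable compute resources (CPU and memory) to SkyWalking so that it meets its performance needs. Kubernetes also supports resource quotas and priorities, so SkyWalking gets a fair share of resources in a shared cluster.
- 4. Simpler installation and deployment: Kubernetes offers a concise way to deploy and manage applications. SkyWalking can be rolled out with declarative manifests or with the fairly mature Helm chart, and its lifecycle is managed through Kubernetes resource objects (Deployment, Service, Ingress, and so on). This improves maintainability and repeatability.
- 5. Integration with microservices: Kubernetes is a natural fit for microservice architectures. As a distributed tracing and monitoring system, SkyWalking integrates with the microservice applications running in Kubernetes, discovers and traces the calls between them, and provides distributed tracing and performance monitoring across the whole system.
Overall, deploying SkyWalking on Kubernetes takes full advantage of Kubernetes' elastic scaling, failure recovery, resource management, and simplified deployment.
It lets SkyWalking adapt to changing monitoring demand and integrate closely with a microservice architecture, providing efficient and reliable distributed tracing and monitoring.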
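As an illustration of the scaling point, the replica count of the manifest-based OAP Deployment can be changed with a single command. A minimal sketch: the function name `scale_oap` is hypothetical, it assumes `kubectl` access, and it assumes the OAP runs with SW_CLUSTER set to kubernetes so that the replicas coordinate with each other.

```shell
# Change the OAP replica count and wait for the rollout to settle.
scale_oap() {
  ns="${1:-devops}"; replicas="${2:-2}"
  kubectl -n "$ns" scale deployment/oap --replicas="$replicas" || return 1
  kubectl -n "$ns" rollout status deployment/oap --timeout=300s
}

# Usage (requires cluster access):
#   scale_oap devops 3
```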