  • Prometheus & Grafana - 2) Installation
    Kubernetes 2025. 3. 23. 19:38

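    Downloading the Helm Chart

    kube-prometheus-stack provides a comprehensive solution for monitoring Kubernetes clusters. The stack bundles Kubernetes manifests, dashboards, and alerting rules with documentation and scripts that make cluster monitoring easy to set up and operate. Its main components include Prometheus and Grafana, which let you automate and manage metric collection, visualization, and alerting in a Kubernetes environment.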

     

    • Add the repository
    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update

     

     

    • Download the chart
    helm pull prometheus-community/kube-prometheus-stack
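
    By default, helm pull saves a versioned .tgz archive in the current directory. The install command later in this post points at a local kube-prometheus-stack directory, so one way to get that layout is to unpack the chart while pulling it. The sketch below assumes that approach; the fetch_chart function name and the default-values.yaml file name are chosen here for illustration only.

```shell
# Sketch: pull the chart unpacked and keep a copy of its default values.
# fetch_chart and default-values.yaml are illustrative names, not part of the chart.
fetch_chart() {
  chart="prometheus-community/kube-prometheus-stack"
  # --untar extracts the archive into ./kube-prometheus-stack
  helm pull "$chart" --untar || return 1
  # Dump the chart defaults as a reference when writing values.yaml
  helm show values "$chart" > default-values.yaml
}
```

    Run fetch_chart in the working directory, then compare your own values.yaml against default-values.yaml to see which defaults you are overriding.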

     

     

     

     

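    Configuring Values

    • Ingress configuration
    • Volume settings
      • By default, the Prometheus stack stores Pod data on emptyDir volumes, so the data is lost whenever a Pod is redeployed or rescheduled. To keep state, enable persistent storage for Prometheus and Grafana by updating the prometheus-stack values with YAML like the one below.
    • etcd configuration
      • etcd currently runs under systemd outside the Kubernetes cluster rather than as a Pod, so its endpoint and TLS details must be configured explicitly.
      • The Secret holding the certificate and key is created separately (in the same namespace).
        • The Secret is mounted in the Prometheus Pod under /etc/prometheus/secrets/<secret name>/.
    • Setting the Grafana password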
    alertmanager:
      ingress:
        enabled: true
        annotations: {}
        ingressClassName: "nginx"
        hosts:
          - "alertmanager.hyun.com"
        paths:
          - /
        tls: []
        # - secretName: alertmanager-general-tls
        #   hosts:
        #   - alertmanager.example.com
      service:
        type: ClusterIP
    
    grafana:
      adminUser: admin
      adminPassword: Curvc1004!
      ingress:
        enabled: true
        annotations: {}
        ingressClassName: "nginx"
        hosts:
          - "grafana.hyun.com"
        paths:
          - /
        tls: []
        # - secretName: grafana-general-tls
        #   hosts:
        #   - grafana.example.com
      service:
        type: ClusterIP
      persistence:
        enabled: true
        size: 10Gi
      sidecar:
        dashboards:
          enabled: true
          label: grafana_dashboard
          labelValue: 1
    
    prometheus:
      ingress:
        enabled: true
        annotations: {}
        ingressClassName: "nginx"
        hosts:
          - "prometheus.hyun.com"
        paths:
          - /
        tls: []
        # - secretName: prometheus-general-tls
        #   hosts:
        #   - prometheus.example.com
      service:
        type: ClusterIP
      prometheusSpec:
        secrets:
          - "etcd-client-cert"
        storageSpec:
          volumeClaimTemplate:
            spec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 10Gi
    
    kubeEtcd:
      endpoints:
      - 10.0.20.1
      service:
        port: 2379
        targetPort: 2379
      serviceMonitor:
        scheme: https
        insecureSkipVerify: false
        serverName: localhost
        caFile: /etc/prometheus/secrets/etcd-client-cert/ca.pem
        certFile: /etc/prometheus/secrets/etcd-client-cert/node-hyun-master01.pem
        keyFile: /etc/prometheus/secrets/etcd-client-cert/node-hyun-master01-key.pem
    
    kubelet:
      serviceMonitor:
        cAdvisor: false
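
    Neither the volumeClaimTemplate nor the Grafana persistence block above names a storage class, so both PersistentVolumeClaims fall back to the cluster's default StorageClass. Without a default, the claims typically stay Pending unless a matching PersistentVolume is provided by hand. A small pre-flight check, as a sketch: the helper name is hypothetical, and it relies on kubectl marking the default class with "(default)" in its table output.

```shell
# Sketch: succeed only if the cluster has a default StorageClass.
# has_default_storageclass is an illustrative helper name.
has_default_storageclass() {
  # kubectl appends "(default)" to the name of the default class
  kubectl get storageclass --no-headers 2>/dev/null | grep -q '(default)'
}
```

    For example: has_default_storageclass || echo "no default StorageClass; set storageClassName explicitly in values.yaml"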

     

     

    • Create the etcd Secret
    # Key names must match the file names referenced by certFile/keyFile in values.yaml
    kubectl create secret generic etcd-client-cert \
      --from-file=ca.pem=/etc/ssl/etcd/ssl/ca.pem \
      --from-file=node-hyun-master01.pem=/etc/ssl/etcd/ssl/node-hyun-master01.pem \
      --from-file=node-hyun-master01-key.pem=/etc/ssl/etcd/ssl/node-hyun-master01-key.pem \
      -n monitoring
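
    Each key in the Secret becomes a file under /etc/prometheus/secrets/etcd-client-cert/, so the file names used by caFile, certFile, and keyFile in values.yaml must match the Secret's key names exactly. The monitoring namespace also has to exist before the Secret can be created (kubectl create namespace monitoring). The sketch below lists the keys of a Secret and reports any expected file that is missing; check_secret_keys is a hypothetical helper name.

```shell
# Sketch: verify that a Secret contains every expected key (file name).
# Usage: check_secret_keys <namespace> <secret> <key>...
check_secret_keys() {
  ns="$1"; secret="$2"; shift 2
  # Print one key per line using a Go template over .data
  keys="$(kubectl -n "$ns" get secret "$secret" \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}')" || return 1
  rc=0
  for want in "$@"; do
    if printf '%s\n' "$keys" | grep -Fxq -- "$want"; then
      echo "ok: $want"
    else
      echo "missing: $want"; rc=1
    fi
  done
  return "$rc"
}
```

    For the values in this post, the call would be: check_secret_keys monitoring etcd-client-cert ca.pem node-hyun-master01.pem node-hyun-master01-key.pem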

     

     

     

     

    Deploying Standalone cAdvisor (Optional)

    Kubernetes 1.24 removed dockershim, and since then the kubelet's built-in cAdvisor no longer collects Docker container metadata, so some metrics (CPU, memory, PV, and so on) are missing from the Kubernetes / Compute Resources dashboards.
    I use Docker as my container engine, so I deployed a standalone cAdvisor and configured a ServiceMonitor to scrape it.

    Reference YAML: https://github.com/rancher/rancher/issues/38934#issuecomment-1294585708

    Changes

    PodSecurityPolicy was removed in Kubernetes 1.25, so I stripped the PodSecurityPolicy-related resources from the manifest before using it.

    • Removed the PodSecurityPolicy definition → replaced with a Pod securityContext
    • Modified the ClusterRole
      • Removed the apiGroups value
      • Removed resourceNames
      • Changed resources
    • Changed the cAdvisor version
    • Added a label so that Prometheus picks up the ServiceMonitor
      • release=monitoring
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      labels:
        app: cadvisor
      name: cadvisor
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      labels:
        app: cadvisor
      name: cadvisor
    rules:
      - apiGroups: [""]
        resources:
        - nodes/metrics
        verbs:
        # "use" is the PodSecurityPolicy verb; "get" is the read verb for nodes/metrics
        - get
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      labels:
        app: cadvisor
      name: cadvisor
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: cadvisor
    subjects:
    - kind: ServiceAccount
      name: cadvisor
      namespace: kube-system
    ---
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      annotations:
        seccomp.security.alpha.kubernetes.io/pod: docker/default
      labels:
        app: cadvisor
      name: cadvisor
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          app: cadvisor
          name: cadvisor
      template:
        metadata:
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ""
          labels:
            app: cadvisor
            name: cadvisor
        spec:
          automountServiceAccountToken: false
          securityContext:
            runAsUser: 0
            runAsGroup: 0
            fsGroup: 0
          containers:
          - args:
            - --housekeeping_interval=10s
            - --max_housekeeping_interval=15s
            - --event_storage_event_limit=default=0
            - --event_storage_age_limit=default=0
            - --enable_metrics=app,cpu,disk,diskIO,memory,network,process
            - --docker_only
            - --store_container_labels=false
            - --whitelisted_container_labels=io.kubernetes.container.name,io.kubernetes.pod.name,io.kubernetes.pod.namespace
            image: gcr.io/cadvisor/cadvisor:v0.45.0
            name: cadvisor
            ports:
            - containerPort: 8080
              name: http
              protocol: TCP
            resources:
              limits:
                cpu: 800m
                memory: 2000Mi
              requests:
                cpu: 400m
                memory: 400Mi
            volumeMounts:
            - mountPath: /rootfs
              name: rootfs
              readOnly: true
            - mountPath: /var/run
              name: var-run
              readOnly: true
            - mountPath: /sys
              name: sys
              readOnly: true
            - mountPath: /var/lib/docker
              name: docker
              readOnly: true
            - mountPath: /dev/disk
              name: disk
              readOnly: true
          priorityClassName: system-node-critical
          serviceAccountName: cadvisor
          terminationGracePeriodSeconds: 30
          tolerations:
          - key: node-role.kubernetes.io/controlplane
            value: "true"
            effect: NoSchedule
          - key: node-role.kubernetes.io/etcd
            value: "true"
            effect: NoExecute
          volumes:
          - hostPath:
              path: /
            name: rootfs
          - hostPath:
              path: /var/run
            name: var-run
          - hostPath:
              path: /sys
            name: sys
          - hostPath:
              path: /var/lib/docker
            name: docker
          - hostPath:
              path: /dev/disk
            name: disk
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: cadvisor
      labels:
        app: cadvisor
      namespace: kube-system
    spec:
      selector:
        app: cadvisor
      ports:
      - name: cadvisor
        port: 8080
        protocol: TCP
        targetPort: 8080
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        app: cadvisor
        release: monitoring
      name: cadvisor
      namespace: kube-system
    spec:
      endpoints:
      - metricRelabelings:
        - sourceLabels:
          - container_label_io_kubernetes_pod_name
          targetLabel: pod
        - sourceLabels:
          - container_label_io_kubernetes_container_name
          targetLabel: container
        - sourceLabels:
          - container_label_io_kubernetes_pod_namespace
          targetLabel: namespace
        - action: labeldrop
          regex: container_label_io_kubernetes_pod_name
        - action: labeldrop
          regex: container_label_io_kubernetes_container_name
        - action: labeldrop
          regex: container_label_io_kubernetes_pod_namespace
        port: cadvisor
        relabelings:
        - sourceLabels:
          - __meta_kubernetes_pod_node_name
          targetLabel: node
        - sourceLabels:
          - __metrics_path__
          targetLabel: metrics_path
          replacement: /metrics/cadvisor
        - sourceLabels:
          - job
          targetLabel: job
          replacement: kubelet
      namespaceSelector:
        matchNames:
        - kube-system
      selector:
        matchLabels:
          app: cadvisor

     

     

     

     

    Installation and Verification

    • Install
    helm install monitoring kube-prometheus-stack -f values.yaml -n monitoring
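
    After the release is installed, it can take a few minutes for all Pods to start. A basic post-install check might look like the sketch below, which assumes the monitoring release name and namespace used in this post; verify_stack is a hypothetical helper name.

```shell
# Sketch: inspect the Helm release and the objects it created.
# Usage: verify_stack [release] [namespace]   (both default to "monitoring")
verify_stack() {
  release="${1:-monitoring}"
  ns="${2:-monitoring}"
  # Release state as recorded by Helm (deployed, failed, ...)
  helm status "$release" -n "$ns" || return 1
  # Prometheus, Alertmanager, Grafana, operator, and exporter Pods
  kubectl -n "$ns" get pods || return 1
  # Ingress objects generated from the hosts in values.yaml
  kubectl -n "$ns" get ingress
}
```

    If a Prometheus or Grafana Pod stays Pending, kubectl -n monitoring get pvc is a reasonable next step, since both depend on their PersistentVolumeClaims being bound.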

     

     

    • Grafana (grafana.hyun.com)

    • Prometheus (prometheus.hyun.com)

    • Alertmanager (alertmanager.hyun.com)

     

     

     

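    Additional Configuration

    • Allow Prometheus to access kube-proxy
      • Edit the kube-proxy ConfigMap so the metrics endpoint listens on all interfaces (by default it binds to 127.0.0.1:10249)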
    data:
      config.conf: |-
        metricsBindAddress: 0.0.0.0:10249
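
    kube-proxy reads its configuration at startup, so the ConfigMap change takes effect only after the Pods are restarted. The sketch below assumes kube-proxy runs as a DaemonSet named kube-proxy in kube-system, as it does on kubeadm-based clusters; restart_kube_proxy is a hypothetical helper, and the node IP is whichever node you want to probe.

```shell
# Sketch: restart kube-proxy and probe its metrics endpoint on one node.
# Usage: restart_kube_proxy <node-ip>
restart_kube_proxy() {
  node_ip="$1"
  kubectl -n kube-system rollout restart daemonset/kube-proxy || return 1
  kubectl -n kube-system rollout status daemonset/kube-proxy --timeout=120s || return 1
  # The endpoint should now answer on the node address, not only on localhost
  curl -fsS "http://${node_ip}:10249/metrics" | head -n 5
}
```

    If curl still fails after the restart, check that the ConfigMap edit was saved under the config.conf key and that no firewall blocks port 10249 between the Prometheus Pod and the node.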

     

     

     

     
