Kubernetes Scheduling and Resource Management
1. Scheduling Flow
1.1 Scheduling Phases
sequenceDiagram
participant API as kube-apiserver
participant SCH as kube-scheduler
participant Node as kubelet
API->>SCH: Watch new Pod (Pending)
SCH->>SCH: Filtering
SCH->>SCH: Scoring
SCH->>API: Bind Pod to Node
API->>Node: Notify kubelet
Node->>Node: Pull image, start containers
Node->>API: Update Pod status (Running)
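
The Filtering and Scoring steps in the diagram can be sketched in a few lines of Python. This is a simplified model rather than the kube-scheduler implementation: the `Node` fields, the `fits` check, and the "least requested" scoring formula are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical, simplified view of a node's capacity and current requests.
    name: str
    cpu_capacity_m: int     # CPU capacity in millicores
    cpu_requested_m: int    # CPU already requested by running Pods
    mem_capacity_mi: int    # memory capacity in MiB
    mem_requested_mi: int   # memory already requested by running Pods


def fits(node, cpu_m, mem_mi):
    """Filtering: the Pod's requests must fit into what is left on the node."""
    return (node.cpu_requested_m + cpu_m <= node.cpu_capacity_m
            and node.mem_requested_mi + mem_mi <= node.mem_capacity_mi)


def least_requested_score(node, cpu_m, mem_mi):
    """Scoring: average free fraction after placement, higher is better."""
    cpu_free = 1 - (node.cpu_requested_m + cpu_m) / node.cpu_capacity_m
    mem_free = 1 - (node.mem_requested_mi + mem_mi) / node.mem_capacity_mi
    return (cpu_free + mem_free) / 2


def schedule(nodes, cpu_m, mem_mi):
    """Return the name of the chosen node, or None if the Pod stays Pending."""
    feasible = [n for n in nodes if fits(n, cpu_m, mem_mi)]
    if not feasible:
        return None
    best = max(feasible, key=lambda n: least_requested_score(n, cpu_m, mem_mi))
    return best.name


nodes = [
    Node("node-1", 4000, 3500, 8192, 2048),
    Node("node-2", 4000, 1000, 8192, 4096),
    Node("node-3", 4000, 3900, 8192, 1024),
]
print(schedule(nodes, cpu_m=250, mem_mi=512))  # prints node-2
```

Here node-3 is filtered out because the CPU request does not fit, and node-2 wins the scoring step because it keeps the most headroom after placement.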
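1.2 Scheduling Algorithms

    # Filtering phase: remove nodes that cannot run the Pod
    # - PodFitsResources: checks that the node has enough free resources
    # - PodFitsHostPorts: checks for host port conflicts
    # - HostName: checks the requested node name
    # - MatchNodeSelector: checks the node selector
    # - NoDiskConflict: checks for volume conflicts

    # Scoring phase: rank the remaining nodes and pick the best one
    # - LeastRequestedPriority: prefers nodes with fewer requested resources (more free capacity)
    # - BalancedResourceAllocation: balances CPU and memory utilization
    # - ImageLocalityPriority: prefers nodes that already have the image cached

    # These are the legacy predicate and priority names. Current releases implement
    # the same checks as scheduling framework plugins such as NodeResourcesFit,
    # NodePorts, NodeName, NodeAffinity, and ImageLocality.

2. Node Selection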
2.1 nodeSelector

    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
    spec:
      nodeSelector:
        disktype: ssd          # label key/value pairs; all must match
        region: us-east-1
      containers:
      - name: nginx
        image: nginx:1.25

    # Label a node
    kubectl label nodes node-1 disktype=ssd
    kubectl label nodes node-1 region=us-east-1

    # Show labels
    kubectl get nodes --show-labels

    # Remove a label (note the trailing dash)
    kubectl label nodes node-1 disktype-

2.2 Node Affinity
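
    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
    spec:
      affinity:
        nodeAffinity:
          # Hard requirement: must be satisfied for the Pod to be scheduled
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - us-east-1a
                - us-east-1b
          # Soft preference: satisfied when possible
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: memory
                operator: Gt
                values:
                - "8"
      containers:
      - name: nginx
        image: nginx:1.25

| operator | Description |
|---|---|
| In | Label value is in the list |
| NotIn | Label value is not in the list |
| Exists | Label key exists |
| DoesNotExist | Label key does not exist |
| Gt | Greater than (the label value is parsed as an integer) |
| Lt | Less than (the label value is parsed as an integer) |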
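
The operator semantics in the table can be modeled with a small Python function. This is a sketch of the matching rules under simplifying assumptions, not the Kubernetes source code; the function name `matches` and the example labels are hypothetical.

```python
def matches(labels, key, operator, values=()):
    """Evaluate one matchExpressions entry against a node's labels."""
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    if operator == "In":
        return key in labels and labels[key] in values
    if operator == "NotIn":
        # A node without the key also satisfies NotIn.
        return key not in labels or labels[key] not in values
    if operator in ("Gt", "Lt"):
        # Gt and Lt take exactly one value; both sides are parsed as integers.
        if key not in labels:
            return False
        try:
            actual, bound = int(labels[key]), int(values[0])
        except ValueError:
            return False
        return actual > bound if operator == "Gt" else actual < bound
    raise ValueError(f"unknown operator: {operator}")


node_labels = {"topology.kubernetes.io/zone": "us-east-1a", "memory": "16"}
print(matches(node_labels, "topology.kubernetes.io/zone", "In", ["us-east-1a", "us-east-1b"]))
print(matches(node_labels, "memory", "Gt", ["8"]))
```

Both calls print `True`. The second one shows why the integer parsing matters: compared as strings, "16" would sort before "8".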
3. Pod Affinity and Anti-Affinity
3.1 Pod Affinity

    # Prefer nodes that already run a redis Pod
    apiVersion: v1
    kind: Pod
    metadata:
      name: app
    spec:
      affinity:
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: redis
              topologyKey: kubernetes.io/hostname
      containers:
      - name: app
        image: app:latest

3.2 Pod Anti-Affinity (Spreading Pods)
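
    # Spread Pods across nodes to avoid a single point of failure
    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
      labels:
        app: nginx             # the Pod must carry the label the selector matches
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: nginx
            topologyKey: kubernetes.io/hostname
      containers:
      - name: nginx
        image: nginx:1.25

3.3 Topology Spread Constraints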
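
    # Pod topology spread constraints
    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
      labels:
        app: nginx             # counted by the labelSelector below
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: nginx
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: nginx
      containers:
      - name: nginx
        image: nginx:1.25

| whenUnsatisfiable | Description |
|---|---|
| DoNotSchedule | Do not schedule the Pod if the constraint cannot be met |
| ScheduleAnyway | Schedule anyway, preferring nodes that minimize the skew |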
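
How `maxSkew` is evaluated with `DoNotSchedule` can be illustrated with a short Python sketch. It assumes the skew of a domain is its matching Pod count (including the new Pod) minus the smallest count across all domains; the function name and the zone counts are made up for the example.

```python
def allowed_domains(pod_counts, max_skew):
    """Return the domains where one more matching Pod keeps skew <= max_skew.

    pod_counts maps a topology domain (for example a zone) to the number of
    Pods that already match the labelSelector in that domain.
    """
    global_min = min(pod_counts.values())
    return sorted(
        domain
        for domain, count in pod_counts.items()
        if (count + 1) - global_min <= max_skew
    )


counts = {"us-east-1a": 2, "us-east-1b": 1, "us-east-1c": 1}
print(allowed_domains(counts, max_skew=1))  # prints ['us-east-1b', 'us-east-1c']
```

With `maxSkew: 1`, zone us-east-1a is rejected because a third Pod there would put it two ahead of the emptiest zone.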
4. Taints and Tolerations
4.1 Configuring Taints

    # Add taints
    kubectl taint nodes node1 key=value:NoSchedule
    kubectl taint nodes node1 dedicated=gpu:NoExecute
    kubectl taint nodes node1 temporary=true:NoSchedule

    # Show taints
    kubectl describe node node1 | grep Taints

    # Remove taints (note the trailing dash)
    kubectl taint nodes node1 key=value:NoSchedule-
    kubectl taint nodes node1 dedicated-

4.2 Configuring Tolerations
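
    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-app
    spec:
      tolerations:
      # Tolerate the "dedicated" taint regardless of its value
      - key: "dedicated"
        operator: "Exists"
        effect: "NoSchedule"
      # Tolerate only a specific value
      - key: "dedicated"
        operator: "Equal"
        value: "gpu"
        effect: "NoSchedule"
      # Stay on a not-ready node for 300 seconds before being evicted
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 300
      # Tolerate every taint (Exists with no key)
      - operator: "Exists"
      containers:
      - name: app
        image: app:latest

| Taint effect | Description |
|---|---|
| NoSchedule | New Pods that do not tolerate the taint are not scheduled onto the node |
| PreferNoSchedule | The scheduler tries to avoid the node, but may still use it |
| NoExecute | Running Pods that do not tolerate the taint are evicted, and new ones are not scheduled |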
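
The matching rules between a toleration and a taint can be sketched as follows. This is a simplified model with hypothetical helper names (`tolerates`, `schedulable`), using plain dictionaries in place of the API objects.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False            # an empty effect matches every effect
    key = toleration.get("key", "")
    if key and key != taint["key"]:
        return False            # an empty key (with Exists) matches every key
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def schedulable(taints, tolerations):
    """A node is usable if every hard taint is tolerated.

    PreferNoSchedule is only a soft preference, so it never blocks scheduling.
    """
    hard = [t for t in taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


gpu_taint = {"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}
print(schedulable([gpu_taint], [{"key": "dedicated", "operator": "Exists"}]))  # prints True
print(schedulable([gpu_taint], []))                                            # prints False
```

The first toleration has no `effect`, so it matches the taint under any effect; a Pod without tolerations is kept off the dedicated node.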
5. Resource Quotas
5.1 ResourceQuota

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: compute-quota
      namespace: default
    spec:
      hard:
        requests.cpu: "10"
        requests.memory: 20Gi
        limits.cpu: "20"
        limits.memory: 40Gi
        pods: "50"
        services: "10"
        persistentvolumeclaims: "5"

    # Check quota usage
    kubectl get resourcequota -o wide
    kubectl describe resourcequota compute-quota

5.2 LimitRange
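
    apiVersion: v1
    kind: LimitRange
    metadata:
      name: limits
      namespace: default
    spec:
      limits:
      - type: Container
        max:                   # upper bound per container
          cpu: "2"
          memory: 1Gi
        min:                   # lower bound per container
          cpu: "100m"
          memory: 64Mi
        default:               # default limits when none are set
          cpu: "500m"
          memory: 256Mi
        defaultRequest:        # default requests when none are set
          cpu: "200m"
          memory: 128Mi
        maxLimitRequestRatio:  # maximum allowed limit/request ratio
          cpu: "10"
          memory: "4"

5.3 Resource Requests and Limits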

    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"

    # requests: what the scheduler uses to place the Pod
    # limits: the maximum the container may use at runtime
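
Requests and limits together also determine a Pod's QoS class. The derivation can be sketched in Python as below. This is a simplified model: the function name `qos_class` is hypothetical, only CPU and memory are considered, and quantities are compared as literal strings (so "500m" and "0.5" would count as different).

```python
def qos_class(containers):
    """Classify a Pod from its containers' resource settings.

    containers is a list of dicts with optional 'requests' and 'limits' maps.
    """
    has_any = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            has_any = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit when a limit is set.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not has_any:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


nginx = {
    "requests": {"memory": "64Mi", "cpu": "250m"},
    "limits": {"memory": "128Mi", "cpu": "500m"},
}
print(qos_class([nginx]))  # prints Burstable
```

The nginx Pod from this section is Burstable because its requests are lower than its limits.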
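
    # QoS classes
    # 1. Guaranteed: every container sets CPU and memory limits, and requests == limits
    # 2. Burstable: at least one request or limit is set, but the Pod is not Guaranteed
    # 3. BestEffort: no container sets any requests or limits

6. Priority and Preemption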
6.1 PriorityClass

    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: high-priority
    value: 100000              # higher value means higher priority
    globalDefault: false
    description: "High-priority Pods"

6.2 Using a Priority Class

    apiVersion: v1
    kind: Pod
    metadata:
      name: important-app
    spec:
      priorityClassName: high-priority
      containers:
      - name: app
        image: app:latest

6.3 Preemption
sequenceDiagram
participant P as New high-priority Pod
participant SCH as Scheduler
participant L as Low-priority Pod
P->>SCH: Request scheduling
SCH->>SCH: No feasible node
SCH->>L: Preempt low-priority Pod
Note over L: Pod is evicted
SCH->>P: Schedule onto the freed node
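
The victim selection in the diagram can be sketched for a single node and a single resource. The real scheduler also weighs PodDisruptionBudgets and tries to minimize the number of victims, so this is only a rough model with a hypothetical function name and made-up Pods.

```python
def pick_victims(free_cpu_m, running, incoming_priority, incoming_cpu_m):
    """Choose Pods to evict on one node so the incoming Pod fits.

    running is a list of (name, priority, cpu_m) tuples.
    Returns [] if no eviction is needed, a list of victim names if
    preemption helps, or None if it cannot free enough CPU.
    """
    if free_cpu_m >= incoming_cpu_m:
        return []
    # Only Pods with a strictly lower priority may be preempted, lowest first.
    candidates = sorted(
        (pod for pod in running if pod[1] < incoming_priority),
        key=lambda pod: pod[1],
    )
    victims = []
    for name, _priority, cpu_m in candidates:
        victims.append(name)
        free_cpu_m += cpu_m
        if free_cpu_m >= incoming_cpu_m:
            return victims
    return None


running = [("batch", 10, 300), ("web", 1000, 500), ("cache", 50, 200)]
print(pick_victims(100, running, incoming_priority=100000, incoming_cpu_m=500))
# prints ['batch', 'cache']
```

The incoming Pod uses the priority value 100000 from the `high-priority` class above, so all three Pods are candidates, but evicting the two lowest-priority ones already frees enough CPU.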
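7. Scheduler Configuration
7.1 Configuring the Scheduler

    # v1 is available since Kubernetes 1.25; older clusters use v1beta3
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
    - schedulerName: default-scheduler
      pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: LeastAllocated   # prefer nodes with the most free resources

7.2 Multiple Schedulers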
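
    # Store the custom scheduler's configuration in a ConfigMap
    kubectl create configmap my-scheduler-config \
      --from-file=kube-scheduler-config.yaml \
      -n kube-system

    # Create a Pod that uses the custom scheduler
    # (pod.yaml must set spec.schedulerName to the custom scheduler's name)
    kubectl create -f pod.yaml

8. Scheduling Optimization Practices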
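8.1 Resource Reservation

    # Reserve resources for system components
    # kube-apiserver: at least 500m CPU, 256Mi memory
    # kubelet: at least 500m CPU, 500Mi memory
    # (on worker nodes, reservations are typically set through the kubelet's
    #  --kube-reserved and --system-reserved settings)

8.2 Affinity Strategies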
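| Strategy | Use case |
|---|---|
| Node affinity | Specific hardware requirements (GPU, SSD) |
| Pod affinity | Keeping related services close together for communication |
| Pod anti-affinity | Spreading replicas for high availability |
| Taints and tolerations | Dedicated nodes for special workloads |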
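8.3 Scheduling Decision Factors

    # Factors the scheduler takes into account
    # 1. Resource requests (CPU, memory, GPU)
    # 2. Affinity and anti-affinity rules
    # 3. Taints and tolerations
    # 4. Priority
    # 5. Topology spread constraints
    # 6. Toleration duration (tolerationSeconds)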